From: Heiner Kallweit
Subject: Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")
To: Vincent Guittot, Niklas Söderlund, i@linaro.org
Cc: Peter Zijlstra, "Paul E. McKenney", Ingo Molnar, linux-kernel, linux-renesas-soc@vger.kernel.org
References: <20180412091822.GG12256@bigcity.dyn.berto.se> <20180412111519.GH12256@bigcity.dyn.berto.se> <20180412133031.GA551@linaro.org>
In-Reply-To: <20180412133031.GA551@linaro.org>
Date: Thu, 12 Apr 2018 21:43:05 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On 12.04.2018 at 15:30, Vincent Guittot wrote:
> Heiner, Niklas,
>
> On Thursday 12 Apr 2018 at 13:15:19 (+0200), Niklas Söderlund wrote:
>> Hi Vincent,
>>
>> Thanks for your feedback.
>>
>> On 2018-04-12 12:33:27 +0200, Vincent Guittot wrote:
>>> Hi Niklas,
>>>
>>> On 12 April 2018 at 11:18, Niklas Söderlund wrote:
>>>> Hi Vincent,
>>>>
>>>> I have observed issues running on linus/master from a few days back [1].
>>>> I'm running on a Renesas Koelsch board (arm32) and I can trigger an issue
>>>> by X forwarding the v4l2 test application qv4l2 over ssh and moving the
>>>> cursor around in the GUI (best test case description award...). I'm
>>>> sorry about the really bad way I trigger this but I can't do it in any
>>>> other way; I'm happy to try other methods if you have some ideas. The
>>>> symptom of the issue is a complete hang of the system for more than 30
>>>> seconds and then this information is printed in the console:
>>>
>>> Heiner (added to cc) also reported a similar problem with his platform: a
>>> dual core Celeron
>>>
>>> Can you confirm that your platform is a dual Cortex-A15? At least that is
>>> what I have seen on the web.
>>> This would confirm that a dual-core system is a key point.
>>
>> I can confirm that my platform is a dual core.
>>
>>>
>>> The ssh connection is also common with Heiner's setup
>>
>> Interesting, I found Heiner's mail and I can confirm that I too
>> experience ssh session lockups. I ssh into the system and by repeatedly
>> hitting the return key I can lock up the board; while locked up, starting
>> another ssh session unblocks the first. If I don't start another ssh
>> session but keep hitting the return key sporadically in the first one I can
>> get the trace I reported in my first mail to be printed on the serial
>> console.
>>
>> When locked up, the symptoms are that both the single ssh session and
>> the serial console are dead. But attempting another ssh connection
>> immediately unblocks both ssh and serial console. And if I allow enough
>> time before starting the second ssh connection I can trigger a trace to
>> be printed on the serial console; it's similar to but different from the
>> first I reported.
>>
>> [ 207.548610] 1-...!: (0 ticks this GP) idle=79a/1/1073741824 softirq=2146/2146 fqs=0
>> [ 207.556442] (detected by 0, t=12645 jiffies, g=333, c=332, q=20)
>> [ 207.562546] rcu_sched kthread starved for 12645 jiffies! g333 c332 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
>> [ 207.572548] RCU grace-period kthread stack dump:
>>
>> [ 207.577166] rcu_sched R running task 0 9 2 0x00000000
>> [ 207.584389] Backtrace:
>> [ 207.586849] [] (__schedule) from [] (schedule+0x94/0xb8)
>> [ 207.593901] r10:e77813c0 r9:e77813c0 r8:ffffffff r7:e709bed4 r6:ffffaa80 r5:00000000
>> [ 207.601732] r4:ffffe000
>> [ 207.604269] [] (schedule) from [] (schedule_timeout+0x380/0x3dc)
>> [ 207.612013] r5:00000000 r4:00000000
>> [ 207.615596] [] (schedule_timeout) from [] (rcu_gp_kthread+0x668/0xe2c)
>> [ 207.623863] r10:c0b79018 r9:0000014d r8:0000014c r7:00000001 r6:00000000 r5:c0b10ad0
>> [ 207.631693] r4:c0b10980
>> [ 207.634230] [] (rcu_gp_kthread) from [] (kthread+0x148/0x160)
>> [ 207.641712] r7:c0b10980
>> [ 207.644249] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
>> [ 207.651472] Exception stack(0xe709bfb0 to 0xe709bff8)
>> [ 207.656527] bfa0: 00000000 00000000 00000000 00000000
>> [ 207.664709] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> [ 207.672890] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>> [ 207.679508] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c013dc90
>> [ 207.687340] r4:e7026f4
>>
>> Continuing the anecdotal testing, I can't seem to be able to trigger the
>> lockup if I have ever had two ssh sessions open to the system. And
>> about half the time I can't trigger it at all but after a reset of the
>> system it triggers with just hitting the return key 2-5 times of opening
>> a ssh session and just hitting the return key.
>> But please take this part
>> with a grain of salt as it's done by the monkey testing method :-)
>>
>> All tests above have been run based on c18bb396d3d261eb ("Merge
>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net").
>>
>>>
>>>>
>
> [snip]
>
>>>>
>>>> I'm a bit lost on how to progress with this issue and would appreciate
>>>> any help you can provide to help me figure this out.
>>>
>>> Can you send me your config?
>>>
>>> I'm going to prepare a debug patch to spy on what's happening when entering idle
>
> I'd like to narrow the problem a bit more with the 2 patches above. Can you try
> them separately on top of c18bb396d3d261eb ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
> and check if one of them fixes the problem?
>
> (They should apply on linux-next as well)
>
> The first patch always kicks ilb instead of doing ilb on the local CPU before entering idle
>
> ---
>  kernel/sched/fair.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0951d1c..b21925b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9739,8 +9739,7 @@ static void nohz_newidle_balance(struct rq *this_rq)
>  	 * candidate for ilb instead of waking up another idle CPU.
>  	 * Kick an normal ilb if we failed to do the update.
>  	 */
> -	if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE))
> -		kick_ilb(NOHZ_STATS_KICK);
> +	kick_ilb(NOHZ_STATS_KICK);
>  	raw_spin_lock(&this_rq->lock);
>  }
>
>

I tested both patches; with both of them the issue still occurs. However,
on top of linux-next from yesterday I have the impression that it happens
less frequently with the second patch. On top of the commit you mentioned
I don't see a change in system behavior with either patch.

Regards,
Heiner