Date: Thu, 26 Apr 2018 16:41:38 +0200
From: Niklas Söderlund
To: Vincent Guittot
Cc: Heiner Kallweit, Peter Zijlstra, "Paul E. McKenney", Ingo Molnar,
    linux-kernel, linux-renesas-soc@vger.kernel.org
Subject: Re: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle")
Message-ID: <20180426144137.GC3315@bigcity.dyn.berto.se>
References: <20180412111519.GH12256@bigcity.dyn.berto.se>
    <20180412133031.GA551@linaro.org>
    <20180412223904.GJ12256@bigcity.dyn.berto.se>
    <20180420160013.GA13769@linaro.org>
    <20180422221827.GB27674@bigcity.dyn.berto.se>
    <20180423095420.GA23995@linaro.org>
    <20180425225603.GA26177@bigcity.dyn.berto.se>
    <20180426103133.GA6953@linaro.org>
In-Reply-To: <20180426103133.GA6953@linaro.org>
User-Agent: Mutt/1.9.5 (2018-04-13)
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Vincent,

Thanks for all your help.

On 2018-04-26 12:31:33 +0200, Vincent Guittot wrote:
> Hi Niklas,
>
> On Thursday 26 Apr 2018 at 00:56:03 (+0200), Niklas Söderlund wrote:
> > Hi Vincent,
> >
> > Here are the results, sorry for the delay.
> >
> > On 2018-04-23 11:54:20 +0200, Vincent Guittot wrote:
> >
> > [snip]
> >
> > >
> > > Thanks for the report. Can you re-run with the following trace-cmd sequence? My previous sequence disables ftrace events.
> > >
> > > trace-cmd reset > /dev/null
> > > trace-cmd start -b 40000 -p function -l dump_backtrace:traceoff -e sched -e cpu_idle -e cpu_frequency -e timer -e ipi -e irq -e printk
> > > trace-cmd start -b 40000 -p function -l dump_backtrace -e sched -e cpu_idle -e cpu_frequency -e timer -e ipi -e irq -e printk
> > >
> > > I have updated the patch and added traces to check that the scheduler returns from the idle_balance function and doesn't stay stuck.
> >
> > Once more I applied the change below on top of c18bb396d3d261eb ("Merge
> > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net").
> >
> > This time the result of 'trace-cmd report' is so large I do not include
> > it here, but I attach the trace.dat file. Not sure why, but the timing of
> > sending the NMI to the backtrace print is different (but the content is
> > the same AFAIK), so on the odd chance it can help figure this out:
> >
>
> Thanks for the trace, I have been able to catch a problem with it.
> Could you test the patch below to confirm that the problem is solved?
> The patch applies on top of
> c18bb396d3d261eb ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")

I can confirm that with the patch below I can no longer reproduce the
problem. Thanks!

>
> From: Vincent Guittot
> Date: Thu, 26 Apr 2018 12:19:32 +0200
> Subject: [PATCH] sched/fair: fix the update of blocked load when newly idle
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> With commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle"),
> we release the rq->lock when updating blocked load of idle CPUs. This opens
> a time window during which another CPU can add a task to this CPU's cfs_rq.
> The check for a newly added task in idle_balance() is not in the common path.
> Move the out label to include this check.
>
> Fixes: 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
> Reported-by: Heiner Kallweit
> Reported-by: Niklas Söderlund
> Signed-off-by: Vincent Guittot
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0951d1c..15a9f5e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9847,6 +9847,7 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
>  	if (curr_cost > this_rq->max_idle_balance_cost)
>  		this_rq->max_idle_balance_cost = curr_cost;
>
> +out:
>  	/*
>  	 * While browsing the domains, we released the rq lock, a task could
>  	 * have been enqueued in the meantime. Since we're not going idle,
> @@ -9855,7 +9856,6 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
>  	if (this_rq->cfs.h_nr_running && !pulled_task)
>  		pulled_task = 1;
>
> -out:
>  	/* Move the next balance forward */
>  	if (time_after(this_rq->next_balance, next_balance))
>  		this_rq->next_balance = next_balance;
> --
> 2.7.4
>
>
> > [snip]
>

-- 
Regards,
Niklas Söderlund
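
For anyone following the thread, here is a rough userspace sketch of the
control-flow change described in the commit message above. It is not the
kernel code: struct rq_sketch, idle_balance_sketch() and both of its
boolean parameters are hypothetical names made up for illustration, and
the locking and load balancing are left out. It only models where the
out label sits relative to the check for a newly enqueued task.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct rq. */
struct rq_sketch {
	unsigned int h_nr_running;	/* models this_rq->cfs.h_nr_running */
};

/*
 * Models only the control flow of idle_balance().
 *
 * bail_out_early:     stands for a path that jumps to the out label
 *                     without browsing the domains (assumption).
 * label_before_check: true models the patched position of the label,
 *                     false models the original position.
 */
static int idle_balance_sketch(const struct rq_sketch *this_rq,
			       bool bail_out_early, bool label_before_check)
{
	int pulled_task = 0;

	if (bail_out_early) {
		if (label_before_check)
			goto out_new;
		goto out_old;	/* original label: skips the check below */
	}

	/* ... load balancing over the domains would happen here ... */

out_new:
	/* A task may have been enqueued while the rq lock was released. */
	if (this_rq->h_nr_running && !pulled_task)
		pulled_task = 1;

out_old:
	/* The kernel moves next_balance forward here; omitted in the sketch. */
	return pulled_task;
}
```

With h_nr_running set to 1, calling idle_balance_sketch(&rq, true, false)
returns 0, so the enqueued task goes unnoticed on the early-exit path,
while idle_balance_sketch(&rq, true, true) returns 1.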