Date: Fri, 24 Aug 2018 08:54:19 +0200
From: Juri Lelli
To: Dietmar Eggemann
Cc: Miguel de Dios, Steve Muckle, Peter Zijlstra, Ingo Molnar,
    linux-kernel@vger.kernel.org, kernel-team@android.com, Todd Kjos,
    Paul Turner, Quentin Perret, Patrick Bellasi, Chris Redpath,
    Morten Rasmussen, John Dias
Subject: Re: [PATCH] sched/fair: vruntime should normalize when switching from fair
Message-ID: <20180824065419.GB24860@localhost.localdomain>
References: <20180817182728.76129-1-smuckle@google.com>

On 23/08/18 18:52, Dietmar Eggemann wrote:
> Hi,
>
> On 08/21/2018 01:54 AM, Miguel de Dios wrote:
> > On 08/17/2018 11:27 AM, Steve Muckle wrote:
> > > From: John Dias
> > >
> > > When rt_mutex_setprio changes a task's scheduling class to RT,
> > > we're seeing cases where the task's vruntime is not updated
> > > correctly upon return to the fair class.
> > > Specifically, the following is being observed:
> > > - task is deactivated while still in the fair class
> > > - task is boosted to RT via rt_mutex_setprio, which changes
> > >   the task to RT and calls check_class_changed.
> > > - check_class_changed leads to detach_task_cfs_rq, at which point
> > >   the vruntime_normalized check sees that the task's state is
> > >   TASK_WAKING, which results in skipping the subtraction of the
> > >   rq's min_vruntime from the task's vruntime
> > > - later, when the prio is deboosted and the task is moved back
> > >   to the fair class, the fair rq's min_vruntime is added to
> > >   the task's vruntime, even though it wasn't subtracted earlier.
> > > The immediate result is inflation of the task's vruntime, giving
> > > it lower priority (starving it if there's enough available work).
> > > The longer-term effect is inflation of all vruntimes because the
> > > task's vruntime becomes the rq's min_vruntime when the higher
> > > priority tasks go idle. That leads to a vicious cycle, where
> > > the vruntime inflation repeatedly doubles.
> > >
> > > The change here is to detect when vruntime_normalized is being
> > > called when the task is waking but is waking in another class,
> > > and to conclude that this is a case where vruntime has not
> > > been normalized.
> > >
> > > Signed-off-by: John Dias
> > > Signed-off-by: Steve Muckle
> > > ---
> > >  kernel/sched/fair.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index b39fb596f6c1..14011d7929d8 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -9638,7 +9638,8 @@ static inline bool vruntime_normalized(struct task_struct *p)
> > >  	 * - A task which has been woken up by try_to_wake_up() and
> > >  	 *   waiting for actually being woken up by sched_ttwu_pending().
> > >  	 */
> > > -	if (!se->sum_exec_runtime || p->state == TASK_WAKING)
> > > +	if (!se->sum_exec_runtime ||
> > > +	    (p->state == TASK_WAKING && p->sched_class == &fair_sched_class))
> > >  		return true;
> > >  	return false;
> > The normalization of vruntime used to exist in task_waking but it was
> > removed and the normalization was moved into migrate_task_rq_fair. The
> > reasoning being that task_waking_fair was only hit when a task is queued
> > onto a different core and migrate_task_rq_fair should do the same work.
> >
> > However, we're finding that there's one case which migrate_task_rq_fair
> > doesn't hit: that being the case where rt_mutex_setprio changes a task's
> > scheduling class to RT when it's scheduled out. The task never hits
> > migrate_task_rq_fair because it is switched to RT and migrates as an RT
> > task. Because of this we're getting an unbounded addition of
> > min_vruntime when the task is re-attached to the CFS runqueue when it
> > loses the inherited priority. The patch above works because now the
> > kernel specifically checks for this case and normalizes accordingly.
> >
> > Here's the patch I was talking about:
> > https://lore.kernel.org/patchwork/patch/677689/. In our testing we were
> > seeing vruntimes nearly double every time after rt_mutex_setprio boosts
> > the task to RT.
> >
> > Signed-off-by: Miguel de Dios
> > Tested-by: Miguel de Dios
>
> I tried to catch this issue on my Arm64 Juno board using pi_test (and a
> slightly adapted pip_test (usleep_val = 1500 and keep low as cfs)) from
> rt-tests but wasn't able to do so.
>
> # pi_stress --inversions=1 --duration=1 --groups=1 --sched id=low,policy=cfs
>
> Starting PI Stress Test
> Number of thread groups: 1
> Duration of test run: 1 seconds
> Number of inversions per group: 1
> Admin thread SCHED_FIFO priority 4
> 1 groups of 3 threads will be created
> High thread SCHED_FIFO priority 3
> Med thread SCHED_FIFO priority 2
> Low thread SCHED_OTHER nice 0
>
> # ./pip_stress
>
> In both cases, the cfs task entering rt_mutex_setprio() is queued, so
> dequeue_task_fair()->dequeue_entity(), which subtracts cfs_rq->min_vruntime
> from se->vruntime, is called on it before it gets the rt prio.
>
> Maybe it requires a very specific use of the pthread library to provoke this
> issue by making sure that the cfs task really blocks/sleeps?

Maybe one could play with rt-app to recreate such a specific use case?

https://github.com/scheduler-tools/rt-app/blob/master/doc/tutorial.txt#L459

Best,

- Juri
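[Editor's note] A reproducer along the lines Juri suggests might look roughly like this in rt-app's JSON grammar. This is a hedged sketch, not a tested config: the event names (`run`, `sleep`, `lock`, `unlock`) and the `global`/`resources`/`tasks` layout follow the rt-app tutorial linked above, but the task names, the microsecond durations, and whether rt-app's mutexes default to priority inheritance are assumptions to verify against the rt-app documentation.

```json
{
    "global": {
        "duration": 10,
        "default_policy": "SCHED_OTHER"
    },
    "resources": {
        "m0": { "type": "mutex" }
    },
    "tasks": {
        "low_cfs": {
            "policy": "SCHED_OTHER",
            "phases": {
                "p0": {
                    "lock": "m0",
                    "sleep": 1500,
                    "run": 1000,
                    "unlock": "m0"
                }
            }
        },
        "high_rt": {
            "policy": "SCHED_FIFO",
            "priority": 10,
            "phases": {
                "p0": {
                    "sleep": 500,
                    "lock": "m0",
                    "run": 500,
                    "unlock": "m0"
                }
            }
        }
    }
}
```

The intent is that `low_cfs` really blocks while holding `m0` (per Dietmar's observation that pi_stress/pip_stress boost an already-queued task), so that when `high_rt` contends on the mutex, `rt_mutex_setprio()` boosts a cfs task that is not on the runqueue.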