Subject: Re: [PATCH] sched/cfs: change initial value of runnable_avg
To: Vincent Guittot
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
    Ben Segall, Mel Gorman, linux-kernel, kernel test robot,
    Valentin Schneider, Phil Auld, Hillf Danton
References: <20200624154422.29166-1-vincent.guittot@linaro.org>
    <7f2b3135-328b-a510-ce23-49e3f5c20965@applied-asynchrony.com>
From: Holger Hoffstätte
Organization: Applied Asynchrony, Inc.
Date: Thu, 25 Jun 2020 12:42:29 +0200

On 2020-06-25 11:56, Vincent Guittot wrote:
> On Thu, 25 Jun 2020 at 11:24, Holger Hoffstätte wrote:
>>
>> On 2020-06-24 17:44, Vincent Guittot wrote:
>>> Some performance regressions on the reaim benchmark have been raised with
>>> commit 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
>>>
>>> The problem comes from the init value of runnable_avg, which is initialized
>>> to the max value. This can be a problem if the newly forked task turns out
>>> to be a short task, because the group of CPUs is wrongly classified as
>>> overloaded and tasks are pulled less aggressively.
>>>
>>> Set the initial value of runnable_avg equal to util_avg to reflect that
>>> there has been no waiting time so far.
>>>
>>> Fixes: 070f5e860ee2 ("sched/fair: Take into account runnable_avg to classify group")
>>> Reported-by: kernel test robot
>>> Signed-off-by: Vincent Guittot
>>> ---
>>>   kernel/sched/fair.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 0424a0af5f87..45e467bf42fc 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -806,7 +806,7 @@ void post_init_entity_util_avg(struct task_struct *p)
>>>   		}
>>>   	}
>>>
>>> -	sa->runnable_avg = cpu_scale;
>>> +	sa->runnable_avg = sa->util_avg;
>>>
>>>   	if (p->sched_class != &fair_sched_class) {
>>>   		/*
>>>
>>
>> Something is wrong here. I woke up my machine from suspend-to-RAM this morning
>> and saw that a completely idle machine had a loadavg of ~7. According to my
>
> Just to make sure: are you speaking about the loadavg that is output by
> /proc/loadavg, or load_avg, which is the PELT load?

/proc/loadavg

>> monitoring system this happened to be the loadavg right before I suspended.
>> I've reverted this, rebooted, created a loadavg >0, suspended, and after
>> wakeup loadavg again correctly ranges between 0 and whatever, as expected.
>
> I'm not sure I see why ~7 is bad compared to "correctly ranges between 0
> and whatever". Isn't ~7 part of the whatever?

After wakeup the _baseline_ for loadavg seemed to be the last value before
suspend, not 0. The 7 was then the base loadavg for a _mostly idle machine_
(just reading mail etc.), i.e. it never went below that baseline again, no
matter the _actual_ load.

Here's an image: https://imgur.com/a/kd2stqO

Before 02:00 last night the load was ~7 (compiled something), then all
processes were terminated and the machine was suspended. After wakeup the
machine was mostly idle (9am..11am), yet the measured loadavg continued with
the same value as before. I didn't notice this right away since my CPU meter
on the desktop didn't show any *actual* activity (because there was none).
The spike at ~11am is the revert/reboot. After that, loadavg became normal
again, i.e. representative of the actual load, even after suspend/resume
cycles. I suspend/resume every night and the only thing that changed recently
was this patch, so.. :)

-h
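
For context on why a flat ~7 on an idle box is suspicious: the 1-minute figure
in /proc/loadavg is an exponentially decaying average that the kernel
recomputes roughly every 5 seconds, so with nothing runnable it should fall
below 1 within about two minutes of resume rather than hold steady. Below is a
minimal, self-contained sketch of that decay; the fixed-point constants
(FIXED_1 = 2048, EXP_1 = 1884) and the calc_load() shape follow the commonly
documented values and may not match the in-kernel implementation exactly.

/*
 * Standalone sketch (not kernel code) of the 1-minute loadavg decay,
 * starting from a loadavg of 7.00 with zero runnable tasks.
 */
#include <stdio.h>

#define FSHIFT   11
#define FIXED_1  (1UL << FSHIFT)   /* 1.0 in fixed point */
#define EXP_1    1884UL            /* ~exp(-5s/1min) per 5-second tick */

static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long active)
{
	/* exponentially weighted moving average, one 5-second tick */
	unsigned long newload = load * exp + active * (FIXED_1 - exp);

	if (active >= load)
		newload += FIXED_1 - 1;
	return newload / FIXED_1;
}

int main(void)
{
	unsigned long avenrun = 7 * FIXED_1;   /* start at loadavg 7.00 */
	int tick;

	for (tick = 1; tick <= 36; tick++) {   /* 36 * 5s = 3 minutes idle */
		avenrun = calc_load(avenrun, EXP_1, 0 /* no runnable tasks */);
		printf("t=%3ds  loadavg=%lu.%02lu\n", tick * 5,
		       avenrun >> FSHIFT,
		       (avenrun & (FIXED_1 - 1)) * 100 / FIXED_1);
	}
	return 0;
}

Run as is (e.g. "cc decay.c && ./a.out", filename arbitrary), the value drops
from 7.00 to below 1 after roughly two minutes of simulated idle ticks, which
is the post-resume behaviour the report above expected and did not see.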