Date: Thu, 17 Jun 2021 12:05:48 +0100
From: Mel Gorman
To: Vincent Guittot
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Valentin Schneider, LKML
Subject: Re: [PATCH v2] sched/fair: Age the average idle time
Message-ID: <20210617110548.GN30378@techsingularity.net>
References: <20210615111611.GH30378@techsingularity.net> <20210615204228.GB4272@worktop.programming.kicks-ass.net> <20210617074401.GL30378@techsingularity.net> <20210617094040.GM30378@techsingularity.net>

On Thu, Jun 17,
2021 at 12:02:56PM +0200, Vincent Guittot wrote:
> > > >
> > > > Fundamentally though, as the changelog notes "due to the nature of the
> > > > patch, this is a regression magnet". There are going to be examples
> > > > where a deep search is better even if a machine is fully busy or
> > > > overloaded and examples where cutting off the search is better. I think
> > > > it's better to have an idle estimate that gets updated if CPUs are fully
> > > > busy even if it's not a universal win.
> > >
> > > Although I agree that using a stale average idle time value of local
> > > is not good, I'm not sure this proposal is better. The main problem is
> > > that we use the avg_idle of the local CPU to estimate how many times
> > > we should loop and try to find another idle CPU. But there is no
> > > direct relation between both.
> >
> > This is true. The idle time of the local CPU is used to estimate the
> > idle time of the domain which is inevitably going to be inaccurate but
>
> I'm more and more convinced that using average idle time (of the
> local cpu or the full domain) is not the right metric. In
> select_idle_cpu(), we look for an idle CPU but we don't care about
> how long it will be idle.

Can we predict that accurately? cpufreq for intel_pstate used to try
something like that but it was a bit fuzzy and I don't know if the
scheduler could do much better. There is some idle prediction stuff but
it's related to nohz which does not really help us if a machine is nearly
fully busy or overloaded.

I guess that, for tracking idle, revisiting
https://lore.kernel.org/lkml/1615872606-56087-1-git-send-email-aubrey.li@intel.com/
is an option now that the scan is somewhat unified. A two-pass scan could
be used to check potentially idle CPUs first and, if there is sufficient
search depth left, scan other CPUs. There were some questions on how
accurate the idle mask was and how expensive it was to maintain.
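
To make the two-pass idea concrete, here is a rough user-space sketch. It
is not kernel code and not the proposed patch. The names scan_depth(),
two_pass_scan(), idle_hint_mask and is_idle[] are hypothetical, and the
depth formula (including the floor of 4 and the clamp to the span) is only
an assumption that loosely follows the idea in this thread of scaling the
number of loop iterations with avg_idle relative to scan cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 64	/* hypothetical limit: one bit per CPU in a u64 mask */

/*
 * Hypothetical depth estimate: allow a deeper scan when the expected idle
 * time is large relative to the average cost of scanning.
 */
static unsigned int scan_depth(uint64_t avg_idle_ns, uint64_t avg_scan_cost_ns,
			       unsigned int span_weight)
{
	uint64_t avg_cost = avg_scan_cost_ns + 1;	/* avoid dividing by zero */
	uint64_t span_avg = (uint64_t)span_weight * avg_idle_ns;
	uint64_t nr;

	if (span_avg <= 4 * avg_cost)
		return 4;	/* assumed floor: always probe a few CPUs */

	nr = span_avg / avg_cost;
	if (nr > span_weight)
		nr = span_weight;	/* never more than the domain holds */
	return (unsigned int)nr;
}

/*
 * Two-pass scan: pass 0 visits CPUs that the (possibly stale) hint mask
 * marks as idle, pass 1 spends whatever depth is left on the other CPUs.
 * is_idle[] stands in for a real idleness check.
 * Returns the chosen CPU, or -1 if the depth budget ran out first.
 */
static int two_pass_scan(uint64_t idle_hint_mask, const bool *is_idle,
			 unsigned int nr_cpus, unsigned int depth)
{
	assert(nr_cpus <= NR_CPUS);

	for (int pass = 0; pass < 2; pass++) {
		for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
			bool hinted = (idle_hint_mask >> cpu) & 1;

			/* Pass 0 takes hinted CPUs, pass 1 takes the rest. */
			if (hinted != (pass == 0))
				continue;
			if (depth == 0)
				return -1;
			depth--;
			if (is_idle[cpu])
				return (int)cpu;
		}
	}
	return -1;
}
```

In this sketch the depth budget is an input to the scan, so the hint mask
only changes the order in which CPUs are visited. A stale mask still
consumes budget on CPUs that turn out to be busy.
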
Unfortunately, it would not help with scan depth calculations; it just
might reduce useless scanning.

Selecting based on avg idle time could be interesting but hazardous. If,
for example, we prioritised selecting a CPU that is mostly idle, it'll
also pick CPUs that are potentially in a deep idle state, incurring a
larger wakeup cost. Right now we are not much better because we just
select an idle CPU and hope for the best, but always targeting the most
idle CPU could have problems. There would also be the cost of tracking
idle CPUs in priority order. It would eliminate the scan depth cost
calculations but the overall cost would be much worse.

Hence, I still think we can improve the scan depth costs in the short
term until a replacement is identified that works reasonably well.

> Even more, we can scan all CPUs whatever the
> avg idle time if there is a chance that there is an idle core.
>

That is an important, but separate topic. It's known that the idle core
detection can yield false positives. Putting core scanning under SIS_PROP
had mixed results when we last tried but things change. Again, it doesn't
help with scan depth calculations.

> > tracking idle time for the domain will be cache write intensive and
> > potentially very expensive. I think this was discussed before but maybe
> > it is my imagination.
> >
> > > Typically, a short average idle time on
> > > the local CPU doesn't mean that there are fewer idle CPUs and that's
> > > why we have a mix of gain and loss
> > >
> >
> > Can you evaluate if scanning proportional to cores helps if applied on
> > top? The patch below is a bit of pick&mix and has only seen a basic build
>
> I will queue it for some test later today
>

Thanks. The proposed patch has since passed a build and boot test.
Performance evaluation is under way but as it's x86 and SMT2, I'm mostly
just checking that it's neutral.

-- 
Mel Gorman
SUSE Labs