Date: Fri, 8 Apr 2022 10:22:25 +0200
From: Peter Zijlstra
To: Chen Yu
Cc: linux-pm@vger.kernel.org, "Rafael J. Wysocki", Srinivas Pandruvada,
    Vincent Guittot, Len Brown, Tim Chen, Giovanni Gherdovich, Chen Yu,
    linux-kernel@vger.kernel.org, Zhang Rui
Subject: Re: [PATCH] cpufreq: intel_pstate: Handle no_turbo in frequency invariance
Message-ID: <20220408082225.GN2731@worktop.programming.kicks-ass.net>
References: <20220407234258.569681-1-yu.c.chen@intel.com>
In-Reply-To: <20220407234258.569681-1-yu.c.chen@intel.com>

On Fri, Apr 08, 2022 at 07:42:58AM +0800, Chen Yu wrote:
> Problem statement:
> Once the user has disabled turbo frequency by
> echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo,
> the cfs_rq's util_avg becomes quite small when compared with
> CPU capacity.
>
> Step to reproduce:
>
> echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
> ./x86_cpuload --count 1 --start 3 --timeout 100 --busy 99
> would launch 1 thread and bind it to CPU3, lasting for 100 seconds,
> with a CPU utilization of 99%. [1]
>
> top result:
> %Cpu3 : 98.4 us, 0.0 sy, 0.0 ni, 1.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>
> check util_avg:
> cat /sys/kernel/debug/sched/debug | grep "cfs_rq\[3\]" -A 20 | grep util_avg
> .util_avg : 611
>
> So util_avg/CPU capacity is 611/1024, which is much smaller than the
> 98.4% shown in the top result.
>
> This might impact some logic in the scheduler. For example,
> group_is_overloaded() compares the group_capacity and group_util of a
> sched group to check whether that sched group is overloaded. With this
> gap, even under a nearly 100% workload, the sched group will not be
> regarded as overloaded. Besides group_is_overloaded(), there are other
> victims. There is ongoing work that aims to optimize task wakeup in an
> LLC domain. The main idea is to stop searching for idle CPUs once the
> sched domain is overloaded [2]. That proposal also relies on
> util_avg/CPU capacity to decide whether the LLC domain is overloaded.
>
> Analysis:
> CPU frequency invariance causes this difference. In summary, when
> frequency invariance is enabled, the util_sum of a cfs_rq decays quite
> fast while the CPU is idle.
>
> The details are as follows:
>
> As depicted in update_rq_clock_pelt(), when frequency invariance is
> enabled, there are two clock variables on each rq, clock_task and
> clock_pelt:
>
> The clock_pelt scales the time to reflect the effective amount of
> computation done during the running delta time but then syncs back to
> clock_task when the rq is idle.
>
> absolute time    | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16
> @ max frequency  ------******---------------******---------------
> @ half frequency ------************---------************---------
> clock pelt       | 1| 2|    3|    4| 7| 8| 9|   10|   11|14|15|16
>
> The fast decay of util_sum during idle is due to:
> 1. rq->clock_pelt is always behind rq->clock_task.
> 2. rq->last_update is updated to rq->clock_pelt' after invoking
>    ___update_load_sum().
> 3. When the CPU then becomes idle, rq->clock_pelt' suddenly jumps
>    forward by a large amount, up to rq->clock_task.
> 4. On the next ___update_load_sum(), the idle period is calculated as
>    rq->clock_task - rq->last_update, i.e. rq->clock_task - rq->clock_pelt'.
>    The lower the CPU frequency, the larger the delta
>    rq->clock_task - rq->clock_pelt' will be. Since the idle period is
>    used only to decay the util_sum, the util_sum drops significantly
>    during the idle period.
>
> Proposal:
> This symptom is not caused only by disabling turbo frequency; it also
> appears if the user limits the max frequency at runtime, because
> whenever the frequency is lower than the max frequency, CPU frequency
> invariance decays the util_sum quite fast during idle.
>
> As some end users disable turbo after boot, this patch describes the
> symptom and deals with the turbo scenario for now. It would be ideal
> if CPU frequency invariance became aware of the (user-specified) max
> CPU frequency at runtime in the future.
>
> [The previous patch seems to have been lost on LKML; this is a resend,
> sorry for any inconvenience.]
>
> Link: https://github.com/yu-chen-surf/x86_cpuload.git #1
> Link: https://lore.kernel.org/lkml/20220310005228.11737-1-yu.c.chen@intel.com/ #2
> Signed-off-by: Chen Yu

Acked-by: Peter Zijlstra (Intel)
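
To make the clock_task/clock_pelt gap described in the quoted analysis
concrete, here is a minimal, self-contained sketch. It is not the
kernel's actual update_rq_clock_pelt() implementation; the cap_scale()
helper, the capacity constants, and the numbers below are simplified
stand-ins chosen purely for illustration, loosely modeled on the
kernel's capacity-scaling idea.

/*
 * Minimal standalone illustration (not kernel code): while the CPU runs,
 * clock_pelt advances at only freq/max_freq the rate of clock_task, and
 * the accumulated gap is charged as decay time when the rq goes idle.
 */
#include <stdio.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

/* Scale a time delta by capacity/1024 (the current frequency ratio). */
static unsigned long cap_scale(unsigned long delta, unsigned long cap)
{
	return (delta * cap) >> SCHED_CAPACITY_SHIFT;
}

int main(void)
{
	unsigned long clock_task = 0, clock_pelt = 0;
	unsigned long half_freq = SCHED_CAPACITY_SCALE / 2;	/* e.g. turbo off / freq capped */
	unsigned long run_ns = 8000000;				/* 8 ms of wall-clock runtime */

	/* Running: clock_task gets the full delta, clock_pelt the scaled one. */
	unsigned long pelt_delta = cap_scale(run_ns, half_freq);
	clock_task += run_ns;
	clock_pelt += pelt_delta;

	/* Idle entry: clock_pelt syncs back to clock_task; this jump is what
	 * the quoted steps 1-4 describe being seen as extra decay time. */
	unsigned long gap = clock_task - clock_pelt;
	clock_pelt = clock_task;

	printf("ran %lu ns, clock_pelt advanced %lu ns\n", run_ns, pelt_delta);
	printf("gap charged as extra decay time at idle entry: %lu ns\n", gap);
	return 0;
}

Built with a plain cc, this prints that 8 ms of runtime at half
frequency advances clock_pelt by only 4 ms, and that the remaining 4 ms
is charged as extra decay time at idle entry, matching the mechanism in
the quoted analysis that lets util_avg sit at 611/1024 under a 98.4%
load.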