Message-ID: <739492e4-b9a3-4c55-82e6-60b02d489c5f@arm.com>
Date: Mon, 11 Dec 2023 18:47:29 +0000
Subject: Re: [PATCH 1/4] sched/fair: Be less aggressive in calling cpufreq_update_util()
From: Christian Loehle
To: Qais Yousef, Ingo Molnar, Peter Zijlstra, "Rafael J. Wysocki", Viresh Kumar, Vincent Guittot, Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Lukasz Luba, Wei Wang, Rick Yiu, Chung-Kai Mei, Hongyan Xia
References: <20231208015242.385103-1-qyousef@layalina.io> <20231208015242.385103-2-qyousef@layalina.io>
In-Reply-To: <20231208015242.385103-2-qyousef@layalina.io>

On 08/12/2023 01:52, Qais Yousef wrote:
> Due to the way the code is structured, it makes a lot of sense to trigger
> cpufreq_update_util() from update_load_avg(). But this is too aggressive,
> as in most cases we are iterating through entities in a loop to
> update_load_avg() in the hierarchy. So we end up sending too many
> requests in a loop as we're updating the hierarchy.

Whether this is actually less aggressive depends heavily on the workload. I can argue the patch is more aggressive, as you call cpufreq_update_util() at every enqueue and dequeue, instead of just at enqueue. For an I/O workload it is definitely more aggressive, see below.

> Combine this with the rate limit in schedutil, and we could end up
> prematurely sending a wrong frequency update before we have actually
> updated all entities appropriately.
> [SNIP]

> @@ -6704,14 +6677,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	 */
>  	util_est_enqueue(&rq->cfs, p);
>  
> -	/*
> -	 * If in_iowait is set, the code below may not trigger any cpufreq
> -	 * utilization updates, so do it here explicitly with the IOWAIT flag
> -	 * passed.
> -	 */
> -	if (p->in_iowait)
> -		cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
> -
>  	for_each_sched_entity(se) {
>  		if (se->on_rq)
>  			break;
> @@ -6772,6 +6737,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  enqueue_throttle:
>  	assert_list_leaf_cfs_rq(rq);
>  
> +	cpufreq_update_util(rq, p->in_iowait ? SCHED_CPUFREQ_IOWAIT : 0);
> +
>  	hrtick_update(rq);
>  }
>  
> @@ -6849,6 +6816,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  
>  dequeue_throttle:
>  	util_est_update(&rq->cfs, p, task_sleep);
> +	cpufreq_update_util(rq, 0);

This is quite critical: instead of calling the update only at enqueue (with SCHED_CPUFREQ_IOWAIT if applicable), it is now called at every enqueue and dequeue. The only way for schedutil (intel_pstate too?) to build up a value of iowait_boost > 128 is then a large enough rate_limit_us, as even for a purely in_iowait task the enqueue increases the boost while its own dequeue can already reduce it again. For a basic benchmark workload and rate_limit_us=2000 this doesn't seem to be that critical; anything below rate_limit_us=200 didn't show any iowait boosting > 128 anymore on my system. Of course, if the workload does more between enqueue and dequeue (i.e. the time until the task issues its next I/O is longer), already larger values of rate_limit_us will disable any significant iowait boost benefit.
Just to add some numbers to the story, all with CONFIG_HZ=100 on an rk3399:

fio --time_based --name=fiotest --filename=/dev/nvme0n1 --runtime=30 --rw=randread --bs=4k --ioengine=psync --iodepth=1
fio --time_based --name=fiotest --filename=/dev/mmcblk2 --runtime=30 --rw=randread --bs=4k --ioengine=psync --iodepth=1

All results (IOPS) are sorted. The second line of each pair is without iowait boosting; it doesn't provide additional information, I left it in to compare for reproducibility.

With this patch and rate_limit_us=2000:
/dev/nvme0n1
[3883, 3980, 3997, 4018, 4019]
[2732, 2745, 2782, 2837, 2841]
/dev/mmcblk2
[4136, 4144, 4198, 4275, 4329]
[2753, 2975, 2975, 2975, 2976]

Without this patch and rate_limit_us=2000:
/dev/nvme0n1
[3918, 4021, 4043, 4081, 4085]
[2850, 2859, 2863, 2873, 2887]
/dev/mmcblk2
[4277, 4358, 4380, 4421, 4425]
[2796, 3103, 3128, 3180, 3200]

With this patch and rate_limit_us=200:
/dev/nvme0n1
[2470, 2480, 2481, 2484, 2520]
[2473, 2510, 2517, 2534, 2572]
/dev/mmcblk2
[2286, 2338, 2440, 2504, 2535]
[2360, 2462, 2484, 2503, 2707]

Without this patch and rate_limit_us=200:
/dev/nvme0n1
[3880, 3956, 4010, 4013, 4016]
[2732, 2867, 2937, 2937, 2939]
/dev/mmcblk2
[4783, 4791, 4821, 4855, 4860]
[2653, 3091, 3095, 3166, 3202]

I'm currently working on iowait boosting and looking at where it's actually needed and how it could be improved, so I'm always interested in anyone's thoughts.

Best Regards,
Christian

> [SNIP]
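For anyone wanting to reproduce the lists above, a sketch of the harness follows. Assumptions to note: it relies on fio's --minimal (terse) output with field 8 being read IOPS, and the device path and five-sample count simply mirror the runs quoted above; adjust both for your setup.

```shell
#!/bin/sh
# Run the randread job once against the given device and print read IOPS.
# Assumption: fio terse (--minimal) output, semicolon-separated, field 8
# carrying read IOPS.
run_iops() {
	fio --time_based --name=fiotest --filename="$1" --runtime=30 \
		--rw=randread --bs=4k --ioengine=psync --iodepth=1 --minimal |
		cut -d';' -f8
}

# Numeric sort, matching the "all results are sorted" presentation above.
sorted_samples() {
	sort -n
}

# Only run the actual benchmark when explicitly requested, since it needs
# fio installed and a real block device (destructive-read safe, but slow).
if [ "${RUN_FIO:-0}" = 1 ]; then
	for i in 1 2 3 4 5; do run_iops /dev/nvme0n1; done | sorted_samples
fi
```

Invoked as `RUN_FIO=1 sh harness.sh`, this would print one sorted five-sample IOPS list per device run, in the same shape as the bracketed lists above.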