Date: Mon, 25 Mar 2024 02:20:51 +0000
From: Qais Yousef
To: Vincent Guittot
Cc: Christian Loehle, linux-kernel@vger.kernel.org, peterz@infradead.org,
	juri.lelli@redhat.com, mingo@redhat.com, rafael@kernel.org,
	dietmar.eggemann@arm.com, vschneid@redhat.com,
	Johannes.Thumshirn@wdc.com, adrian.hunter@intel.com,
	ulf.hansson@linaro.org, andres@anarazel.de, asml.silence@gmail.com,
	linux-pm@vger.kernel.org, linux-block@vger.kernel.org,
	io-uring@vger.kernel.org
Subject: Re: [RFC PATCH 0/2] Introduce per-task io utilization boost
Message-ID: <20240325022051.73mfzap7hlwpsydx@airbuntu>
References: <20240304201625.100619-1-christian.loehle@arm.com>

(piggy backing on this reply)

On 03/22/24 19:08, Vincent Guittot wrote:
> Hi Christian,
>
> On Mon, 4 Mar 2024 at 21:17, Christian Loehle wrote:
> >
> > There is a feature inside of both schedutil and intel_pstate called
> > iowait boosting which tries to prevent selecting a low frequency
> > during IO workloads when it impacts throughput.
> > The feature is implemented by checking for task wakeups that have
> > the in_iowait flag set and boost the CPU of the rq accordingly
> > (implemented through cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT)).
> >
> > The necessity of the feature is argued with the potentially low
> > utilization of a task being frequently in_iowait (i.e. most of the
> > time not enqueued on any rq and cannot build up utilization).
> >
> > The RFC focuses on the schedutil implementation.
> > intel_pstate frequency selection isn't touched for now, suggestions are
> > very welcome.
> >
> > Current schedutil iowait boosting has several issues:
> > 1. Boosting happens even in scenarios where it doesn't improve
> > throughput. [1]
> > 2. The boost is not accounted for in EAS: a) feec() will only consider
> > the actual utilization for task placement, but another CPU might be
> > more energy-efficient at that capacity than the boosted one.)
> > b) When placing a non-IO task while a CPU is boosted compute_energy()
> > will not consider the (potentially 'free') boosted capacity, but the
> > one it would have without the boost (since the boost is only applied
> > in sugov).
> > 3. Actual IO heavy workloads are hardly distinguished from infrequent
> > in_iowait wakeups.
> > 4. The boost isn't associated with a task, it therefore isn't considered
> > for task placement, potentially missing out on higher capacity CPUs on
> > heterogeneous CPU topologies.
> > 5. The boost isn't associated with a task, it therefore lingers on the
> > rq even after the responsible task has migrated / stopped.
> > 6. The boost isn't associated with a task, it therefore needs to ramp
> > up again when migrated.
> > 7. Since schedutil doesn't know which task is getting woken up,
> > multiple unrelated in_iowait tasks might lead to boosting.

You forgot an important problem, which was the main request from Android
when this first came up a few years back. iowait boost is a power hungry
feature and not all tasks require iowait boost. By having it per task we
want to be able to prevent tasks from causing frequency spikes due to
iowait boost when it is not warranted.

> >
> > We attempt to mitigate all of the above by reworking the way the
> > iowait boosting (io boosting from here on) works in two major ways:
> > - Carry the boost in task_struct, so it is a per-task attribute and
> > behaves similar to utilization of the task in some ways.
> > - Employ a counting-based tracking strategy that only boosts as long
> > as it sees benefits and returns to no boosting dynamically.
>
> Thanks for working on improving IO boosting. I have started to read
> your patchset and have few comments about your proposal:
>
> The main one is that the io boosting decision should remain a cpufreq
> governor decision and so the io boosting value should be applied by
> the governor like in sugov_effective_cpu_perf() as an example instead
> of everywhere in the scheduler code.

I have similar thoughts.

I think we want the scheduler to treat iowait boost like uclamp_min, but
requested by the block subsystem rather than by the user.

I think we should create a new task_min/max_perf() and replace all current
callers of uclamp_eff_value() in the scheduler with task_min/max_perf(),
where task_min/max_perf() are

	unsigned long task_min_perf(struct task_struct *p)
	{
		return max(uclamp_eff_value(p, UCLAMP_MIN), p->iowait_boost);
	}

	unsigned long task_max_perf(struct task_struct *p)
	{
		return uclamp_eff_value(p, UCLAMP_MAX);
	}

then all users of uclamp_min in the scheduler will see the request for
boost from iowait and make the correct task placement decision, including
under thermal pressure, while ensuring that they don't accidentally escape
uclamp_max, which I am not sure your series caters for with its open
coding. You're missing the load balancer paths from what I see.

It will also solve the problem I mentioned above. The tasks that should not
use iowait boost are likely restricted with uclamp_max already. If we treat
iowait boost as an additional source of min_perf request, then uclamp_max
will prevent it from going above a certain perf level and give us the
desired impact without any additional hint. I don't think it is important
to disable it completely but rather have a way to prevent tasks from
consuming too much resources when not needed, which we already have from
uclamp_max.

I am not sure it makes sense to have a separate control where a task can
run fast due to util but can't have iowait boost or vice versa.
I think existing uclamp_max should be enough to restrict tasks from
exceeding a performance limit.

>
> Then, the algorithm to track the right interval bucket and the mapping
> of intervals into utilization really looks like a policy which has
> been defined with heuristics and as a result further seems to be a
> governor decision

Hmm, do you think this should not be a per-task value then, Vincent?

Or oh, I think I see what you mean. Make effective_cpu_util() set the min
parameter correctly. I think that would work too, yes. iowait boost is just
another min perf request and as long as it is treated as such, it is good
for me. We'll just need to add a new parameter for the task like I did in
the remove uclamp max aggregation series.

Generally I think it's better to split the patches so that the conversion
of iowait boost, with the current algorithm, to being per-task is a
separate patch. And then look at improving the algorithm logic on top.
These are two different problems IMHO.

One major problem and big difference in per-task iowait that I see
Christian alluded to is that the CPU will no longer be boosted when the
task is sleeping. I think there will be cases out there where some users
relied on that for the BLOCK softirq to run faster too. We need an
additional way to ensure that the softirq runs at a similar performance
level to the task that initiated the request. So we need a way to hold the
cpufreq policy's min perf until the softirq is serviced. Or just keep the
CPU boosted until the task is migrated. I'm not sure what is better yet.

>
> Finally adding some atomic operation in the fast path is not really desirable

Yes, I was wondering if we can apply the value when we set the
p->in_iowait flag instead?
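
To make the task_min_perf()/task_max_perf() suggestion above concrete, here
is a minimal userspace sketch, not kernel code. struct task, its fields, the
stand-in uclamp_eff_value() and the helper task_perf_request() are all
hypothetical and exist only for illustration. The sketch also assumes that
the max limit wins whenever the boosted min exceeds it, which is the
behaviour the mail asks for rather than something the kernel is confirmed to
do on every path.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX };

struct task {
	unsigned long uclamp[2];	/* effective uclamp min/max, 0..1024 */
	unsigned long iowait_boost;	/* assumed per-task boost, 0..1024 */
};

/* Stand-in for the kernel helper of the same name. */
static unsigned long uclamp_eff_value(const struct task *p, enum uclamp_id id)
{
	return p->uclamp[id];
}

/* iowait boost acts as one more source of a min perf request. */
static unsigned long task_min_perf(const struct task *p)
{
	unsigned long min = uclamp_eff_value(p, UCLAMP_MIN);

	return min > p->iowait_boost ? min : p->iowait_boost;
}

static unsigned long task_max_perf(const struct task *p)
{
	return uclamp_eff_value(p, UCLAMP_MAX);
}

/*
 * Hypothetical helper: clamp a utilization value into
 * [task_min_perf(), task_max_perf()]. Assumption of this sketch:
 * when the boosted min is above the max, the max wins.
 */
static unsigned long task_perf_request(const struct task *p, unsigned long util)
{
	unsigned long lo = task_min_perf(p);
	unsigned long hi = task_max_perf(p);

	if (lo > hi)
		lo = hi;
	if (util < lo)
		return lo;
	if (util > hi)
		return hi;
	return util;
}

/* Small self-check that doubles as a usage example. */
static void iowait_boost_sketch_selftest(void)
{
	struct task unrestricted = { .uclamp = { 0, 1024 }, .iowait_boost = 512 };
	struct task capped = { .uclamp = { 0, 256 }, .iowait_boost = 512 };

	/* A low-util IO task is lifted to its boost value. */
	assert(task_perf_request(&unrestricted, 100) == 512);
	/* The same boost cannot escape a uclamp_max of 256. */
	assert(task_perf_request(&capped, 100) == 256);
}
```

In this sketch the capped task never asks for more than 256 however large
its iowait_boost grows, which is the "no additional hint needed" property
described above.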