Date: Mon, 25 Mar 2024 02:53:08 +0000
From: Qais Yousef
To: Christian Loehle
Cc: Bart Van Assche, linux-kernel@vger.kernel.org, peterz@infradead.org,
	juri.lelli@redhat.com, mingo@redhat.com, rafael@kernel.org,
	dietmar.eggemann@arm.com, vschneid@redhat.com,
	vincent.guittot@linaro.org, Johannes.Thumshirn@wdc.com,
	adrian.hunter@intel.com, ulf.hansson@linaro.org, andres@anarazel.de,
	asml.silence@gmail.com, linux-pm@vger.kernel.org,
	linux-block@vger.kernel.org, io-uring@vger.kernel.org,
	linux-mmc@vger.kernel.org
Subject: Re: [RFC PATCH 0/2] Introduce per-task io utilization boost
Message-ID: <20240325025308.6uqkhpyba6moxntl@airbuntu>
References: <20240304201625.100619-1-christian.loehle@arm.com>
 <86f0af00-8765-4481-9245-1819fb2c6379@acm.org>
 <0dc6a839-2922-40ac-8854-2884196da9b9@arm.com>
 <2784c093-eea1-4b73-87da-1a45f14013c8@arm.com>
 <20240321123935.zqscwi2aom7lfhts@airbuntu>
 <1ff973fc-66a4-446e-8590-ec655c686c90@arm.com>
In-Reply-To: <1ff973fc-66a4-446e-8590-ec655c686c90@arm.com>

On 03/21/24 17:57, Christian Loehle wrote:
> > So you want the hardirq to move to the big core? Unlike softirq, there will be
> > a single hardirq for the controller (to my limited knowledge), so if there are
> > multiple requests I'm not sure we can easily match which one relates to which
> > before it triggers. So we can end up waking up the wrong core.
> 
> It would be beneficial to move the hardirq to a big core if the IO task
> is using it anyway.
> I'm not sure I actually want to. There are quite a few pitfalls (like you

I'm actually against it. I think it's too much complexity for not necessarily
a big gain. FWIW, one of the design requests for per-task iowait boost is to
be able to *disable* it. It wastes power when only a handful of tasks actually
care about perf.

Caring where the hardirq runs for perf is unlikely to be a problem in
practice. Softirq should follow the requester already when it matters.

> mentioned) that the scheduler really shouldn't be concerned about.
> Moving the hardirq, if implemented in the kernel, would have to be done by the
> host controller driver anyway, which would explode this series.
> (host controller drivers are quite fragmented e.g. on mmc)
> 
> The fact that having a higher capacity CPU available ("running faster") for an
> IO task doesn't (always) imply higher throughput because of the hardirq staying
> on some LITTLE CPU is bothering (for this series), though.
> 
> > 
> > Generally this should be a userspace policy. If there's a scenario where the
> > throughput is that important they can easily move the hardirq to the big core
> > unconditionally and move it back again once this high throughput scenario is no
> > longer important.
> 
> It also feels wrong to let this be a userspace policy, as the hardirq must be
> migrated to the perf domain of the task, which userspace isn't aware of.
> Unless you expect userspace to do

The irq balancer is a userspace policy. For the kernel to make an automatic
decision there are a lot of ifs that must hold. Again, I don't see maximizing
throughput being a concern on such systems. And userspace can fix the problem
simply - they know, after all, when throughput really matters to the point
where the CPU the hardirq runs on becomes a bottleneck. In practice, I don't
think it is a bottleneck. But this is my handwavy judgement. The experts know
better. And note, I mean use cases that are not benchmarks ;-)

> CPU_affinity_task=big_perf_domain_0 && hardirq_affinity=big_perf_domain_0
> but then you could just as well ask them to set performance governor for
> big_perf_domain_0 (or uclamp_min=1024) and need neither this series nor
> any iowait boosting.
> 
> Furthermore you can't generally expect userspace to know if their IO will lead
> to any interrupt at all, much less which one. They ideally don't even know if
> the file IO they are doing is backed by any physical storage in the first place.
> (Or even further, that they are doing file IO at all, they might just be
> e.g. page-faulting.)

The way I see it, it's like gigabit networking. The hardirq will only matter
once you reach such high throughput scenarios. Those are corner cases, not the
norm.

> 
> > 
> > Or where you describing a different problem?
> 
> That is the problem I mentioned in the series and Bart and I were discussing.
> It's a problem of the series as in "the numbers aren't that impressive".
> Current iowait boosting on embedded/mobile systems will perform quite well by
> chance, as the (low util) task will often be on the same perf domain the hardirq
> will be run on. As can be seen in the cover letter the benefit of running the
> task on a (2xLITTLE capacity) big CPU therefore are practically non-existent,
> for tri-gear systems where big CPU is more like 10xLITTLE capacity the benefit
> will be much greater.
> I just wanted to point this out. We might just acknowledge the problem and say
> "don't care" about the potential performance benefits of those scenarios that
> would require hardirq moving.
I thought the softirq does the bulk of the work. A hardirq being such a
bottleneck is (naively maybe) a red flag for me that it's doing more than
simple interrupt servicing.

You don't boost when the task is sleeping, right? I think this is likely a
cause of the problem where the softirq is not running as fast - before the
series the CPU would be iowait boosted regardless of whether the task is
blocked or not.

> In the long-term it looks like for UFS the problem will disappear as we are
> expected to get one queue/hardirq per CPU (as Bart mentioned), on NVMe that
> is already the case.
> 
> I CC'd Uffe and Adrian for mmc, to my knowledge the only subsystem where
> 'fast' (let's say >10K IOPS) devices are common, but only one queue/hardirq
> is available (and it doesn't look like this is changing anytime soon).
> I would also love to hear what Bart or other UFS folks think about it.
> Furthermore if I forgot any storage subsystem with the same behavior in that
> regards do tell me.
> 
> Lastly, you could consider the IO workload:
> IO task being in iowait very frequently [1] with just a single IO inflight [2]
> and only very little time being spent on the CPU in-between iowaits [3],
> therefore the interrupt handler being on the critical path for IO throughput
> to a non-negligible degree, to be niche, but it's precisely the use-case where
> iowait boosting shows its biggest benefit.
> 
> Sorry for the abomination of a sentence, see footnotes for the reasons.
> 
> [1] If sugov doesn't see significantly more than 1 iowait per TICK_NSEC it
> won't apply any significant boost currently.

I CCed you on a patch where I fix this. I've been sleeping on it for too long.
Maybe I should have split this fix out of the consolidation patch.

> [2] If the storage device has enough in-flight requests to serve, iowait
> boosting is unnecessary/wasteful, see cover letter.
> [3] If the task actually uses the CPU in-between iowaits, it will build up
> utilization, iowait boosting benefit diminishes.

The current mechanism is very aggressive. It needs to evolve for sure.

> 
> > 
> > Glad to see your series by the way :-) I'll get a chance to review it over the
> > weekend hopefully.
> 
> Thank you!
> Apologies for not CCing you in the first place, I am curious about your opinion
> on the concept!

I actually had a patch that implements iowait boost per-task (on top of my
remove uclamp max aggregation series) where I did actually take the extra step
to remove iowait from intel_pstate. I can share the patches if you think
you'll find them useful.

Just want to note that this mechanism can end up wasting power, and this is an
important direction to consider. It's not about perf only (which matters too).
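To illustrate what I mean by aggressive and task-agnostic, here is a condensed
sketch of how the current per-CPU boost ramps, based on my reading of
kernel/sched/cpufreq_schedutil.c (names and structure are simplified for
illustration; this is not the actual kernel code): it doubles on every iowait
wakeup up to full capacity, only halves when no wakeup is seen, and has no
notion of which task asked for it.

#include <stdbool.h>

#define SCALE		1024		/* stand-in for SCHED_CAPACITY_SCALE */
#define BOOST_MIN	(SCALE / 8)	/* smallest boost step */

struct cpu_iowait_boost {
	unsigned int boost;		/* current boost value, 0..SCALE */
	bool pending;			/* iowait wakeup seen since last update? */
};

/* A task woke from iowait on this CPU: escalate the boost. */
static void iowait_boost_enqueue(struct cpu_iowait_boost *b)
{
	b->pending = true;
	b->boost = b->boost ? b->boost * 2 : BOOST_MIN;
	if (b->boost > SCALE)
		b->boost = SCALE;
}

/* Frequency update: decay if no iowait wakeup was seen, then apply. */
static unsigned long iowait_boost_apply(struct cpu_iowait_boost *b,
					unsigned long util)
{
	if (!b->pending) {
		b->boost >>= 1;		/* halve the boost */
		if (b->boost < BOOST_MIN)
			b->boost = 0;
	}
	b->pending = false;

	/* Boost the whole CPU's request, regardless of which task did the IO. */
	return util > b->boost ? util : b->boost;
}

The per-task approach matters exactly because this per-CPU state keeps the
whole CPU's frequency request inflated even when the tasks that benefit from
the boost are a small minority.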
> 
> FWIW I did mess up a last-minute, what was supposed to be, cosmetic change that
> only received a quick smoke test, so 1/2 needs the following:
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4aaf64023b03..2b6f521be658 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6824,7 +6824,7 @@ static void dequeue_io_boost(struct cfs_rq *cfs_rq, struct task_struct *p)
>  	} else if (p->io_boost_curr_ios < p->io_boost_threshold_down) {
>  		/* Reduce boost */
>  		if (p->io_boost_level > 1)
> -			io_boost_scale_interval(p, true);
> +			io_boost_scale_interval(p, false);
>  		else
>  			p->io_boost_level = 0;
>  	} else if (p->io_boost_level == IO_BOOST_LEVELS) {
> 
> 
> I'll probably send a v2 rebased on 6.9 when it's out anyway, but so far the
> changes are mostly cosmetic and addressing Bart's comments about the benchmark
> numbers in the cover letter.

I didn't spend a lot of time on the series, but I can see a number of
problems. Let us discuss them first and plan a future direction. No need to
send a v2 if it's just for this fix, IMO.
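P.S. On the earlier point that moving the hardirq is a userspace policy: no
kernel change is needed for that, since both an IRQ's affinity and a task's
affinity can already be set from userspace. A minimal sketch is below; the IRQ
number (33) and the big CPUs (6-7) are made-up, platform-specific values, and
the IRQ write needs root.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin an IRQ to a CPU list by writing /proc/irq/<irq>/smp_affinity_list. */
static int pin_irq(int irq, const char *cpulist)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity_list", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s\n", cpulist);	/* e.g. "6-7" */
	fclose(f);
	return 0;
}

/* Pin the calling task to the (hypothetical) big cores 6 and 7. */
static int pin_task_to_big_cpus(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(6, &set);
	CPU_SET(7, &set);
	return sched_setaffinity(0, sizeof(set), &set);	/* 0 == this task */
}

int main(void)
{
	pin_irq(33, "6-7");		/* hypothetical storage controller IRQ */
	pin_task_to_big_cpus();
	return 0;
}

Whether anything should do this automatically is the policy question; the
mechanism itself is already there.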