Date: Wed, 10 Oct 2018 14:05:51 +0100
From: Quentin Perret
To: Vincent Guittot
Cc: Ingo Molnar, Thara Gopinath, linux-kernel, Ingo Molnar, Peter Zijlstra,
    Zhang Rui, "gregkh@linuxfoundation.org", "Rafael J. Wysocki",
    Amit Kachhap, viresh kumar, Javi Merino, Eduardo Valentin,
    Daniel Lezcano, "open list:THERMAL", Ionela Voinescu
Subject: Re: [RFC PATCH 0/7] Introduce thermal pressure
Message-ID: <20181010130549.hzpkaskvlgifbdrp@queper01-lin>
References: <1539102302-9057-1-git-send-email-thara.gopinath@linaro.org>
    <20181010061751.GA37224@gmail.com>
    <20181010082933.4ful4dzk7rkijcwu@queper01-lin>
    <20181010095459.orw2gse75klpwosx@queper01-lin>
    <20181010103623.ttjexasymdpi66lu@queper01-lin>

On Wednesday 10 Oct 2018 at 14:04:40 (+0200), Vincent Guittot wrote:
> This patchset doesn't touch cpu_capacity_orig and doesn't need to, as
> it assumes that the max capacity is unchanged but some capacity is
> momentarily stolen by thermal.
> If you want to reflect all thermal capping changes immediately, you
> have to update this field and all related fields and structs around it.

I don't follow you here. I never said I wanted to change
cpu_capacity_orig. I don't think we should do that, actually. Changing
capacity_of (which is updated during LB IIRC) is just fine.

The question is about what you want to do there: reflect an averaged
value or the instantaneous one. It's not obvious (to me) that the
complex one (the averaged value) is better than the other, simpler,
one. All I've been saying from the beginning is that it'd be nice to
have numbers and use cases to discuss the pros and cons of both
approaches.

> > > > Hmm, let me have a closer look at the patches, I must have missed
> > > > something ...
> > > >
> > > > > The pace of changing the capping is too fast to reflect that in the
> > > > > whole scheduler topology
> > > >
> > > > That's probably true in some cases, but it'd be cool to have numbers to
> > > > back up that statement, I think.
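The averaged-versus-instantaneous question can be made concrete with a toy model. This is a minimal sketch, assuming capacity_orig = 1024, one update per ~1 ms step and a 32-step half-life mirroring PELT (y^32 = 0.5); the names `instantaneous_capacity`, `averaged_capacity` and the input trace are hypothetical, and this is not the patchset's implementation.

```python
# Minimal sketch, NOT the patchset's code: capacity seen by the scheduler
# when a thermal cap is reflected instantaneously vs. PELT-style averaged.
# Assumptions: capacity_orig = 1024, one update per ~1 ms step, and a
# 32-step half-life mirroring PELT (Y**32 == 0.5). Names are hypothetical.

CAPACITY_ORIG = 1024
HALF_LIFE_STEPS = 32
Y = 0.5 ** (1.0 / HALF_LIFE_STEPS)  # per-step decay factor


def instantaneous_capacity(capped_capacity):
    """Capacity left to CFS if the current cap is applied as-is."""
    return capped_capacity


def averaged_capacity(capped_trace):
    """Capacity left to CFS if the stolen capacity is PELT-averaged.

    capped_trace holds the capped max capacity at each step. The
    'pressure' is the capacity removed by the cap, folded into a
    geometric average with decay Y. Returns one value per step.
    """
    pressure = 0.0
    result = []
    for capped in capped_trace:
        stolen = CAPACITY_ORIG - capped
        pressure = pressure * Y + stolen * (1.0 - Y)
        result.append(CAPACITY_ORIG - pressure)
    return result


if __name__ == "__main__":
    # Cap the CPU to half its capacity for 32 steps, then lift the cap.
    trace = [512] * 32 + [1024] * 32
    avg = averaged_capacity(trace)
    print(instantaneous_capacity(trace[31]), round(avg[31]))  # 512 768
    print(instantaneous_capacity(trace[63]), round(avg[63]))  # 1024 896
```

After one half-life of capping, the averaged view has only moved halfway (768 instead of 512), and it still reports lost capacity one half-life after the cap is gone. Whether that lag is a feature or a bug is exactly what numbers from real use cases would have to show.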
> > > >
> > > > Now, if you do need to rebuild the sched domain topology every time you
> > > > update the thermal pressure, I think the PELT HL is _way_ too short for
> > > > that ... You can't rebuild the whole thing every 32ms or so. Or am I
> > > > misunderstanding something ?
> > > >
> > > > > > Thara, have you tried to experiment with a simpler implementation as
> > > > > > suggested by Ingo ?
> > > > > >
> > > > > > Also, assuming that we do want to average things, do we actually want to
> > > > > > tie the thermal ramp-up time to the PELT half life ? That provides
> > > > > > nice maths properties wrt the other signals, but it's not obvious to me
> > > > > > that this thermal 'constant' should be the same on all platforms. Or
> > > > > > maybe it should ?
> > > > >
> > > > > The main interest of using PELT signal is that thermal pressure will
> > > > > evolve at the same pace as other signals used in the scheduler.
> > > >
> > > > Right, I think this is a nice property too (assuming that we actually
> > > > want to average things out).
> > > >
> > > > > With
> > > > > thermal pressure, we have the exact same problem as with RT tasks. The
> > > > > thermal will cap the max frequency which will cap the utilization of
> > > > > the tasks running on the CPU
> > > >
> > > > Well, the nature of the signal is slightly different IMO. Yes it's
> > > > capacity, but you're not actually measuring time spent on the CPU. All
> > > > other PELT signals are based on time, this thermal thing isn't, so it is
> > > > kinda different in a way. And I'm still wondering if it could be helpful
> > >
> > > hmmm ... it is based on time too.
> >
> > You're not actually measuring the time spent on the CPU by the 'thermal
> > task'. There is no such thing as a 'thermal task'. You're just trying to
> > model things like that, but the thermal stuff isn't actually
> > interrupting running tasks to eat CPU cycles. It just makes things run
> > slower, which isn't exactly the same thing IMO.
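On whether the thermal signal should share PELT's half-life or use a per-platform one, it helps to see how the reaction time scales with the half-life. This is a minimal sketch assuming a generic geometric average with a configurable half-life; `decay_factor`, `steps_to_reach` and the half-life values are hypothetical and not taken from the patchset or from any platform.

```python
# Minimal sketch: how long a PELT-style geometric average takes to reflect
# a step change, as a function of its half-life. The half-lives below are
# hypothetical, not values from any platform or from the patchset.
import math


def decay_factor(half_life_steps):
    """Per-step decay y such that y ** half_life_steps == 0.5."""
    return 0.5 ** (1.0 / half_life_steps)


def steps_to_reach(fraction, half_life_steps):
    """Steps needed for an average starting at 0 to reach `fraction` of a
    constant input, i.e. the smallest n with 1 - y**n >= fraction."""
    y = decay_factor(half_life_steps)
    # y < 1, so log(y) < 0 and dividing flips the inequality into n >= ...
    return math.ceil(math.log(1.0 - fraction) / math.log(y))


if __name__ == "__main__":
    for hl in (8, 32, 128):  # hypothetical half-lives, in ~1 ms steps
        print(f"HL={hl:3d}: 90% after {steps_to_reach(0.9, hl)} steps")
```

Whatever the half-life, reaching 90% of a step takes about 3.3 half-lives (log2(10)), so choosing a per-platform half-life directly trades filtering of short thermal spikes against how late the scheduler notices a sustained cap.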
> >
> > But maybe that's a detail.
> >
> > > Both signals (current ones and thermal one) are really close. The main
> > > difference with other utilization signals is that instead of providing
> > > a running/not running boolean that is then weighted by the current
> > > capacity, the signal directly uses the capped max capacity to reflect
> > > the amount of cycles that are stolen by thermal mitigation.
> > >
> > > > to be able to have a different HL for that thermal signal. That would
> > > > 'break' the nice maths properties we have, yes, but is it a problem or is
> > > > it actually helpful to cope with the thermal characteristics of
> > > > different platforms ?
> > >
> > > If you don't use the same kind of signal with the same responsiveness,
> > > you will start to see some OPP toggles, for example, when the thermal
> > > state changes, because one metric will change faster than the other one
> > > and you can't have a correct view of the system. The same problem was
> > > happening with rt tasks.
> >
> > Well, that wasn't the problem with rt tasks. The problem with RT tasks
> > was that the time they spend on the CPU wasn't accounted _at all_ when
> > selecting frequency for CFS, not that the accounting was at a different
> > pace ...
>
> The problem was the same with RT: the cfs utilization was lower than
> reality because RT steals some cycles from CFS.
> So schedutil was selecting a lower frequency when cfs was running
> whereas the CPU was fully used.
> The same can happen with thermal:
> cap the max freq because of thermal
> the utilization will decrease.
> remove the cap
> the utilization is still low and you will select a low OPP because you
> don't take into account cycles stolen by thermal, like with RT

I'm not arguing with the fact that we need to reflect the thermal
pressure in the scheduler to see that a CPU is fully busy. I agree with
that.
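The cap/uncap scenario above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming a single CPU with capacity_orig = 1024, a hypothetical 2 GHz maximum frequency, frequency-invariant utilization and the schedutil-style rule next_freq = 1.25 * max_freq * util / max. Adding the not-yet-decayed stolen capacity to util is a hypothetical illustration by analogy with the RT case, not what the patchset does; `next_freq` and `busy_util_at` are illustrative names.

```python
# Minimal sketch of the cap/uncap scenario, under simplifying assumptions:
# one CPU, capacity_orig = 1024, a hypothetical 2 GHz max frequency,
# frequency-invariant util, and the schedutil-style rule
# next_freq = 1.25 * max_freq * util / max. Adding stolen capacity to util
# is a hypothetical illustration, not the patchset's code.

CAPACITY_ORIG = 1024
MAX_FREQ_KHZ = 2_000_000  # hypothetical


def next_freq(util, max_cap=CAPACITY_ORIG, max_freq=MAX_FREQ_KHZ):
    """schedutil-style pick with 25% headroom, clamped to max_freq."""
    freq = (max_freq + (max_freq >> 2)) * util // max_cap
    return min(freq, max_freq)


def busy_util_at(freq_khz, max_freq=MAX_FREQ_KHZ):
    """Invariant util of a CPU that is 100% busy while running at freq_khz."""
    return CAPACITY_ORIG * freq_khz // max_freq


if __name__ == "__main__":
    capped_freq = 1_000_000                    # thermal caps the CPU at 1 GHz
    util = busy_util_at(capped_freq)           # fully busy, yet util is 512
    capped_capacity = CAPACITY_ORIG * capped_freq // MAX_FREQ_KHZ
    stolen = CAPACITY_ORIG - capped_capacity   # capacity removed by the cap

    # Cap just removed, util has not ramped up yet:
    print(next_freq(util))                     # 1250000: a low OPP
    # Same moment, if a thermal pressure that has not decayed yet is added:
    print(next_freq(util + stolen))            # 2000000: max OPP
```

Under these assumptions, if util was at steady state and the thermal pressure decays with the same half-life as util ramps up, their sum stays at capacity_orig throughout the transition. That is the "same pace" property; with a different half-life the sum would over- or undershoot while both signals settle.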
I'm saying you don't necessarily have to update the thermal stuff and
the existing PELT signals *at the same pace*, because different
platforms have different thermal characteristics.

Thanks,
Quentin