Date: Tue, 25 Feb 2020 10:43:06 +0530
From: Gautham R Shenoy
To: Pratik Rajesh Sampat
Cc: linux-kernel@vger.kernel.org, rafael.j.wysocki@intel.com,
    peterz@infradead.org, dsmythies@telus.net, daniel.lezcano@linaro.org,
    ego@linux.vnet.ibm.com, svaidy@linux.ibm.com, pratik.sampat@in.ibm.com,
    pratik.r.sampat@gmail.com
Subject: Re: [RFC 0/1] Weighted approach to gather and use history in TEO governor
Message-ID: <20200225051306.GG12846@in.ibm.com>
Reply-To: ego@linux.vnet.ibm.com
References: <20200222070002.12897-1-psampat@linux.ibm.com>
In-Reply-To: <20200222070002.12897-1-psampat@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello Pratik,

On Sat, Feb 22, 2020 at 12:30:01PM +0530, Pratik Rajesh Sampat wrote:
> Currently the TEO governor apart from the TEO timer and hit/miss/early
> hit buckets; also gathers history of 8 intervals and if there are
> significant idle durations less than the current, then it decides if a
> shallower state must be chosen.
>
> The current sliding history window does do a fair job at prediction,
> however, the hard-coded window can be a limiting factor for an accurate
> prediction and having the window size increase can also linearly affect
> both space and time complexity of the prediction.
>
> To complement the current moving window history, an approach is devised
> where each idle state separately maintains a weight for itself and its
> counterpart idle states to form a probability distribution.
>
> When a decision needs to be made, the TEO governor selects an idle state
> based on its timer and other hits/early hits metric. After which, the
> probability distribution of that selected idle state is looked at which
> gives insight into how probable that state is to occur if picked.
>
> The probability distribution is nothing but a n*n matrix, where
> n = drv->state_count.
> Each entry in the array signifies a weight for that row.
> The weights can vary from the range [0-10000].
>
> For example:
> state_mat[1][2] = 3000 means that previously when state 1 was selected,
> the probability that state 2 will occur is 30%.

Could you clarify what this means?
Do you mean that when state 1 is selected, the probability that the CPU
will be in state 1 for the duration corresponding to state 2's
residency is 30%?

Furthermore, does this mean that idle state selection has O(n)
complexity, where n is the number of idle states, since we want to
select a state in which we are more likely to reside?

> The trailing zeros correspond to having more resolution while increasing
> or reducing the weights for correction.
>
> Currently, for selection of an idle state based on probabilities, a
> weighted random number generator is used to choose one of the idle
> states. Naturally, the states with higher weights are more likely to be
> chosen.
>
> On wakeup, the weights are updated. The state with which it should have
> woken up with (could be the hit / miss / early hit state) is increased
> in weight by the "LEARNING_RATE" % and the rest of the states for that
> index are reduced by the same factor.

So do we update the weight in just one cell? To use the example above,
if we selected state 1 and resided in it for a duration corresponding
to state 2's residency, will we update only state_mat[1][2]?

>
> The advantage of this approach is that unlimited history of idle states
> can be maintained in constant overhead, which can help in more accurate
> prediction for choosing idle states.
>
> The advantage of unlimited history can become a possible disadvantage as
> the lifetime history for that thread may make the weights stale and
> influence the choosing of idle states which may not be relevant
> anymore.

Can the effect of this staleness be observed? For instance, if we have
a particular idle entry/exit pattern for a very long duration, say a
few tens of minutes, and then the pattern changes, how badly does the
weighted approach fare compared to the current TEO governor?

> Aging the weights could be a solution for that, although this RFC does
> not cover the implementation for that.
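
To make the questions above concrete, here is a small userspace sketch
of how I read the scheme. It is not taken from the patch: NSTATES, the
LEARNING_RATE value, the helper names, the uniform initial weights, and
the choice to move LEARNING_RATE % of every other cell in the row into
the observed state's cell (so that the row keeps summing to 10000) are
all my assumptions. Please correct whichever parts do not match.

```c
#include <assert.h>

#define NSTATES		4	/* hypothetical; stands in for drv->state_count */
#define WEIGHT_TOTAL	10000	/* weights are in [0, 10000]; 3000 reads as 30% */
#define LEARNING_RATE	5	/* percent; hypothetical value */

/* state_mat[i][j]: weight of "state j was the right one" when state i was selected. */
static unsigned int state_mat[NSTATES][NSTATES];

/* Assumption: every row starts as a uniform distribution summing to WEIGHT_TOTAL. */
static void weights_init(void)
{
	for (int i = 0; i < NSTATES; i++) {
		for (int j = 0; j < NSTATES; j++)
			state_mat[i][j] = WEIGHT_TOTAL / NSTATES;
		/* Park any rounding remainder on the diagonal. */
		state_mat[i][i] += WEIGHT_TOTAL % NSTATES;
	}
}

/*
 * Weighted pick from the row of the state chosen by the timer/hit metrics.
 * rnd is a caller-supplied random value in [0, WEIGHT_TOTAL); the walk over
 * the cumulative weights is linear in the number of states.
 */
static int weights_pick(int selected, unsigned int rnd)
{
	unsigned int acc = 0;

	assert(selected >= 0 && selected < NSTATES);
	assert(rnd < WEIGHT_TOTAL);

	for (int j = 0; j < NSTATES; j++) {
		acc += state_mat[selected][j];
		if (rnd < acc)
			return j;
	}
	return NSTATES - 1;	/* not reached while the row sums to WEIGHT_TOTAL */
}

/*
 * On wakeup: shrink every other cell in row `selected` by LEARNING_RATE %
 * and give the removed weight to the state we should have picked.
 */
static void weights_update(int selected, int actual)
{
	unsigned int moved = 0;

	assert(selected >= 0 && selected < NSTATES);
	assert(actual >= 0 && actual < NSTATES);

	for (int j = 0; j < NSTATES; j++) {
		unsigned int cut;

		if (j == actual)
			continue;
		cut = state_mat[selected][j] * LEARNING_RATE / 100;
		state_mat[selected][j] -= cut;
		moved += cut;
	}
	state_mat[selected][actual] += moved;
}
```

Under these assumptions both the pick and the update walk a single row,
so each is O(n) in the number of idle states, and an update touches
every cell of row `selected` rather than only state_mat[1][2].
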
>
> Having a finer view of the history in addition to weighted randomized
> salt seems to show some promise in terms of saving power without
> compromising performance.
>
> Benchmarks:
> Note: Wt. TEO governor represents the governor after the proposed change
>
> Schbench
> ========
> Benchmarks wakeup latencies
> Scale of measurement:
> 1. 99th percentile latency - usec
> 2. Power - Watts
>
> Command: $ schbench -c 30000 -s 30000 -m 6 -r 30 -t
> Varying parameter: -t
>
> Machine: IBM POWER 9
>
> +--------+-------------+-----------------+-----------+-----------------+
> | Threads| TEO latency | Wt. TEO latency | TEO power | Wt. TEO power   |
> +--------+-------------+-----------------+-----------+-----------------+
> | 2      | 979         | 949 ( +3.06%)   | 38        | 36 ( +5.26%)    |
> | 4      | 997         | 1042 ( -4.51%)  | 51        | 39 ( +23.52%)   |
> | 8      | 1158        | 1050 ( +9.32%)  | 89        | 63 ( +29.21%)   |
> | 16     | 1138        | 1135 ( +0.26%)  | 105       | 117 ( -11.42%)  |
> +--------+-------------+-----------------+-----------+-----------------+
>
> Sleeping Ebizzy
> ===============
> Program to generate workloads resembling web server workloads.
> The benchmark is customized to allow for a sleep interval -i
> Scale of measurement:
> 1. Number of records/s
> 2. systime (s)
>
> Parameters:
> 1. -m => Always use mmap instead of malloc
> 2. -M => Never use mmap
> 3. -S => Number of seconds to run
> 4. -i => Sleep interval
>
> Machine: IBM POWER 9
>
> +-------------------+-------------+-------------------+-----------+---------------+
> | Parameters        | TEO records | Wt. TEO records   | TEO power | Wt. TEO power |
> +-------------------+-------------+-------------------+-----------+---------------+
> | -S 60 -i 10000    | 1115000     | 1198081 ( +7.45%) | 149       | 150 ( -0.66%) |
> | -m -S 60 -i 10000 | 15879       | 15513 ( -2.30%)   | 23        | 22 ( +4.34%)  |
> | -M -S 60 -i 10000 | 72887       | 77546 ( +6.39%)   | 104       | 103 ( +0.96%) |
> +-------------------+-------------+-------------------+-----------+---------------+
>
> Hackbench
> =========
> Creates a specified number of pairs of schedulable entities
> which communicate via either sockets or pipes and time how long it
> takes for each pair to send data back and forth.
> Scale of measurement:
> 1. Time (s)
> 2. Power (watts)
>
> Command: Sockets: $ hackbench -l
>          Pipes  : $ hackbench --pipe -l
> Varying parameter: -l
>
> Machine: IBM POWER 9
>
> +----------+------------+-------------------+----------+-------------------+
> | Messages | TEO socket | Wt. TEO socket    | TEO pipe | Wt. TEO pipe      |
> +----------+------------+-------------------+----------+-------------------+
> | 100      | 0.042      | 0.043 ( -2.32%)   | 0.031    | 0.032 ( +3.12%)   |
> | 1000     | 0.258      | 0.272 ( +5.14%)   | 0.301    | 0.312 ( -3.65%)   |
> | 10000    | 2.397      | 2.441 ( +1.80%)   | 5.642    | 5.092 ( +9.74%)   |
> | 100000   | 23.691     | 23.730 ( -0.16%)  | 57.762   | 57.857 ( -0.16%)  |
> | 1000000  | 234.103    | 233.841 ( +0.11%) | 559.807  | 592.304 ( -5.80%) |
> +----------+------------+-------------------+----------+-------------------+
>
> Power :Socket: Consistent between 135-140 watts for both TEO and Wt. TEO
>        Pipe: Consistent between 125-130 watts for both TEO and Wt. TEO
>

Could you also provide power measurements, for each variant of the TEO
governor, over a period when the system is completely idle? Is it the
case that the benefits we are seeing above are only due to Wt. TEO being
more conservative than the TEO governor by always choosing a shallower
state?
>
> Pratik Rajesh Sampat (1):
>   Weighted approach to gather and use history in TEO governor
>
>  drivers/cpuidle/governors/teo.c | 95 +++++++++++++++++++++++++++++++--
>  1 file changed, 90 insertions(+), 5 deletions(-)
>
> --
> 2.17.1
>

--
Thanks and Regards
gautham.