Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751956AbbBKJAl (ORCPT ); Wed, 11 Feb 2015 04:00:41 -0500 Received: from e9.ny.us.ibm.com ([32.97.182.139]:47964 "EHLO e9.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751166AbbBKJAj (ORCPT ); Wed, 11 Feb 2015 04:00:39 -0500 Message-ID: <54DB1A17.9000706@linux.vnet.ibm.com> Date: Wed, 11 Feb 2015 14:30:07 +0530 From: Preeti U Murthy User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Steven Noonan CC: Peter Zijlstra , Thomas Gleixner , jacob.jun.pan@intel.com, Arjan van de Ven , Linux Kernel mailing List , =?UTF-8?B?RnLDqWTDqXJpYyBXZWlzYmVja2Vy?= , frederic@kernel.org, Daniel Lezcano , Amit Kucheria , Eduardo Valentin , Viresh Kumar , rui.zhang@intel.com Subject: Re: [PATCH V2] idle/intel_powerclamp: Redesign idle injection to use bandwidth control mechanism References: <20150209044852.6231.66456.stgit@preeti.in.ibm.com> <54D89F46.4030803@linux.vnet.ibm.com> <20150209181436.GA29721@rincewind.us-west-2.compute.internal> In-Reply-To: <20150209181436.GA29721@rincewind.us-west-2.compute.internal> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15021109-0033-0000-0000-000001D1537D Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4799 Lines: 82 On 02/09/2015 11:44 PM, Steven Noonan wrote: > On Mon, Feb 9, 2015 at 9:56 AM, Steven Noonan wrote: >> On Mon, Feb 9, 2015 at 3:51 AM, Preeti U Murthy >> wrote: >>> Hi Steven, >>> >>> On 02/09/2015 01:02 PM, Steven Noonan wrote: >>>> On Sun, Feb 8, 2015 at 8:49 PM, Preeti U Murthy >>>> wrote: >>>>> The powerclamp driver injects idle periods to stay within the thermal constraints. >>>>> The driver does a fake idle by spawning per-cpu threads that call the mwait >>>>> instruction. This behavior of fake idle can confuse the other kernel subsystems. >>>>> For instance it calls into the nohz tick handlers, which are meant to be called >>>>> only by the idle thread. It sets the state of the tick in each cpu to idle and >>>>> stops the tick, when there are tasks on the runqueue. As a result the callers of >>>>> idle_cpu()/ tick_nohz_tick_stopped() see different states of the cpu; while the >>>>> former thinks that the cpu is busy, the latter thinks that it is idle. The outcome >>>>> may be inconsistency in the scheduler/nohz states which can lead to serious >>>>> consequences. One of them was reported on this thread: >>>>> https://lkml.org/lkml/2014/12/11/365. >>>>> >>>>> Thomas posted out a patch to disable the powerclamp driver from calling into the >>>>> tick nohz code which has taken care of the above regression for the moment. However >>>>> powerclamp driver as a result, will not be able to inject idle periods due to the >>>>> presence of periodic ticks. With the current design of fake idle, we cannot move >>>>> towards a better solution. >>>>> https://lkml.org/lkml/2014/12/18/169 >>>>> >>>>> This patch aims at removing the concept of fake idle and instead makes the cpus >>>>> truly idle by throttling the runqueues during the idle injection periods. The situation >>>>> is in fact very similar to throttling of cfs_rqs when they exceed their bandwidths. >>>>> The idle injection metrics can be mapped to the bandwidth control metrics 'quota' and >>>>> 'period' to achieve the same result. When the powerclamping is begun or when the >>>>> clamping controls have been modified, the bandwidth for the root task group is set. >>>>> The 'quota' will be the amount of time that the system needs to be busy and 'period' >>>>> will be the sum of this busy duration and the idle duration as calculated by the driver. >>>>> This gets rid of per-cpu kthreads, control cpu, hotplug notifiers and clamping mask since >>>>> the thread starting powerclamping will set the bandwidth and throttling of all cpus will >>>>> automatically fall in place. None of the other cpus need be bothered about this. This >>>>> simplifies the design of the driver. >>>>> >>>>> Of course this is only if the idle injection metrics can be conveniently transformed >>>>> into bandwidth control metrics. There are a couple of other primary concerns around if >>>>> doing the below two in this patch is valid. >>>>> a. This patch exports the functions to set the quota and period of task groups. >>>>> b. This patch removes the constraint of not being able to set the root task grp's bandwidth. >>>>> >>>>> Signed-off-by: Preeti U Murthy >>>> >>>> This doesn't compile. >>> >>> Thanks for reporting this! I realized that I had not compiled in the powerclamp driver >>> as a module while compile testing it. I was focusing on the issues with the design and >>> failed to cross verify this. Apologies for the inconvenience. >>> >>> Find the diff compile tested below. >>> >>> I also realized that clamp_cpus() that sets the bandwidth cannot be called from >>> multiple places. Currently I am calling it from end_powerclamp(), when the user changes the >>> idle clamping duration and from a queued timer. This will require synchronization between >>> callers which is not really called for. The queued wakeup_timer alone can re-evaluate the >>> clamping metrics after every throttle-unthrottle period and this should suffice as far >>> as I can see. Thoughts ? >> >> Hmm, I've had two system lockups so far while running a kernel with >> intel_powerclamp loaded. Both times it slowly ground to a halt and >> processes piled up... Hmm I see. I am not surprised because this patch is not complete yet. The idea was to gain suggestions and review around the approach first before I went ahead to making it robust. Let me ease the creases that I found and repost this patch for you to test. Thanks a lot for the testing efforts! :) Regards Preeti U Murthy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/