Date: Wed, 10 Apr 2013 11:13:30 +0200
Subject: Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.
From: Vincent Guittot
To: Lukasz Majewski, Daniel Lezcano, Lorenzo Pieralisi
Cc: Viresh Kumar, Jonghwa Lee, "Rafael J. Wysocki",
    "linux-kernel@vger.kernel.org", Linux PM list,
    "cpufreq@vger.kernel.org", MyungJoo Ham, Kyungmin Park,
    Chanwoo Choi, "sw0312.kim@samsung.com", Marek Szyprowski

On 10 April 2013 10:44, Lukasz Majewski wrote:
> Hi Vincent,
>
>> On Tuesday, 9 April 2013, Lukasz Majewski
>> wrote:
>> > Hi Viresh and Vincent,
>> >
>> >> On 9 April 2013 16:07, Lukasz Majewski
>> >> wrote:
>> >> >> On Mon, Apr 1, 2013 at 1:54 PM, Jonghwa Lee
>> >> > Our approach is a bit different than cpufreq_ondemand one.
>> >> > Ondemand takes the per CPU idle time, then on that basis
>> >> > calculates per cpu load. The next step is to choose the highest
>> >> > load and then use this value to properly scale frequency.
>> >> >
>> >> > On the other hand LAB tries to model different behavior:
>> >> >
>> >> > As a first step we applied Vincent Guittot's "pack small
>> >> > tasks" [*] patch to improve "race to idle" behavior:
>> >> > http://article.gmane.org/gmane.linux.kernel/1371435/match=sched+pack+small+tasks
>> >>
>> >> Luckily he is part of my team :)
>> >>
>> >> http://www.linaro.org/linux-on-arm/meet-the-team/power-management
>> >>
>> >> BTW, he is using ondemand governor for all his work.
>> >>
>> >> > Afterwards, we decided to investigate a different approach for
>> >> > power governing:
>> >> >
>> >> > Use the number of sleeping CPUs (not the maximal per-CPU load) to
>> >> > change frequency. We therefore depend on [*] to "pack" as many
>> >> > tasks to a CPU as possible and allow the others to sleep.
>> >>
>> >> He packs only small tasks.
>> >
>> > What about packing not only small tasks? I will investigate the
>> > possibility to aggressively pack (even with a cost of performance
>> > degradation) as many tasks as possible to a single CPU.
>>
>> Hi Lukasz,
>>
>> I've got the same comment on my current patch and I'm preparing a new
>> version that can pack tasks more aggressively based on the same buddy
>> mechanism. This will be done at the cost of performance of course.
>
> Can you share your development tree?

The development is not finished yet, but I will share it as soon as
possible.

>
>> >
>> > It seems a good idea for a power consumption reduction.
>>
>> In fact, it's not always true and depends on several inputs like the
>> number of tasks that run simultaneously
>
> In my understanding, we can try to couple (affine) the maximal number
> of tasks with a CPU. Performance shall decrease, but we will avoid the
> costs of task migration.
>
> If I remember correctly, I've asked you about some testbench/test
> program for scheduler evaluation. I assume that nothing has changed and
> there isn't any "common" set of scheduler tests?
There are a bunch of benchmarks that are used to evaluate the scheduler,
like hackbench and pgbench, but they generally fill all CPUs in order to
test max performance. Are you looking for that kind of benchmark?

>
>> >
>> >> And if there are many small tasks we are
>> >> packing, then load must be high and so ondemand gov will increase
>> >> freq.
>> >
>> > This is of course true for "packing" all tasks to a single CPU. If
>> > we stay at the power consumption envelope, we can even overclock the
>> > frequency.
>> >
>> > But what if other - let's say 3 CPUs - are under heavy workload?
>> > Ondemand will switch frequency to maximum, and as Jonghwa pointed
>> > out this can cause a dangerous temperature increase.
>>
>> IIUC, your main concern is to stay in a power consumption budget to
>> not overheat and have to face the side effects of high temperature
>> like a decrease of power efficiency. So your governor modifies the
>> max frequency based on the number of running/idle CPUs
> Yes, this is correct.
>
>> to have an
>> almost stable power consumption?
>
> From our observation it seems that for 3 or 4 running CPUs under heavy
> load we see much more power consumption reduction.

That's logical, because you will reduce the voltage.

>
> To put it in another way - ondemand would increase frequency to max for
> all 4 CPUs. On the other hand, if user experience drops to the
> acceptable level we can reduce power consumption.
>
> Reducing frequency and CPU voltage (by DVS) causes, as a side effect,
> that temperature stays at an acceptable level.
>
>>
>> Have you also looked at the power clamp driver that has a similar
>> target?
>
> I might be wrong here, but in my opinion the power clamp driver is a bit
> different:

Yes, it periodically forces the cluster into a low power state.

>
> 1. It is dedicated to Intel SoCs, which provide a special set of
> registers (i.e. MSR_PKG_Cx_RESIDENCY [*]), which forces a processor to
> enter a certain C state for a given duration.
> Idle duration is calculated
> by a per CPU set of high priority kthreads (which also program [*]
> registers).

IIRC, a trial on an ARM platform has been done by Lorenzo and Daniel.
Lorenzo, Daniel, do you have more information?

>
> 2. ARM SoCs don't have such infrastructure, so we depend on SW here.
> Scheduler has to remove tasks from a particular CPU and "execute" on
> it the idle_task.
> Moreover at Exynos4 the thermal control loop depends on SW, since we can
> only read SoC temperature via the TMU (Thermal Management Unit) block.

The idle duration is quite small and should not perturb normal behavior.

Vincent

>
> Correct me again, but it seems to me that on ARM we can use CPU hotplug
> (which, as Thomas Gleixner stated recently, is going to be
> "refactored" :-) ) or "ask" the scheduler to use the smallest possible
> number of CPUs and enter a C state for idling CPUs.
>
>>
>> Vincent
>>
>> >> > Contrary, when all cores are heavily loaded, we decided to reduce
>> >> > frequency by around 30%. With this approach user experience
>> >> > reduction is still acceptable (with much less power consumption).
>> >>
>> >> Don't know.. running many cpus at lower freq for long duration will
>> >> probably take more power than running them at high freq for short
>> >> duration and making system idle again.
>> >>
>> >> > We have posted this "RFC" patch mainly for discussion, and I
>> >> > think it fits its purpose :-).
>> >>
>> >> Yes, no issues with your RFC idea.. it's perfect..
>> >>
>> >> @Vincent: Can you please follow this thread a bit and tell us what
>> >> your views are?
>> >>
>> >> --
>> >> viresh
>> >
>> > --
>> > Best regards,
>> >
>> > Lukasz Majewski
>> >
>> > Samsung R&D Poland (SRPOL) | Linux Platform Group
>
> --
> Best regards,
>
> Lukasz Majewski
>
> Samsung R&D Poland (SRPOL) | Linux Platform Group
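
For readers following the thread, the two policies being compared can be
sketched roughly as below. This is only an illustration of the ideas as
described in the discussion (ondemand scaling on the highest per-CPU load,
LAB capping the maximum frequency from the number of idle CPUs, with a
reduction of around 30% when no CPU is idle). It is not the ondemand or LAB
source code. The frequency table, the thresholds and all function names are
hypothetical.

```python
# Illustrative sketch only; not taken from the ondemand or LAB governors.
# FREQS_KHZ, UP_THRESHOLD, IDLE_LOAD and every function name are assumptions.

FREQS_KHZ = [200_000, 400_000, 800_000, 1_000_000, 1_400_000]  # hypothetical table
UP_THRESHOLD = 80  # percent load above which we jump to max (assumed value)
IDLE_LOAD = 10     # percent load at or below which a CPU counts as idle (assumed)


def per_cpu_load(idle_time, wall_time):
    """Load in percent derived from idle time over one sampling window."""
    return 100 * (wall_time - idle_time) // wall_time


def pick_at_least(target_khz):
    """Lowest table frequency that is >= target, else the highest one."""
    for freq in FREQS_KHZ:
        if freq >= target_khz:
            return freq
    return FREQS_KHZ[-1]


def pick_at_most(limit_khz):
    """Highest table frequency that is <= limit, else the lowest one."""
    for freq in reversed(FREQS_KHZ):
        if freq <= limit_khz:
            return freq
    return FREQS_KHZ[0]


def ondemand_like(loads):
    """Simplified ondemand idea: scale on the highest per-CPU load."""
    max_load = max(loads)
    if max_load > UP_THRESHOLD:
        return FREQS_KHZ[-1]
    return pick_at_least(FREQS_KHZ[-1] * max_load // 100)


def lab_like_cap(loads, reduction_pct=30):
    """Simplified LAB idea: cap max frequency from the number of idle CPUs.

    With no idle CPU the cap is lowered by roughly reduction_pct percent;
    otherwise the full table is allowed.
    """
    idle_cpus = sum(1 for load in loads if load <= IDLE_LOAD)
    if idle_cpus == 0:
        return pick_at_most(FREQS_KHZ[-1] * (100 - reduction_pct) // 100)
    return FREQS_KHZ[-1]


if __name__ == "__main__":
    busy = [95, 90, 92, 97]   # all four CPUs heavily loaded
    packed = [95, 0, 0, 0]    # tasks packed on one CPU, three CPUs idle
    print(ondemand_like(busy), lab_like_cap(busy))
    print(ondemand_like(packed), lab_like_cap(packed))
```

With four heavily loaded CPUs the ondemand-like rule selects the top
frequency, while the LAB-like cap falls to the nearest table entry below 70%
of it. When tasks are packed on one CPU and the others idle, the cap is not
applied.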