Date: Tue, 15 Dec 2015 10:31:30 +0000
From: Juri Lelli
To: Steve Muckle
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel@vger.kernel.org,
	linux-pm@vger.kernel.org, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Patrick Bellasi, Michael Turquette, Ricky Liang
Subject: Re: [RFCv6 PATCH 03/10] sched: scheduler-driven cpu frequency selection
Message-ID: <20151215103130.GF16007@e106622-lin>
References: <1449641971-20827-1-git-send-email-smuckle@linaro.org>
	<1449641971-20827-4-git-send-email-smuckle@linaro.org>
	<20151211110443.GA6645@e106622-lin>
	<566F74B3.4020203@linaro.org>
In-Reply-To: <566F74B3.4020203@linaro.org>

On 14/12/15 18:02, Steve Muckle wrote:
> Hi Juri,
> 
> Thanks for the review.
> 
> On 12/11/2015 03:04 AM, Juri Lelli wrote:
> >> +config CPU_FREQ_GOV_SCHED
> >> +	bool "'sched' cpufreq governor"
> >> +	depends on CPU_FREQ
> > 
> > We depend on IRQ_WORK as well, which in turn I think depends on SMP.
> > As briefly discussed with Peter on IRC, we might want to use
> > smp_call_function_single_async() instead, to break this dependency
> > chain (and be able to use this governor on UP as well).
> 
> FWIW I don't see an explicit dependency of IRQ_WORK on SMP

Oh, right. I seemed to remember that, but now I can't find that
dependency anymore.

> (init/Kconfig), nevertheless I'll take a look at moving to
> smp_call_function_single_async() to reduce the dependency list of
> sched-freq.

OK, great. I think there's still value in reducing the dependency list.

> ...
> >> +	/* avoid race with cpufreq_sched_stop */
> >> +	if (!down_write_trylock(&policy->rwsem))
> >> +		return;
> >> +
> >> +	__cpufreq_driver_target(policy, freq, CPUFREQ_RELATION_L);
> >> +
> >> +	gd->throttle = ktime_add_ns(ktime_get(), gd->throttle_nsec);
> > 
> > As I think you proposed at Connect, we could use post frequency
> > transition notifiers to implement throttling. Is this something you
> > have already tried implementing, or are planning to experiment with?
> 
> I started to do this a while back and then decided to hold off. I think
> (though I can't recall for sure) it may have been so I could
> artificially throttle the rate of frequency change events further by
> specifying an inflated frequency change time. That's useful to have as
> we experiment with policy.
> 
> We probably want both of these mechanisms: throttling at a minimum
> based on transition end notifiers, and the option of throttling further
> for policy purposes (at least for now, or as a debug option). Will look
> at this again.

Yeah, looks good.
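In case it helps, the smp_call_function_single_async() thing I have in
mind is roughly the below. Completely untested sketch: the gd->csd
field and cpufreq_sched_kick() are made-up names, not in the current
patch. IIUC the kernel/up.c stubs cover this on !SMP builds as well.

static void cpufreq_sched_wake_thread(void *info)
{
	struct gov_data *gd = info;

	/* runs from the IPI handler; just make the kthread runnable */
	wake_up_process(gd->task);
}

static void cpufreq_sched_kick(struct gov_data *gd, int cpu)
{
	/*
	 * gd->csd would be set up once at governor start:
	 *	gd->csd.func = cpufreq_sched_wake_thread;
	 *	gd->csd.info = gd;
	 * A csd must not be re-posted while a previous call is still
	 * pending, so we'd have to guarantee that as well.
	 */
	smp_call_function_single_async(cpu, &gd->csd);
}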
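And for the notifier-based throttling, something like the below is
what I was thinking of (again untested; the gd->freq_nb field is made
up for the example, and I'm assuming gd keeps a pointer back to its
policy):

static int cpufreq_sched_notifier(struct notifier_block *nb,
				  unsigned long event, void *data)
{
	struct cpufreq_freqs *freqs = data;
	struct gov_data *gd = container_of(nb, struct gov_data, freq_nb);

	/* only arm the throttle window once our transition completed */
	if (event == CPUFREQ_POSTCHANGE &&
	    cpumask_test_cpu(freqs->cpu, gd->policy->cpus))
		gd->throttle = ktime_add_ns(ktime_get(), gd->throttle_nsec);

	return NOTIFY_OK;
}

/*
 * registered once at init time with:
 *	gd->freq_nb.notifier_call = cpufreq_sched_notifier;
 *	cpufreq_register_notifier(&gd->freq_nb, CPUFREQ_TRANSITION_NOTIFIER);
 */

This would arm the throttle window only when the transition has
actually finished, and still leaves room for an inflated throttle_nsec
on top of it for policy experiments.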
> ...
> >> +static int cpufreq_sched_thread(void *data)
> >> +{
> >> +	struct sched_param param;
> >> +	struct cpufreq_policy *policy;
> >> +	struct gov_data *gd;
> >> +	unsigned int new_request = 0;
> >> +	unsigned int last_request = 0;
> >> +	int ret;
> >> +
> >> +	policy = (struct cpufreq_policy *) data;
> >> +	gd = policy->governor_data;
> >> +
> >> +	param.sched_priority = 50;
> >> +	ret = sched_setscheduler_nocheck(gd->task, SCHED_FIFO, &param);
> >> +	if (ret) {
> >> +		pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
> >> +		do_exit(-EINVAL);
> >> +	} else {
> >> +		pr_debug("%s: kthread (%d) set to SCHED_FIFO\n",
> >> +				__func__, gd->task->pid);
> >> +	}
> >> +
> >> +	do {
> >> +		set_current_state(TASK_INTERRUPTIBLE);
> >> +		new_request = gd->requested_freq;
> >> +		if (new_request == last_request) {
> >> +			schedule();
> >> +		} else {
> > 
> > Don't we have to do the following here?
> > 
> > @@ -125,9 +125,9 @@ static int cpufreq_sched_thread(void *data)
> >  	}
> >  
> >  	do {
> > -		set_current_state(TASK_INTERRUPTIBLE);
> >  		new_request = gd->requested_freq;
> >  		if (new_request == last_request) {
> > +			set_current_state(TASK_INTERRUPTIBLE);
> >  			schedule();
> >  		} else {
> >  			/*
> > 
> > Otherwise we set the task to TASK_INTERRUPTIBLE state right after it
> > has been woken up.
> 
> The state must be set to TASK_INTERRUPTIBLE before the data used to
> decide whether to sleep or not is read (gd->requested_freq in this
> case).
> 
> If it is set afterwards, then in the window after gd->requested_freq
> has been read but before the state is set to TASK_INTERRUPTIBLE, the
> other side may update gd->requested_freq and issue a wakeup on the
> freq thread. The wakeup will have no effect since the freq thread
> would still be TASK_RUNNING at that time. The freq thread would then
> proceed to go to sleep and the update would be lost.

Mmm, I suggested that because I was hitting this while testing:

[   34.816158] ------------[ cut here ]------------
[   34.816177] WARNING: CPU: 2 PID: 1712 at kernel/kernel/sched/core.c:7617 __might_sleep+0x90/0xa8()
[   34.816188] do not call blocking ops when !TASK_RUNNING; state=1 set at [] cpufreq_sched_thread+0x80/0x2b0
[   34.816198] Modules linked in:
[   34.816207] CPU: 2 PID: 1712 Comm: kschedfreq:1 Not tainted 4.4.0-rc2+ #401
[   34.816212] Hardware name: ARM-Versatile Express
[   34.816229] [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
[   34.816243] [] (show_stack) from [] (dump_stack+0x80/0xb4)
[   34.816257] [] (dump_stack) from [] (warn_slowpath_common+0x88/0xc0)
[   34.816267] [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x40/0x48)
[   34.816278] [] (warn_slowpath_fmt) from [] (__might_sleep+0x90/0xa8)
[   34.816291] [] (__might_sleep) from [] (cpufreq_freq_transition_begin+0x6c/0x13c)
[   34.816303] [] (cpufreq_freq_transition_begin) from [] (__cpufreq_driver_target+0x180/0x2c0)
[   34.816314] [] (__cpufreq_driver_target) from [] (cpufreq_sched_try_driver_target+0x48/0x74)
[   34.816324] [] (cpufreq_sched_try_driver_target) from [] (cpufreq_sched_thread+0x70/0x2b0)
[   34.816336] [] (cpufreq_sched_thread) from [] (kthread+0xf4/0x114)
[   34.816347] [] (kthread) from [] (ret_from_fork+0x14/0x24)
[   34.816355] ---[ end trace 30e92db342678467 ]---

Maybe we could cope with what you are saying by using an atomic flag
indicating that the kthread is currently servicing a request? Something
like extending the finish_last_request thing to cover this case as
well.
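Completely untested, but something like the below is what I mean: go
back to TASK_RUNNING before calling into the driver (which may sleep),
and use an atomic flag (gd->servicing here, a made-up field) so the
update side knows the kthread will re-check requested_freq before
sleeping again. The barriers certainly need more thought; this is just
to show the idea, keeping the existing loop structure.

	do {
		set_current_state(TASK_INTERRUPTIBLE);
		new_request = gd->requested_freq;
		if (new_request == last_request) {
			schedule();
		} else {
			__set_current_state(TASK_RUNNING);
			atomic_set(&gd->servicing, 1);
			smp_mb(); /* flag set vs. the driver call below */

			/* may sleep, now that we are TASK_RUNNING again */
			cpufreq_sched_try_driver_target(policy, new_request);
			last_request = new_request;

			atomic_set(&gd->servicing, 0);
			smp_mb(); /* flag clear vs. requested_freq re-read */
		}
	} while (!kthread_should_stop());

with the update side then doing something like:

	gd->requested_freq = freq;
	smp_mb(); /* publish the request before checking the flag */
	if (!atomic_read(&gd->servicing))
		wake_up_process(gd->task);

If the flag is seen set we can skip the wakeup, since the kthread
re-reads requested_freq after clearing it and before going back to
sleep.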
Best,

- Juri