Date: Thu, 3 Mar 2016 14:03:09 +0100
From: Peter Zijlstra
To: Michael Turquette
Cc: Steve Muckle, "Rafael J. Wysocki", Ingo Molnar, "Rafael J. Wysocki",
    linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    Vincent Guittot, Morten Rasmussen, Dietmar Eggemann, Juri Lelli,
    Patrick Bellasi, Ricky Liang
Subject: Re: [RFCv7 PATCH 03/10] sched: scheduler-driven cpu frequency selection
Message-ID: <20160303130309.GO6356@twins.programming.kicks-ass.net>
References: <1456190570-4475-1-git-send-email-smuckle@linaro.org>
    <1456190570-4475-4-git-send-email-smuckle@linaro.org>
    <8427745.Y8N2bqC3SO@vostro.rjw.lan>
    <56CF9D8F.7010607@linaro.org>
    <20160302074910.10178.35029@quark.deferred.io>
In-Reply-To: <20160302074910.10178.35029@quark.deferred.io>

On Tue, Mar 01, 2016 at 11:49:10PM -0800, Michael Turquette wrote:
>
> In my over-simplified view of the scheduler, it would be great if we
> could have a backdoor mechanism to place the frequency transition
> kthread onto a runqueue from within the schedule() context and dispense
> with the irq_work stuff in Steve's series altogether.

This is actually very very hard :/

So while there is something similar for workqueues,
try_to_wake_up_local(), that will not work for the cpufreq stuff.

The main problem is that schedule() is done with rq->lock held, but
wakeups need p->pi_lock, but it so happens that rq->lock nests inside of
p->pi_lock.
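
The nesting constraint above can be illustrated with a toy userspace model. This is only a sketch, not kernel code: the names lock_class, lock_ctx, acquire_in_order(), release_lock() and wakeup_ok() are made up for illustration, and the "locks" are plain flags that only record the order in which they are taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, ranked by nesting order: p->pi_lock is the
 * outer lock (taken first), rq->lock nests inside it (taken second). */
enum lock_class { PI_LOCK = 0, RQ_LOCK = 1, NR_LOCK_CLASSES };

/* Which lock classes the current (simulated) context holds. */
struct lock_ctx {
    bool held[NR_LOCK_CLASSES];
};

/* Take 'cls' only if that respects the nesting order: the lock must not
 * already be held, and no lock that nests inside it may be held.
 * Returns true and marks the lock held on success, false otherwise. */
bool acquire_in_order(struct lock_ctx *ctx, enum lock_class cls)
{
    if (ctx->held[cls])
        return false;               /* not recursive */
    for (int inner = cls + 1; inner < NR_LOCK_CLASSES; inner++) {
        if (ctx->held[inner])
            return false;           /* would invert the nesting order */
    }
    ctx->held[cls] = true;
    return true;
}

void release_lock(struct lock_ctx *ctx, enum lock_class cls)
{
    assert(ctx->held[cls]);
    ctx->held[cls] = false;
}

/* Simulated wakeup: needs pi_lock first, then rq->lock inside it.
 * Returns false if the calling context makes that order impossible. */
bool wakeup_ok(struct lock_ctx *ctx)
{
    if (!acquire_in_order(ctx, PI_LOCK))
        return false;
    bool ok = acquire_in_order(ctx, RQ_LOCK);
    if (ok)
        release_lock(ctx, RQ_LOCK);
    release_lock(ctx, PI_LOCK);
    return ok;
}
```

Calling wakeup_ok() on a context that holds nothing succeeds; calling it on a context that already holds RQ_LOCK, which is how this model represents being inside schedule(), fails at the pi_lock step.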
Now, the workqueue stuff with try_to_wake_up_local() can get away with
dropping rq->lock, because of where it is called, way early in
schedule() before we really muck things up.

The cpufreq hook otoh is called all over the place.

The second problem is that doing a wakeup will in fact also end up
calling the cpufreq hook, so you're back in recursion hell.

The third problem is that cpufreq is called from wakeups, which would
want to do another wakeup (see point 2), but this also means we have to
nest p->pi_lock, and we can't really do that either.
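
The recursion in the second point, and why a deferral step avoids it, can be sketched with another toy userspace model. Everything here is an assumption made for illustration: struct sim, MAX_DEPTH and all four functions are hypothetical, the depth cap exists only so the naive variant terminates, and the work_pending flag merely stands in for a deferral mechanism such as the irq_work mentioned in the quoted text. It does not reproduce how the kernel or Steve's series implement any of this.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 8  /* arbitrary cap so the naive variant terminates */

struct sim {
    int hook_calls;       /* how often the simulated cpufreq hook ran */
    int kthread_wakeups;  /* how often the simulated kthread was woken */
    int depth;            /* current hook -> wakeup -> hook nesting */
    int max_depth;        /* deepest nesting observed */
    bool work_pending;    /* deferred request (stand-in for irq_work) */
};

void wakeup_direct(struct sim *s);

/* Naive variant: the hook wakes the kthread directly. */
void cpufreq_hook_direct(struct sim *s)
{
    s->hook_calls++;
    if (s->depth > s->max_depth)
        s->max_depth = s->depth;
    if (s->depth >= MAX_DEPTH)
        return;              /* toy guard, nothing like this exists for real */
    wakeup_direct(s);
}

/* A wakeup also ends up calling the hook, which closes the loop. */
void wakeup_direct(struct sim *s)
{
    s->kthread_wakeups++;
    s->depth++;
    cpufreq_hook_direct(s);
    s->depth--;
}

/* Deferred variant: the hook only records that work is wanted. */
void cpufreq_hook_deferred(struct sim *s)
{
    s->hook_calls++;
    s->work_pending = true;
}

/* Runs later, outside the hook. The wakeup still passes through the
 * hook, but that call only sets the flag again, and the flag is cleared
 * once this request has been handled. */
void run_deferred_work(struct sim *s)
{
    if (!s->work_pending)
        return;
    s->kthread_wakeups++;
    cpufreq_hook_deferred(s);
    s->work_pending = false;
}
```

In this model a single call to cpufreq_hook_direct() nests until the cap is hit, while cpufreq_hook_deferred() followed by run_deferred_work() produces exactly one wakeup with no nesting.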