Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758014AbZGCThV (ORCPT ); Fri, 3 Jul 2009 15:37:21 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1757132AbZGCThJ (ORCPT ); Fri, 3 Jul 2009 15:37:09 -0400 Received: from bc.sympatico.ca ([209.226.175.184]:57119 "EHLO tomts22-srv.bellnexxia.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756475AbZGCThG (ORCPT ); Fri, 3 Jul 2009 15:37:06 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AuIEABf5TUpMQWU3/2dsb2JhbACBUcwfhBIF Date: Fri, 3 Jul 2009 15:36:55 -0400 From: Mathieu Desnoyers To: "Pallipadi, Venkatesh" Cc: "linux-kernel@vger.kernel.org" , Dave Jones , Thomas Renninger , "cpufreq@vger.kernel.org" , "kernel-testers@vger.kernel.org" , Ingo Molnar , "rjw@sisk.pl" , Dave Young , Pekka Enberg , "Li, Shaohua" , Rusty Russell , "sven.wegener@stealer.net" Subject: Re: [patch 2.6.30 3/4] cpufreq add gov mutex Message-ID: <20090703193655.GA20266@Krystal> References: <20090703152514.903448283@polymtl.ca> <20090703152714.953585501@polymtl.ca> <7E82351C108FA840AB1866AC776AEC4669BFF0DE@orsmsx505.amr.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline In-Reply-To: <7E82351C108FA840AB1866AC776AEC4669BFF0DE@orsmsx505.amr.corp.intel.com> X-Editor: vi X-Info: http://krystal.dyndns.org:8080 X-Operating-System: Linux/2.6.21.3-grsec (i686) X-Uptime: 15:15:06 up 125 days, 15:41, 4 users, load average: 0.36, 0.25, 0.35 User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 10241 Lines: 294 * Pallipadi, Venkatesh (venkatesh.pallipadi@intel.com) wrote: > > > >-----Original Message----- > >From: Mathieu Desnoyers [mailto:mathieu.desnoyers@polymtl.ca] > >Sent: Friday, July 03, 2009 8:25 AM > >To: linux-kernel@vger.kernel.org; Pallipadi, Venkatesh; Dave > >Jones; Thomas Renninger; cpufreq@vger.kernel.org; > >kernel-testers@vger.kernel.org; Ingo Molnar; rjw@sisk.pl; Dave > >Young; Pekka Enberg > >Cc: Mathieu Desnoyers; Li, Shaohua; Rusty Russell; > >sven.wegener@stealer.net > >Subject: [patch 2.6.30 3/4] cpufreq add gov mutex > > > >Using the cpufreq_gov_mutex to protect internal governor data > >structures. The > >policy rwsem write lock nests inside this mutex. > > This makes the whole locking in cpufreq upside down. Takes away > most of the reason for existence of rwsem lock. Taking a smaller > Percpu rwsem lock inside a bigger global mutex doesn't seem right. > It permits the timer handler to only take the rwsem write lock. My understanding is that the most frequent operation is the timer handler being executed, but maybe I'm wrong ? I think it ends up being a questions of clearly identifying how frequent each data structure access are, and why this per-governor rwsem is needed at all. (does it bring any perceivable performance improvement on a critical path ? Then maybe we should use RCU ?) I really wonder if the fact that you switch the timer handler from rwsem to a mutex in your patchset will not hurt performance in some way or have some unforeseen side-effect. This is why I made my patchset as dumb and simple as possible, because this is a bugfix for 2.6.30, not a reingineering. Let's make it work, then make it fast. > >The policy > >rwsem is taken in > >the timer handler, and therefore cannot be held while doing a > >sync teardown of > >the timer. This cpufreq_gov_mutex lock protects init/teardown > >of governor > >timers. > > Why is the protection required for init/teardown of percpu > timer with a global mutex? cpufreq rwsem (when held correctly) > ensures that there can be only one store_scaling_governor for > a cpu at any point in time. I don't see any reason why we need > a global mutex to protect timer init teardown. > rwsem was previously held in the timer handler. I am not yet convinced that simply holding a local dbs_mutex will ensure proper synchronization of *all* call paths in the timer handler. So given I prefer to do the least intrusive modification, I leave the rwsem write lock in the timer handler, but it leaves no choice but to take a mutex _outside_ of the rwsem lock to synchronize timer init/teardown. > > > This is why this lock, although it protects a data > >structure located > >within the governors, is located in cpufreq.c. > > I think this is taking a step back in terms of locking. A system > wide Cpufreq_gov_mutex is being held more frequently now and seems > to be providing all the serialization needed. The locks crossing > layers of cpufreq core and governor is just going to make > situation worse IMO. Can you prove that my simple bugfix will cause a significant performance regression ? > > If we really want to pursue this option, we should get rid > of rwsem altogether. It is not adding much value > and just providing lot more confusion with things like rwsem > should be held everywhere else other than CPUFREQ_GOV_STOP. > I agree that this rwsem is just odd. But please keep its removal for 2.6.32, and consider using RCU then if performance really matters. And please don't try to overengineer a simple bugfix. The bogus mutex removal you proposed a few weeks ago turned me into the "cautious" mode. And looking at the general code quality of cpufreq (timer teardown races, error path not handled, unbalanced locks, cpu hotplug support broken) makes me vote for the most utterly simple solution. I currently don't care about performance _at all_. I care about something as simple as making sure this thing stop blowing up. Mathieu > > Thanks, > Venki > > > > >Signed-off-by: Mathieu Desnoyers > >CC: Venkatesh Pallipadi > >CC: rjw@sisk.pl > >CC: mingo@elte.hu > >CC: Shaohua Li > >CC: Pekka Enberg > >CC: Dave Young > >CC: "Rafael J. Wysocki" > >CC: Rusty Russell > >CC: sven.wegener@stealer.net > >CC: cpufreq@vger.kernel.org > >CC: Thomas Renninger > >--- > > drivers/cpufreq/cpufreq.c | 38 > >+++++++++++++++++++++++++++++++++++--- > > 1 file changed, 35 insertions(+), 3 deletions(-) > > > >Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c > >=================================================================== > >--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c > >2009-07-03 09:48:35.000000000 -0400 > >+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c 2009-07-03 > >10:12:44.000000000 -0400 > >@@ -133,6 +133,17 @@ pure_initcall(init_cpufreq_transition_no > > static LIST_HEAD(cpufreq_governor_list); > > static DEFINE_MUTEX(cpufreq_governor_mutex); > > > >+/* > >+ * Using the cpufreq_gov_mutex to protect internal governor > >+ * data structures. The policy rwsem write lock nests inside > >this mutex. > >+ * The policy rwsem is taken in the timer handler, and > >therefore cannot be > >+ * held while doing a sync teardown of the timer. > >+ * This cpufreq_gov_mutex lock protects init/teardown of > >governor timers. > >+ * This is why this lock, although it protects a data > >structure located within > >+ * the governors, is here. > >+ */ > >+static DEFINE_MUTEX(cpufreq_gov_mutex); > >+ > > struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu) > > { > > struct cpufreq_policy *data; > >@@ -725,6 +736,7 @@ static ssize_t store(struct kobject *kob > > if (!policy) > > goto no_policy; > > > >+ mutex_lock(&cpufreq_gov_mutex); > > if (lock_policy_rwsem_write(policy->cpu) < 0) > > goto fail; > > > >@@ -735,6 +747,7 @@ static ssize_t store(struct kobject *kob > > > > unlock_policy_rwsem_write(policy->cpu); > > fail: > >+ mutex_unlock(&cpufreq_gov_mutex); > > cpufreq_cpu_put(policy); > > no_policy: > > return ret; > >@@ -823,6 +836,7 @@ static int cpufreq_add_dev(struct sys_de > > > > /* Initially set CPU itself as the policy_cpu */ > > per_cpu(policy_cpu, cpu) = cpu; > >+ mutex_lock(&cpufreq_gov_mutex); > > ret = (lock_policy_rwsem_write(cpu) < 0); > > WARN_ON(ret); > > > >@@ -875,7 +889,7 @@ static int cpufreq_add_dev(struct sys_de > > cpufreq_driver->exit(policy); > > ret = -EBUSY; > > cpufreq_cpu_put(managed_policy); > >- goto err_free_cpumask; > >+ goto err_unlock_gov_mutex; > > } > > > > spin_lock_irqsave(&cpufreq_driver_lock, flags); > >@@ -964,6 +978,7 @@ static int cpufreq_add_dev(struct sys_de > > } > > > > unlock_policy_rwsem_write(cpu); > >+ mutex_unlock(&cpufreq_gov_mutex); > > > > kobject_uevent(&policy->kobj, KOBJ_ADD); > > module_put(cpufreq_driver->owner); > >@@ -989,6 +1004,8 @@ out_driver_exit: > > > > err_unlock_policy: > > unlock_policy_rwsem_write(cpu); > >+err_unlock_gov_mutex: > >+ mutex_unlock(&cpufreq_gov_mutex); > > err_free_cpumask: > > free_cpumask_var(policy->cpus); > > err_free_policy: > >@@ -1028,6 +1045,7 @@ static int __cpufreq_remove_dev(struct s > > spin_unlock_irqrestore(&cpufreq_driver_lock, flags); > > cpufreq_debug_enable_ratelimit(); > > unlock_policy_rwsem_write(cpu); > >+ mutex_unlock(&cpufreq_gov_mutex); > > return -EINVAL; > > } > > per_cpu(cpufreq_cpu_data, cpu) = NULL; > >@@ -1045,6 +1063,7 @@ static int __cpufreq_remove_dev(struct s > > cpufreq_cpu_put(data); > > cpufreq_debug_enable_ratelimit(); > > unlock_policy_rwsem_write(cpu); > >+ mutex_unlock(&cpufreq_gov_mutex); > > return 0; > > } > > #endif > >@@ -1088,9 +1107,13 @@ static int __cpufreq_remove_dev(struct s > > #endif > > > > unlock_policy_rwsem_write(cpu); > >- > >+ /* > >+ * Calling only with the cpufreq_gov_mutex held because > >+ * sync timer teardown has locking dependency with policy rwsem. > >+ */ > > if (cpufreq_driver->target) > > __cpufreq_governor(data, CPUFREQ_GOV_STOP); > >+ mutex_unlock(&cpufreq_gov_mutex); > > > > kobject_put(&data->kobj); > > > >@@ -1123,6 +1146,7 @@ static int cpufreq_remove_dev(struct sys > > if (cpu_is_offline(cpu)) > > return 0; > > > >+ mutex_lock(&cpufreq_gov_mutex); > > if (unlikely(lock_policy_rwsem_write(cpu))) > > BUG(); > > > >@@ -1506,12 +1530,16 @@ int cpufreq_driver_target(struct cpufreq > > if (!policy) > > goto no_policy; > > > >- if (unlikely(lock_policy_rwsem_write(policy->cpu))) > >+ mutex_lock(&cpufreq_gov_mutex); > >+ if (unlikely(lock_policy_rwsem_write(policy->cpu))) { > >+ mutex_unlock(&cpufreq_gov_mutex); > > goto fail; > >+ } > > > > ret = __cpufreq_driver_target(policy, target_freq, relation); > > > > unlock_policy_rwsem_write(policy->cpu); > >+ mutex_unlock(&cpufreq_gov_mutex); > > > > fail: > > cpufreq_cpu_put(policy); > >@@ -1769,7 +1797,9 @@ int cpufreq_update_policy(unsigned int c > > goto no_policy; > > } > > > >+ mutex_lock(&cpufreq_gov_mutex); > > if (unlikely(lock_policy_rwsem_write(cpu))) { > >+ mutex_unlock(&cpufreq_gov_mutex); > > ret = -EINVAL; > > goto fail; > > } > >@@ -1798,6 +1828,7 @@ int cpufreq_update_policy(unsigned int c > > ret = __cpufreq_set_policy(data, &policy); > > > > unlock_policy_rwsem_write(cpu); > >+ mutex_unlock(&cpufreq_gov_mutex); > > > > fail: > > cpufreq_cpu_put(data); > >@@ -1821,6 +1852,7 @@ static int __cpuinit cpufreq_cpu_callbac > > break; > > case CPU_DOWN_PREPARE: > > case CPU_DOWN_PREPARE_FROZEN: > >+ mutex_lock(&cpufreq_gov_mutex); > > if (unlikely(lock_policy_rwsem_write(cpu))) > > BUG(); > > > > > >-- > >Mathieu Desnoyers > >OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 > >A8FE 3BAE 9A68 > > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/