Subject: [RFC 0/5] lock_cpu_hotplug: Redesign - A "lightweight" scalable version.

Hello Everyone,

My previous attempt to redesign cpu_hotplug locking raised certain issues,
namely:
- Getting cpufreq's cpu_hotplug locking right.
- Using per-subsystem locks instead of a global lock.
- Scalability of a global lock.
- Unfairness proving a potential drawback.

This patch set attempts to address these issues.

Getting cpufreq's cpu_hotplug locking right
-------------------------------------------
lock_cpu_hotplug is required in the cpufreq subsystem to prevent a hotplug
event from occurring while we are changing frequencies on a group of cpus.
The *ideal* place to have (un)lock_cpu_hotplug is around cpufreq_driver->target
inside __cpufreq_driver_target.

Unfortunately, __cpufreq_driver_target is called from so many different places
(http://gautham.shenoy.googlepages.com/cpufreq_driver_target_callers.txt),
including the hot-cpu-callback path, that it becomes impossible to
have lock_cpu_hotplug inside __cpufreq_driver_target.

This is why all the lock_cpu_hotplug calls had to be moved up the call chain
in the subsystem, to ensure that any code path which might trigger a
cpu-frequency change takes lock_cpu_hotplug.
The patch http://lkml.org/lkml/2006/7/26/140 by Arjan accomplishes this.

However, the problem of lock_cpu_hotplug being called from a hot-cpu-callback
path still remains unsolved. This patch-set attempts to free the callback path
of any calls to lock_cpu_hotplug.
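
To make this concrete, patch 2 below splits the locked entry points into a
locked wrapper and an unlocked worker. A condensed sketch follows (the real
change is in patch 2; reference counting on the policy and error checks are
omitted here):

int _cpufreq_driver_target(struct cpufreq_policy *policy,
			   unsigned int target_freq,
			   unsigned int relation)
{
	int ret;

	/* Locking: caller holds lock_cpu_hotplug(), except on the
	 * hotcpu callback path, where hotplug is already serialized. */
	mutex_lock(&policy->lock);
	ret = __cpufreq_driver_target(policy, target_freq, relation);
	mutex_unlock(&policy->lock);

	return ret;
}

int cpufreq_driver_target(struct cpufreq_policy *policy,
			  unsigned int target_freq,
			  unsigned int relation)
{
	int ret;

	/* Public entry point: takes the hotplug lock itself. */
	lock_cpu_hotplug();
	ret = _cpufreq_driver_target(policy, target_freq, relation);
	unlock_cpu_hotplug();

	return ret;
}

The hot-cpu callback then calls _cpufreq_driver_target() directly, since a
hotplug operation is already in progress at that point.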

Per-subsystem hotplug locks.
----------------------------
It *is* possible to go for per-subsystem locks, as is done in workqueue.c
using workqueue_mutex, and register the hot-cpu notifier with an appropriate
priority.

I have tried this approach for kernel/sched.c and mm/slab.c.
This seemed to work fine, except for the false positives raised by lockdep,
which were caused by holding these mutexes between the cpu_chain_mutex
lock/unlock (the pattern is sketched below).
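
The pattern is essentially the existing workqueue_mutex one; a minimal sketch
(the mutex, the callback name and the priority value below are made up for
illustration):

static DEFINE_MUTEX(my_subsys_mutex);

static int my_subsys_cpu_callback(struct notifier_block *nb,
				  unsigned long action, void *hcpu)
{
	switch (action) {
	case CPU_UP_PREPARE:
	case CPU_DOWN_PREPARE:
		/* keep the subsystem quiescent across the hotplug operation */
		mutex_lock(&my_subsys_mutex);
		break;
	case CPU_ONLINE:
	case CPU_UP_CANCELED:
	case CPU_DEAD:
	case CPU_DOWN_FAILED:
		mutex_unlock(&my_subsys_mutex);
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block my_subsys_nb = {
	.notifier_call = my_subsys_cpu_callback,
	.priority = 10,	/* ordering relative to other subsystems' callbacks */
};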

My big concern about this is the possible interdependency between subsystems
when it comes to getting the locking order correct, which can be an ugly
thing to solve at compile time by itself. Although I haven't been able to
find a concrete example of such a dependency (except, of course, the famous
cpufreq_ondemand-workqueue one), implicitly relying on the fact that such a
possibility will not arise is, IMHO, "not the right thing".

Which is why I feel lock_cpu_hotplug deserves one more chance (though
it seriously needs a better name to suit its refcount implementation).

Scalability of the global lock.
------------------------------------------
The proposed locking scheme is *extremely* lightweight in the reader fast
path. This is achieved by using a per-cpu refcount which is bumped up/down
on read_lock/read_unlock.

In the reader slow path, which is extremely rare, the lock becomes an unfair
rw-sem, i.e. writers assume control *only* when there are no readers in the
system.

The writer, on arrival, just sets the writer flag and does a
synchronize_sched() to allow all pending readers to finish and to prompt
new readers to take the slowpath. Paul McKenney suggested this approach.

If there are readers in the system, the writer will sleep and will be
woken up by the last *reader*.
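
Condensed from patch 4 below, the reader side then reduces to a preempt-safe
per-cpu increment whenever no writer is around:

void lock_cpu_hotplug(void)
{
	preempt_disable();
	if (likely(!writer_exists())) {
		/* fast path: no writer anywhere; bump this cpu's count */
		per_cpu(refcount, smp_processor_id())++;
		preempt_enable();
		return;
	}
	preempt_enable();
	/* rare: a writer is waiting or active; fall back to the
	 * waitqueue based slow path */
	slow_path_reader_lock();
}

The preempt_disable()/enable() pair is what the writer's synchronize_sched()
waits for: once it returns, every fast-path reader that did not see the
writer flag has finished updating its per-cpu refcount, and every new reader
will see the flag and take the slowpath.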

Unfairness proving to be a problem.
------------------------------------
I could not find any system calls that would allow users to exploit the
unfairness. Hoping that I haven't missed anything important, I have
decided to retain the unfair nature of lock_cpu_hotplug.

The patch set is as follows.
[patch 1/5]: cpufreq subsystem cleanup to fix coding style and other trivial
issues. Hopefully this improves the readability of the code.

[patch 2/5]: Eliminate lock_cpu_hotplug in cpufreq's hotcpu callback path.

[patch 3/5]: Use (un)lock_cpu_hotplug instead of workqueue_mutex.
This fixes the lockdep warnings about holding workqueue_mutex
across the cpu_chain lock/unlock.

[patch 4/5]: Proposed new design for cpu_hotplug lock.

[patch 5/5]: Add lockdep support to the proposed cpu_hotplug lock.

These patches are against linux-2.6.19-rc2-mm2.

Looking forward to your feedback.

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


Subject: [PATCH 1/5] lock_cpu_hotplug: Redesign - Fix coding style issues in cpufreq.

Clean up the cpufreq subsystem to fix coding style issues and improve
readability.

Signed-off-by: Gautham R Shenoy <[email protected]>
---
drivers/cpufreq/cpufreq.c | 136 ++++++++++++++++++++++-----------
drivers/cpufreq/cpufreq_conservative.c | 23 +++--
drivers/cpufreq/cpufreq_ondemand.c | 12 +-
drivers/cpufreq/cpufreq_performance.c | 9 +-
drivers/cpufreq/cpufreq_powersave.c | 9 +-
drivers/cpufreq/cpufreq_stats.c | 11 +-
drivers/cpufreq/freq_table.c | 28 ++++--
7 files changed, 150 insertions(+), 78 deletions(-)

Index: hotplug/drivers/cpufreq/cpufreq.c
===================================================================
--- hotplug.orig/drivers/cpufreq/cpufreq.c
+++ hotplug/drivers/cpufreq/cpufreq.c
@@ -29,7 +29,8 @@
#include <linux/completion.h>
#include <linux/mutex.h>

-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_CORE, "cpufreq-core", msg)
+#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_CORE, \
+ "cpufreq-core", msg)

/**
* The "cpufreq driver" - the arch- or hardware-dependent low
@@ -41,7 +42,8 @@ static struct cpufreq_policy *cpufreq_cp
static DEFINE_SPINLOCK(cpufreq_driver_lock);

/* internal prototypes */
-static int __cpufreq_governor(struct cpufreq_policy *policy, unsigned int event);
+static int __cpufreq_governor(struct cpufreq_policy *policy,
+ unsigned int event);
static void handle_update(void *data);

/**
@@ -151,7 +153,8 @@ static void cpufreq_debug_disable_rateli
spin_unlock_irqrestore(&disable_ratelimit_lock, flags);
}

-void cpufreq_debug_printk(unsigned int type, const char *prefix, const char *fmt, ...)
+void cpufreq_debug_printk(unsigned int type, const char *prefix,
+ const char *fmt, ...)
{
char s[256];
va_list args;
@@ -161,7 +164,8 @@ void cpufreq_debug_printk(unsigned int t
WARN_ON(!prefix);
if (type & debug) {
spin_lock_irqsave(&disable_ratelimit_lock, flags);
- if (!disable_ratelimit && debug_ratelimit && !printk_ratelimit()) {
+ if (!disable_ratelimit && debug_ratelimit
+ && !printk_ratelimit()) {
spin_unlock_irqrestore(&disable_ratelimit_lock, flags);
return;
}
@@ -182,10 +186,12 @@ EXPORT_SYMBOL(cpufreq_debug_printk);


module_param(debug, uint, 0644);
-MODULE_PARM_DESC(debug, "CPUfreq debugging: add 1 to debug core, 2 to debug drivers, and 4 to debug governors.");
+MODULE_PARM_DESC(debug, "CPUfreq debugging: add 1 to debug core,"
+ " 2 to debug drivers, and 4 to debug governors.");

module_param(debug_ratelimit, uint, 0644);
-MODULE_PARM_DESC(debug_ratelimit, "CPUfreq debugging: set to 0 to disable ratelimiting.");
+MODULE_PARM_DESC(debug_ratelimit, "CPUfreq debugging:"
+ " set to 0 to disable ratelimiting.");

#else /* !CONFIG_CPU_FREQ_DEBUG */

@@ -219,17 +225,23 @@ static void adjust_jiffies(unsigned long
if (!l_p_j_ref_freq) {
l_p_j_ref = loops_per_jiffy;
l_p_j_ref_freq = ci->old;
- dprintk("saving %lu as reference value for loops_per_jiffy; freq is %u kHz\n", l_p_j_ref, l_p_j_ref_freq);
+ dprintk("saving %lu as reference value for loops_per_jiffy;"
+ "freq is %u kHz\n", l_p_j_ref, l_p_j_ref_freq);
}
if ((val == CPUFREQ_PRECHANGE && ci->old < ci->new) ||
(val == CPUFREQ_POSTCHANGE && ci->old > ci->new) ||
(val == CPUFREQ_RESUMECHANGE || val == CPUFREQ_SUSPENDCHANGE)) {
- loops_per_jiffy = cpufreq_scale(l_p_j_ref, l_p_j_ref_freq, ci->new);
- dprintk("scaling loops_per_jiffy to %lu for frequency %u kHz\n", loops_per_jiffy, ci->new);
+ loops_per_jiffy = cpufreq_scale(l_p_j_ref, l_p_j_ref_freq,
+ ci->new);
+ dprintk("scaling loops_per_jiffy to %lu"
+ "for frequency %u kHz\n", loops_per_jiffy, ci->new);
}
}
#else
-static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci) { return; }
+static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci)
+{
+ return;
+}
#endif


@@ -316,7 +328,8 @@ static int cpufreq_parse_governor (char
if (!strnicmp(str_governor, "performance", CPUFREQ_NAME_LEN)) {
*policy = CPUFREQ_POLICY_PERFORMANCE;
err = 0;
- } else if (!strnicmp(str_governor, "powersave", CPUFREQ_NAME_LEN)) {
+ } else if (!strnicmp(str_governor, "powersave",
+ CPUFREQ_NAME_LEN)) {
*policy = CPUFREQ_POLICY_POWERSAVE;
err = 0;
}
@@ -328,7 +341,8 @@ static int cpufreq_parse_governor (char
t = __find_governor(str_governor);

if (t == NULL) {
- char *name = kasprintf(GFP_KERNEL, "cpufreq_%s", str_governor);
+ char *name = kasprintf(GFP_KERNEL, "cpufreq_%s",
+ str_governor);

if (name) {
int ret;
@@ -361,7 +375,8 @@ extern struct sysdev_class cpu_sysdev_cl


/**
- * cpufreq_per_cpu_attr_read() / show_##file_name() - print out cpufreq information
+ * cpufreq_per_cpu_attr_read() / show_##file_name() -
+ * print out cpufreq information
*
* Write out information from cpufreq_driver->policy[cpu]; object must be
* "unsigned int".
@@ -380,7 +395,8 @@ show_one(scaling_min_freq, min);
show_one(scaling_max_freq, max);
show_one(scaling_cur_freq, cur);

-static int __cpufreq_set_policy(struct cpufreq_policy *data, struct cpufreq_policy *policy);
+static int __cpufreq_set_policy(struct cpufreq_policy *data,
+ struct cpufreq_policy *policy);

/**
* cpufreq_per_cpu_attr_write() / store_##file_name() - sysfs write access
@@ -416,7 +432,8 @@ store_one(scaling_max_freq,max);
/**
* show_cpuinfo_cur_freq - current CPU frequency as detected by hardware
*/
-static ssize_t show_cpuinfo_cur_freq (struct cpufreq_policy * policy, char *buf)
+static ssize_t show_cpuinfo_cur_freq (struct cpufreq_policy * policy,
+ char *buf)
{
unsigned int cur_freq = cpufreq_get(policy->cpu);
if (!cur_freq)
@@ -428,7 +445,8 @@ static ssize_t show_cpuinfo_cur_freq (st
/**
* show_scaling_governor - show the current policy for the specified CPU
*/
-static ssize_t show_scaling_governor (struct cpufreq_policy * policy, char *buf)
+static ssize_t show_scaling_governor (struct cpufreq_policy * policy,
+ char *buf)
{
if(policy->policy == CPUFREQ_POLICY_POWERSAVE)
return sprintf(buf, "powersave\n");
@@ -458,7 +476,8 @@ static ssize_t store_scaling_governor (s
if (ret != 1)
return -EINVAL;

- if (cpufreq_parse_governor(str_governor, &new_policy.policy, &new_policy.governor))
+ if (cpufreq_parse_governor(str_governor, &new_policy.policy,
+ &new_policy.governor))
return -EINVAL;

lock_cpu_hotplug();
@@ -474,7 +493,10 @@ static ssize_t store_scaling_governor (s

unlock_cpu_hotplug();

- return ret ? ret : count;
+ if (ret)
+ return ret;
+ else
+ return count;
}

/**
@@ -488,7 +510,7 @@ static ssize_t show_scaling_driver (stru
/**
* show_scaling_available_governors - show the available CPUfreq governors
*/
-static ssize_t show_scaling_available_governors (struct cpufreq_policy * policy,
+static ssize_t show_scaling_available_governors (struct cpufreq_policy *policy,
char *buf)
{
ssize_t i = 0;
@@ -574,7 +596,11 @@ static ssize_t show(struct kobject * kob
policy = cpufreq_cpu_get(policy->cpu);
if (!policy)
return -EINVAL;
- ret = fattr->show ? fattr->show(policy,buf) : -EIO;
+ if (fattr->show)
+ ret = fattr->show(policy, buf);
+ else
+ ret = -EIO;
+
cpufreq_cpu_put(policy);
return ret;
}
@@ -588,7 +614,11 @@ static ssize_t store(struct kobject * ko
policy = cpufreq_cpu_get(policy->cpu);
if (!policy)
return -EINVAL;
- ret = fattr->store ? fattr->store(policy,buf,count) : -EIO;
+ if (fattr->store)
+ ret = fattr->store(policy, buf, count);
+ else
+ ret = -EIO;
+
cpufreq_cpu_put(policy);
return ret;
}
@@ -911,7 +941,8 @@ static void handle_update(void *data)
* We adjust to current frequency first, and need to clean up later. So either call
* to cpufreq_update_policy() or schedule handle_update()).
*/
-static void cpufreq_out_of_sync(unsigned int cpu, unsigned int old_freq, unsigned int new_freq)
+static void cpufreq_out_of_sync(unsigned int cpu, unsigned int old_freq,
+ unsigned int new_freq)
{
struct cpufreq_freqs freqs;

@@ -936,16 +967,16 @@ static void cpufreq_out_of_sync(unsigned
unsigned int cpufreq_quick_get(unsigned int cpu)
{
struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
- unsigned int ret = 0;
+ unsigned int ret_freq = 0;

if (policy) {
mutex_lock(&policy->lock);
- ret = policy->cur;
+ ret_freq = policy->cur;
mutex_unlock(&policy->lock);
cpufreq_cpu_put(policy);
}

- return (ret);
+ return (ret_freq);
}
EXPORT_SYMBOL(cpufreq_quick_get);

@@ -959,7 +990,7 @@ EXPORT_SYMBOL(cpufreq_quick_get);
unsigned int cpufreq_get(unsigned int cpu)
{
struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
- unsigned int ret = 0;
+ unsigned int ret_freq = 0;

if (!policy)
return 0;
@@ -969,12 +1000,14 @@ unsigned int cpufreq_get(unsigned int cp

mutex_lock(&policy->lock);

- ret = cpufreq_driver->get(cpu);
+ ret_freq = cpufreq_driver->get(cpu);

- if (ret && policy->cur && !(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) {
- /* verify no discrepancy between actual and saved value exists */
- if (unlikely(ret != policy->cur)) {
- cpufreq_out_of_sync(cpu, policy->cur, ret);
+ if (ret_freq && policy->cur &&
+ !(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) {
+ /* verify no discrepancy between actual and
+ saved value exists */
+ if (unlikely(ret_freq != policy->cur)) {
+ cpufreq_out_of_sync(cpu, policy->cur, ret_freq);
schedule_work(&policy->update);
}
}
@@ -984,7 +1017,7 @@ unsigned int cpufreq_get(unsigned int cp
out:
cpufreq_cpu_put(policy);

- return (ret);
+ return (ret_freq);
}
EXPORT_SYMBOL(cpufreq_get);

@@ -996,7 +1029,7 @@ EXPORT_SYMBOL(cpufreq_get);
static int cpufreq_suspend(struct sys_device * sysdev, pm_message_t pmsg)
{
int cpu = sysdev->id;
- unsigned int ret = 0;
+ int ret = 0;
unsigned int cur_freq = 0;
struct cpufreq_policy *cpu_policy;

@@ -1078,7 +1111,7 @@ out:
static int cpufreq_resume(struct sys_device * sysdev)
{
int cpu = sysdev->id;
- unsigned int ret = 0;
+ int ret = 0;
struct cpufreq_policy *cpu_policy;

dprintk("resuming cpu %u\n", cpu);
@@ -1299,17 +1332,20 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_getavg)
* when "event" is CPUFREQ_GOV_LIMITS
*/

-static int __cpufreq_governor(struct cpufreq_policy *policy, unsigned int event)
+static int __cpufreq_governor(struct cpufreq_policy *policy,
+ unsigned int event)
{
int ret;

if (!try_module_get(policy->governor->owner))
return -EINVAL;

- dprintk("__cpufreq_governor for CPU %u, event %u\n", policy->cpu, event);
+ dprintk("__cpufreq_governor for CPU %u, event %u\n",
+ policy->cpu, event);
ret = policy->governor->governor(policy, event);

- /* we keep one module reference alive for each CPU governed by this CPU */
+ /* we keep one module reference alive for
+ each CPU governed by this CPU */
if ((event != CPUFREQ_GOV_START) || ret)
module_put(policy->governor->owner);
if ((event == CPUFREQ_GOV_STOP) && !ret)
@@ -1385,9 +1421,12 @@ EXPORT_SYMBOL(cpufreq_get_policy);


/*
+ * data : current policy.
+ * policy : policy to be set.
* Locking: Must be called with the lock_cpu_hotplug() lock held
*/
-static int __cpufreq_set_policy(struct cpufreq_policy *data, struct cpufreq_policy *policy)
+static int __cpufreq_set_policy(struct cpufreq_policy *data,
+ struct cpufreq_policy *policy)
{
int ret = 0;

@@ -1395,7 +1434,8 @@ static int __cpufreq_set_policy(struct c
dprintk("setting new policy for CPU %u: %u - %u kHz\n", policy->cpu,
policy->min, policy->max);

- memcpy(&policy->cpuinfo, &data->cpuinfo, sizeof(struct cpufreq_cpuinfo));
+ memcpy(&policy->cpuinfo, &data->cpuinfo,
+ sizeof(struct cpufreq_cpuinfo));

if (policy->min > data->min && policy->min > policy->max) {
ret = -EINVAL;
@@ -1428,7 +1468,8 @@ static int __cpufreq_set_policy(struct c
data->min = policy->min;
data->max = policy->max;

- dprintk("new min and max freqs are %u - %u kHz\n", data->min, data->max);
+ dprintk("new min and max freqs are %u - %u kHz\n",
+ data->min, data->max);

if (cpufreq_driver->setpolicy) {
data->policy = policy->policy;
@@ -1449,10 +1490,12 @@ static int __cpufreq_set_policy(struct c
data->governor = policy->governor;
if (__cpufreq_governor(data, CPUFREQ_GOV_START)) {
/* new governor failed, so re-start old one */
- dprintk("starting governor %s failed\n", data->governor->name);
+ dprintk("starting governor %s failed\n",
+ data->governor->name);
if (old_gov) {
data->governor = old_gov;
- __cpufreq_governor(data, CPUFREQ_GOV_START);
+ __cpufreq_governor(data,
+ CPUFREQ_GOV_START);
}
ret = -EINVAL;
goto error_out;
@@ -1542,7 +1585,8 @@ int cpufreq_update_policy(unsigned int c
data->cur = policy.cur;
} else {
if (data->cur != policy.cur)
- cpufreq_out_of_sync(cpu, data->cur, policy.cur);
+ cpufreq_out_of_sync(cpu, data->cur,
+ policy.cur);
}
}

@@ -1646,8 +1690,10 @@ int cpufreq_register_driver(struct cpufr

/* if all ->init() calls failed, unregister */
if (ret) {
- dprintk("no CPU initialized for driver %s\n", driver_data->name);
- sysdev_driver_unregister(&cpu_sysdev_class, &cpufreq_sysdev_driver);
+ dprintk("no CPU initialized for driver %s\n",
+ driver_data->name);
+ sysdev_driver_unregister(&cpu_sysdev_class,
+ &cpufreq_sysdev_driver);

spin_lock_irqsave(&cpufreq_driver_lock, flags);
cpufreq_driver = NULL;
Index: hotplug/drivers/cpufreq/cpufreq_performance.c
===================================================================
--- hotplug.orig/drivers/cpufreq/cpufreq_performance.c
+++ hotplug/drivers/cpufreq/cpufreq_performance.c
@@ -15,7 +15,8 @@
#include <linux/cpufreq.h>
#include <linux/init.h>

-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_GOVERNOR, "performance", msg)
+#define dprintk(msg...) \
+ cpufreq_debug_printk(CPUFREQ_DEBUG_GOVERNOR, "performance", msg)


static int cpufreq_governor_performance(struct cpufreq_policy *policy,
@@ -24,8 +25,10 @@ static int cpufreq_governor_performance(
switch (event) {
case CPUFREQ_GOV_START:
case CPUFREQ_GOV_LIMITS:
- dprintk("setting to %u kHz because of event %u\n", policy->max, event);
- __cpufreq_driver_target(policy, policy->max, CPUFREQ_RELATION_H);
+ dprintk("setting to %u kHz because of event %u\n",
+ policy->max, event);
+ __cpufreq_driver_target(policy, policy->max,
+ CPUFREQ_RELATION_H);
break;
default:
break;
Index: hotplug/drivers/cpufreq/cpufreq_powersave.c
===================================================================
--- hotplug.orig/drivers/cpufreq/cpufreq_powersave.c
+++ hotplug/drivers/cpufreq/cpufreq_powersave.c
@@ -15,7 +15,8 @@
#include <linux/cpufreq.h>
#include <linux/init.h>

-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_GOVERNOR, "powersave", msg)
+#define dprintk(msg...) \
+ cpufreq_debug_printk(CPUFREQ_DEBUG_GOVERNOR, "powersave", msg)

static int cpufreq_governor_powersave(struct cpufreq_policy *policy,
unsigned int event)
@@ -23,8 +24,10 @@ static int cpufreq_governor_powersave(st
switch (event) {
case CPUFREQ_GOV_START:
case CPUFREQ_GOV_LIMITS:
- dprintk("setting to %u kHz because of event %u\n", policy->min, event);
- __cpufreq_driver_target(policy, policy->min, CPUFREQ_RELATION_L);
+ dprintk("setting to %u kHz because of event %u\n",
+ policy->min, event);
+ __cpufreq_driver_target(policy, policy->min,
+ CPUFREQ_RELATION_L);
break;
default:
break;
Index: hotplug/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- hotplug.orig/drivers/cpufreq/cpufreq_ondemand.c
+++ hotplug/drivers/cpufreq/cpufreq_ondemand.c
@@ -41,8 +41,10 @@
static unsigned int def_sampling_rate;
#define MIN_SAMPLING_RATE_RATIO (2)
/* for correct statistics, we need at least 10 ticks between each measure */
-#define MIN_STAT_SAMPLING_RATE (MIN_SAMPLING_RATE_RATIO * jiffies_to_usecs(10))
-#define MIN_SAMPLING_RATE (def_sampling_rate / MIN_SAMPLING_RATE_RATIO)
+#define MIN_STAT_SAMPLING_RATE \
+ (MIN_SAMPLING_RATE_RATIO * jiffies_to_usecs(10))
+#define MIN_SAMPLING_RATE \
+ (def_sampling_rate / MIN_SAMPLING_RATE_RATIO)
#define MAX_SAMPLING_RATE (500 * def_sampling_rate)
#define DEF_SAMPLING_RATE_LATENCY_MULTIPLIER (1000)
#define TRANSITION_LATENCY_LIMIT (10 * 1000)
@@ -202,7 +204,8 @@ static ssize_t store_sampling_rate(struc
ret = sscanf(buf, "%u", &input);

mutex_lock(&dbs_mutex);
- if (ret != 1 || input > MAX_SAMPLING_RATE || input < MIN_SAMPLING_RATE) {
+ if (ret != 1 || input > MAX_SAMPLING_RATE
+ || input < MIN_SAMPLING_RATE) {
mutex_unlock(&dbs_mutex);
return -EINVAL;
}
@@ -496,7 +499,8 @@ static int cpufreq_governor_dbs(struct c
if (dbs_enable == 1) {
kondemand_wq = create_workqueue("kondemand");
if (!kondemand_wq) {
- printk(KERN_ERR "Creation of kondemand failed\n");
+ printk(KERN_ERR
+ "Creation of kondemand failed\n");
dbs_enable--;
mutex_unlock(&dbs_mutex);
return -ENOSPC;
Index: hotplug/drivers/cpufreq/cpufreq_conservative.c
===================================================================
--- hotplug.orig/drivers/cpufreq/cpufreq_conservative.c
+++ hotplug/drivers/cpufreq/cpufreq_conservative.c
@@ -44,15 +44,17 @@
* latency of the processor. The governor will work on any processor with
* transition latency <= 10mS, using appropriate sampling
* rate.
- * For CPUs with transition latency > 10mS (mostly drivers with CPUFREQ_ETERNAL)
- * this governor will not work.
+ * For CPUs with transition latency > 10mS (mostly drivers
+ * with CPUFREQ_ETERNAL), this governor will not work.
* All times here are in uS.
*/
static unsigned int def_sampling_rate;
#define MIN_SAMPLING_RATE_RATIO (2)
/* for correct statistics, we need at least 10 ticks between each measure */
-#define MIN_STAT_SAMPLING_RATE (MIN_SAMPLING_RATE_RATIO * jiffies_to_usecs(10))
-#define MIN_SAMPLING_RATE (def_sampling_rate / MIN_SAMPLING_RATE_RATIO)
+#define MIN_STAT_SAMPLING_RATE \
+ (MIN_SAMPLING_RATE_RATIO * jiffies_to_usecs(10))
+#define MIN_SAMPLING_RATE \
+ (def_sampling_rate / MIN_SAMPLING_RATE_RATIO)
#define MAX_SAMPLING_RATE (500 * def_sampling_rate)
#define DEF_SAMPLING_RATE_LATENCY_MULTIPLIER (1000)
#define DEF_SAMPLING_DOWN_FACTOR (1)
@@ -103,11 +105,16 @@ static struct dbs_tuners dbs_tuners_ins

static inline unsigned int get_cpu_idle_time(unsigned int cpu)
{
- return kstat_cpu(cpu).cpustat.idle +
+ unsigned int add_nice = 0, ret;
+
+ if (dbs_tuners_ins.ignore_nice)
+ add_nice = kstat_cpu(cpu).cpustat.nice;
+
+ ret = kstat_cpu(cpu).cpustat.idle +
kstat_cpu(cpu).cpustat.iowait +
- ( dbs_tuners_ins.ignore_nice ?
- kstat_cpu(cpu).cpustat.nice :
- 0);
+ add_nice;
+
+ return ret;
}

/************************** sysfs interface ************************/
Index: hotplug/drivers/cpufreq/cpufreq_stats.c
===================================================================
--- hotplug.orig/drivers/cpufreq/cpufreq_stats.c
+++ hotplug/drivers/cpufreq/cpufreq_stats.c
@@ -351,8 +351,8 @@ __init cpufreq_stats_init(void)

register_hotcpu_notifier(&cpufreq_stat_cpu_notifier);
for_each_online_cpu(cpu) {
- cpufreq_stat_cpu_callback(&cpufreq_stat_cpu_notifier, CPU_ONLINE,
- (void *)(long)cpu);
+ cpufreq_stat_cpu_callback(&cpufreq_stat_cpu_notifier,
+ CPU_ONLINE, (void *)(long)cpu);
}
return 0;
}
@@ -368,14 +368,15 @@ __exit cpufreq_stats_exit(void)
unregister_hotcpu_notifier(&cpufreq_stat_cpu_notifier);
lock_cpu_hotplug();
for_each_online_cpu(cpu) {
- cpufreq_stat_cpu_callback(&cpufreq_stat_cpu_notifier, CPU_DEAD,
- (void *)(long)cpu);
+ cpufreq_stat_cpu_callback(&cpufreq_stat_cpu_notifier,
+ CPU_DEAD, (void *)(long)cpu);
}
unlock_cpu_hotplug();
}

MODULE_AUTHOR ("Zou Nan hai <[email protected]>");
-MODULE_DESCRIPTION ("'cpufreq_stats' - A driver to export cpufreq stats through sysfs filesystem");
+MODULE_DESCRIPTION ("'cpufreq_stats' - A driver to export cpufreq stats"
+ "through sysfs filesystem");
MODULE_LICENSE ("GPL");

module_init(cpufreq_stats_init);
Index: hotplug/drivers/cpufreq/freq_table.c
===================================================================
--- hotplug.orig/drivers/cpufreq/freq_table.c
+++ hotplug/drivers/cpufreq/freq_table.c
@@ -9,7 +9,8 @@
#include <linux/init.h>
#include <linux/cpufreq.h>

-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_CORE, "freq-table", msg)
+#define dprintk(msg...) \
+ cpufreq_debug_printk(CPUFREQ_DEBUG_CORE, "freq-table", msg)

/*********************************************************************
* FREQUENCY TABLE HELPERS *
@@ -29,7 +30,8 @@ int cpufreq_frequency_table_cpuinfo(stru

continue;
}
- dprintk("table entry %u: %u kHz, %u index\n", i, freq, table[i].index);
+ dprintk("table entry %u: %u kHz, %u index\n",
+ i, freq, table[i].index);
if (freq < min_freq)
min_freq = freq;
if (freq > max_freq)
@@ -54,13 +56,14 @@ int cpufreq_frequency_table_verify(struc
unsigned int i;
unsigned int count = 0;

- dprintk("request for verification of policy (%u - %u kHz) for cpu %u\n", policy->min, policy->max, policy->cpu);
+ dprintk("request for verification of policy (%u - %u kHz) for cpu %u\n",
+ policy->min, policy->max, policy->cpu);

if (!cpu_online(policy->cpu))
return -EINVAL;

- cpufreq_verify_within_limits(policy,
- policy->cpuinfo.min_freq, policy->cpuinfo.max_freq);
+ cpufreq_verify_within_limits(policy, policy->cpuinfo.min_freq,
+ policy->cpuinfo.max_freq);

for (i=0; (table[i].frequency != CPUFREQ_TABLE_END); i++) {
unsigned int freq = table[i].frequency;
@@ -75,10 +78,11 @@ int cpufreq_frequency_table_verify(struc
if (!count)
policy->max = next_larger;

- cpufreq_verify_within_limits(policy,
- policy->cpuinfo.min_freq, policy->cpuinfo.max_freq);
+ cpufreq_verify_within_limits(policy, policy->cpuinfo.min_freq,
+ policy->cpuinfo.max_freq);

- dprintk("verification lead to (%u - %u kHz) for cpu %u\n", policy->min, policy->max, policy->cpu);
+ dprintk("verification lead to (%u - %u kHz) for cpu %u\n",
+ policy->min, policy->max, policy->cpu);

return 0;
}
@@ -101,7 +105,8 @@ int cpufreq_frequency_table_target(struc
};
unsigned int i;

- dprintk("request for target %u kHz (relation: %u) for cpu %u\n", target_freq, relation, policy->cpu);
+ dprintk("request for target %u kHz (relation: %u) for cpu %u\n",
+ target_freq, relation, policy->cpu);

switch (relation) {
case CPUFREQ_RELATION_H:
@@ -192,7 +197,10 @@ static ssize_t show_available_freqs (str
}

struct freq_attr cpufreq_freq_attr_scaling_available_freqs = {
- .attr = { .name = "scaling_available_frequencies", .mode = 0444, .owner=THIS_MODULE },
+ .attr = { .name = "scaling_available_frequencies",
+ .mode = 0444,
+ .owner=THIS_MODULE
+ },
.show = show_available_freqs,
};
EXPORT_SYMBOL_GPL(cpufreq_freq_attr_scaling_available_freqs);
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

Subject: [PATCH 2/5] lock_cpu_hotplug: Redesign - Remove lock_cpu_hotplug from the hot-cpu callback path in cpufreq.

Ensure that the hot-cpu-callback path in cpufreq is free from any
(un)lock_cpu_hotplug calls.

Signed-off-by : Gautham R Shenoy <[email protected]>
---
drivers/cpufreq/cpufreq.c | 83 +++++++++++++++++++++++++++-------------
drivers/cpufreq/cpufreq_stats.c | 4 +
include/linux/cpufreq.h | 1
3 files changed, 61 insertions(+), 27 deletions(-)

Index: hotplug/drivers/cpufreq/cpufreq.c
===================================================================
--- hotplug.orig/drivers/cpufreq/cpufreq.c
+++ hotplug/drivers/cpufreq/cpufreq.c
@@ -642,6 +642,7 @@ static struct kobj_type ktype_cpufreq =
};


+int _cpufreq_set_policy(struct cpufreq_policy *);
/**
* cpufreq_add_dev - add a CPU device
*
@@ -776,11 +777,11 @@ static int cpufreq_add_dev (struct sys_d
}

policy->governor = NULL; /* to assure that the starting sequence is
- * run in cpufreq_set_policy */
+ * run in _cpufreq_set_policy */
mutex_unlock(&policy->lock);

/* set default policy */
- ret = cpufreq_set_policy(&new_policy);
+ ret = _cpufreq_set_policy(&new_policy);
if (ret) {
dprintk("setting policy failed\n");
goto err_out_unregister;
@@ -1284,7 +1285,9 @@ int __cpufreq_driver_target(struct cpufr
}
EXPORT_SYMBOL_GPL(__cpufreq_driver_target);

-int cpufreq_driver_target(struct cpufreq_policy *policy,
+/*Locking: Must be called with lock_cpu_hotplug held, except from the
+hotcpu callback path */
+int _cpufreq_driver_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation)
{
@@ -1294,17 +1297,27 @@ int cpufreq_driver_target(struct cpufreq
if (!policy)
return -EINVAL;

- lock_cpu_hotplug();
mutex_lock(&policy->lock);

ret = __cpufreq_driver_target(policy, target_freq, relation);

mutex_unlock(&policy->lock);
- unlock_cpu_hotplug();

cpufreq_cpu_put(policy);
return ret;
}
+int cpufreq_driver_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ int ret;
+
+ lock_cpu_hotplug();
+ ret = _cpufreq_driver_target(policy, target_freq, relation);
+ unlock_cpu_hotplug();
+
+ return ret;
+}
EXPORT_SYMBOL_GPL(cpufreq_driver_target);

int cpufreq_driver_getavg(struct cpufreq_policy *policy)
@@ -1511,13 +1524,8 @@ error_out:
return ret;
}

-/**
- * cpufreq_set_policy - set a new CPUFreq policy
- * @policy: policy to be set.
- *
- * Sets a new CPU frequency and voltage scaling policy.
- */
-int cpufreq_set_policy(struct cpufreq_policy *policy)
+/* Locking: To be called with lock_cpu_hotplug held. */
+int _cpufreq_set_policy(struct cpufreq_policy *policy)
{
int ret = 0;
struct cpufreq_policy *data;
@@ -1529,8 +1537,6 @@ int cpufreq_set_policy(struct cpufreq_po
if (!data)
return -EINVAL;

- lock_cpu_hotplug();
-
/* lock this CPU */
mutex_lock(&data->lock);

@@ -1541,23 +1547,32 @@ int cpufreq_set_policy(struct cpufreq_po
data->user_policy.governor = data->governor;

mutex_unlock(&data->lock);
-
- unlock_cpu_hotplug();
cpufreq_cpu_put(data);

return ret;
}
-EXPORT_SYMBOL(cpufreq_set_policy);
-

/**
- * cpufreq_update_policy - re-evaluate an existing cpufreq policy
- * @cpu: CPU which shall be re-evaluated
+ * cpufreq_set_policy - set a new CPUFreq policy
+ * @policy: policy to be set.
*
- * Usefull for policy notifiers which have different necessities
- * at different times.
+ * Sets a new CPU frequency and voltage scaling policy.
*/
-int cpufreq_update_policy(unsigned int cpu)
+int cpufreq_set_policy(struct cpufreq_policy *policy)
+{
+ int ret;
+
+ lock_cpu_hotplug();
+ ret = _cpufreq_set_policy(policy);
+ unlock_cpu_hotplug();
+
+ return ret;
+}
+EXPORT_SYMBOL(cpufreq_set_policy);
+
+
+/* Locking: to be called with lock_cpu_hotplug held. */
+int __cpufreq_update_policy(unsigned int cpu)
{
struct cpufreq_policy *data = cpufreq_cpu_get(cpu);
struct cpufreq_policy policy;
@@ -1566,7 +1581,6 @@ int cpufreq_update_policy(unsigned int c
if (!data)
return -ENODEV;

- lock_cpu_hotplug();
mutex_lock(&data->lock);

dprintk("updating policy for CPU %u\n", cpu);
@@ -1593,10 +1607,27 @@ int cpufreq_update_policy(unsigned int c
ret = __cpufreq_set_policy(data, &policy);

mutex_unlock(&data->lock);
- unlock_cpu_hotplug();
cpufreq_cpu_put(data);
return ret;
}
+
+/**
+ * cpufreq_update_policy - re-evaluate an existing cpufreq policy
+ * @cpu: CPU which shall be re-evaluated
+ *
+ * Usefull for policy notifiers which have different necessities
+ * at different times.
+ */
+int cpufreq_update_policy(unsigned int cpu)
+{
+ int ret;
+
+ lock_cpu_hotplug();
+ ret = __cpufreq_update_policy(cpu);
+ unlock_cpu_hotplug();
+
+ return ret;
+}
EXPORT_SYMBOL(cpufreq_update_policy);

#ifdef CONFIG_HOTPLUG_CPU
@@ -1623,7 +1654,7 @@ static int cpufreq_cpu_callback(struct n
*/
policy = cpufreq_cpu_data[cpu];
if (policy) {
- cpufreq_driver_target(policy, policy->min,
+ _cpufreq_driver_target(policy, policy->min,
CPUFREQ_RELATION_H);
}
break;
Index: hotplug/drivers/cpufreq/cpufreq_stats.c
===================================================================
--- hotplug.orig/drivers/cpufreq/cpufreq_stats.c
+++ hotplug/drivers/cpufreq/cpufreq_stats.c
@@ -309,7 +309,7 @@ static int cpufreq_stat_cpu_callback(str

switch (action) {
case CPU_ONLINE:
- cpufreq_update_policy(cpu);
+ __cpufreq_update_policy(cpu);
break;
case CPU_DEAD:
cpufreq_stats_free_table(cpu);
@@ -350,10 +350,12 @@ __init cpufreq_stats_init(void)
}

register_hotcpu_notifier(&cpufreq_stat_cpu_notifier);
+ lock_cpu_hotplug();
for_each_online_cpu(cpu) {
cpufreq_stat_cpu_callback(&cpufreq_stat_cpu_notifier,
CPU_ONLINE, (void *)(long)cpu);
}
+ unlock_cpu_hotplug();
return 0;
}
static void
Index: hotplug/include/linux/cpufreq.h
===================================================================
--- hotplug.orig/include/linux/cpufreq.h
+++ hotplug/include/linux/cpufreq.h
@@ -258,6 +258,7 @@ struct freq_attr {
int cpufreq_set_policy(struct cpufreq_policy *policy);
int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
int cpufreq_update_policy(unsigned int cpu);
+int __cpufreq_update_policy(unsigned int cpu);

/* query the current CPU frequency (in kHz). If zero, cpufreq couldn't detect it */
unsigned int cpufreq_get(unsigned int cpu);
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

Subject: [PATCH 3/5] lock_cpu_hotplug: Redesign - Use lock_cpu_hotplug in workqueue.c instead of workqueue_mutex.

Use lock_cpu_hotplug() instead of workqueue_mutex to prevent a
cpu-hotplug event in kernel/workqueue.c.

Signed-off-by: Gautham R Shenoy <[email protected]>

---
kernel/workqueue.c | 37 +++++++++++++------------------------
1 files changed, 13 insertions(+), 24 deletions(-)

Index: hotplug/kernel/workqueue.c
===================================================================
--- hotplug.orig/kernel/workqueue.c
+++ hotplug/kernel/workqueue.c
@@ -67,9 +67,7 @@ struct workqueue_struct {
struct list_head list; /* Empty if single thread */
};

-/* All the per-cpu workqueues on the system, for hotplug cpu to add/remove
- threads to each one as cpus come/go. */
-static DEFINE_MUTEX(workqueue_mutex);
+static DEFINE_SPINLOCK(workqueue_lock);
static LIST_HEAD(workqueues);

static int singlethread_cpu;
@@ -328,10 +326,10 @@ void fastcall flush_workqueue(struct wor
} else {
int cpu;

- mutex_lock(&workqueue_mutex);
+ lock_cpu_hotplug();
for_each_online_cpu(cpu)
flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
- mutex_unlock(&workqueue_mutex);
+ unlock_cpu_hotplug();
}
}
EXPORT_SYMBOL_GPL(flush_workqueue);
@@ -379,7 +377,7 @@ struct workqueue_struct *__create_workqu
}

wq->name = name;
- mutex_lock(&workqueue_mutex);
+ lock_cpu_hotplug();
if (singlethread) {
INIT_LIST_HEAD(&wq->list);
p = create_workqueue_thread(wq, singlethread_cpu);
@@ -388,7 +386,9 @@ struct workqueue_struct *__create_workqu
else
wake_up_process(p);
} else {
+ spin_lock(&workqueue_lock);
list_add(&wq->list, &workqueues);
+ spin_unlock(&workqueue_lock);
for_each_online_cpu(cpu) {
p = create_workqueue_thread(wq, cpu);
if (p) {
@@ -398,8 +398,7 @@ struct workqueue_struct *__create_workqu
destroy = 1;
}
}
- mutex_unlock(&workqueue_mutex);
-
+ unlock_cpu_hotplug();
/*
* Was there any error during startup? If yes then clean up:
*/
@@ -439,15 +438,17 @@ void destroy_workqueue(struct workqueue_
flush_workqueue(wq);

/* We don't need the distraction of CPUs appearing and vanishing. */
- mutex_lock(&workqueue_mutex);
+ lock_cpu_hotplug();
if (is_single_threaded(wq))
cleanup_workqueue_thread(wq, singlethread_cpu);
else {
for_each_online_cpu(cpu)
cleanup_workqueue_thread(wq, cpu);
+ spin_lock(&workqueue_lock);
list_del(&wq->list);
+ spin_unlock(&workqueue_lock);
}
- mutex_unlock(&workqueue_mutex);
+ unlock_cpu_hotplug();
free_percpu(wq->cpu_wq);
kfree(wq);
}
@@ -519,13 +520,13 @@ int schedule_on_each_cpu(void (*func)(vo
if (!works)
return -ENOMEM;

- mutex_lock(&workqueue_mutex);
+ lock_cpu_hotplug();
for_each_online_cpu(cpu) {
INIT_WORK(per_cpu_ptr(works, cpu), func, info);
__queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu),
per_cpu_ptr(works, cpu));
}
- mutex_unlock(&workqueue_mutex);
+ unlock_cpu_hotplug();
flush_workqueue(keventd_wq);
free_percpu(works);
return 0;
@@ -641,7 +642,6 @@ static int __devinit workqueue_cpu_callb

switch (action) {
case CPU_UP_PREPARE:
- mutex_lock(&workqueue_mutex);
/* Create a new workqueue thread for it. */
list_for_each_entry(wq, &workqueues, list) {
if (!create_workqueue_thread(wq, hotcpu)) {
@@ -660,7 +660,6 @@ static int __devinit workqueue_cpu_callb
kthread_bind(cwq->thread, hotcpu);
wake_up_process(cwq->thread);
}
- mutex_unlock(&workqueue_mutex);
break;

case CPU_UP_CANCELED:
@@ -672,15 +671,6 @@ static int __devinit workqueue_cpu_callb
any_online_cpu(cpu_online_map));
cleanup_workqueue_thread(wq, hotcpu);
}
- mutex_unlock(&workqueue_mutex);
- break;
-
- case CPU_DOWN_PREPARE:
- mutex_lock(&workqueue_mutex);
- break;
-
- case CPU_DOWN_FAILED:
- mutex_unlock(&workqueue_mutex);
break;

case CPU_DEAD:
@@ -688,7 +678,6 @@ static int __devinit workqueue_cpu_callb
cleanup_workqueue_thread(wq, hotcpu);
list_for_each_entry(wq, &workqueues, list)
take_over_work(wq, hotcpu);
- mutex_unlock(&workqueue_mutex);
break;
}

--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

Subject: [PATCH 4/5] lock_cpu_hotplug: Redesign - Lightweight implementation of lock_cpu_hotplug.

(Refcount + waitqueue) implementation for cpu_hotplug "locking".
It is analogous to an unfair rwsem, but it is *extremely* lightweight in
the reader fast path.

Based on valuable inputs by Paul E McKenney, Srivatsa Vaddagiri, Dipankar Sarma
and Ingo Molnar.

Signed-off-by : Gautham R Shenoy <[email protected]>

---
include/linux/cpu.h | 2
init/main.c | 1
kernel/cpu.c | 231 ++++++++++++++++++++++++++++++++++++++++++----------
3 files changed, 194 insertions(+), 40 deletions(-)

Index: hotplug/init/main.c
===================================================================
--- hotplug.orig/init/main.c
+++ hotplug/init/main.c
@@ -573,6 +573,7 @@ asmlinkage void __init start_kernel(void
vfs_caches_init_early();
cpuset_init_early();
mem_init();
+ cpu_hotplug_init();
kmem_cache_init();
setup_per_cpu_pageset();
numa_policy_init();
Index: hotplug/include/linux/cpu.h
===================================================================
--- hotplug.orig/include/linux/cpu.h
+++ hotplug/include/linux/cpu.h
@@ -65,6 +65,7 @@ static inline void unregister_cpu_notifi
extern struct sysdev_class cpu_sysdev_class;

#ifdef CONFIG_HOTPLUG_CPU
+extern void cpu_hotplug_init(void);
/* Stop CPUs going up and down. */
extern void lock_cpu_hotplug(void);
extern void unlock_cpu_hotplug(void);
@@ -78,6 +79,7 @@ extern void unlock_cpu_hotplug(void);
int cpu_down(unsigned int cpu);
#define cpu_is_offline(cpu) unlikely(!cpu_online(cpu))
#else
+#define cpu_hotplug_init() do { } while (0)
#define lock_cpu_hotplug() do { } while (0)
#define unlock_cpu_hotplug() do { } while (0)
#define lock_cpu_hotplug_interruptible() 0
Index: hotplug/kernel/cpu.c
===================================================================
--- hotplug.orig/kernel/cpu.c
+++ hotplug/kernel/cpu.c
@@ -1,6 +1,10 @@
/* CPU control.
* (C) 2001, 2002, 2003, 2004 Rusty Russell
*
+ * Hotplug - Locking
+ * 2006: Implemented by Gautham R Shenoy with the aid of some valuable inputs
+ * from Paul E McKenney, Srivatsa Vaddagiri, Dipankar Sarma and Ingo Molnar.
+ *
* This code is licenced under the GPL.
*/
#include <linux/proc_fs.h>
@@ -14,10 +18,11 @@
#include <linux/kthread.h>
#include <linux/stop_machine.h>
#include <linux/mutex.h>
-
-/* This protects CPUs going up and down... */
-static DEFINE_MUTEX(cpu_add_remove_lock);
-static DEFINE_MUTEX(cpu_bitmask_lock);
+#include <asm/percpu.h>
+#include <linux/cpumask.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+#include <linux/rcupdate.h>

static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain);

@@ -26,52 +31,202 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(c
*/
static int cpu_hotplug_disabled;

+/************************************************************************
+ * A FEW CONTEXT SPECIFIC DEFINITIONS *
+ * ---------------------------------------------------------------------*
+ * - reader : task which tries to *prevent* a cpu hotplug event. *
+ * - writer : task which tries to *perform* a cpu hotplug event. *
+ * - write-operation: cpu hotplug operation. *
+ * *
+ ************************************************************************/
+
+/************************************************************************
+ * THE PROTOCOL *
+ *----------------------------------------------------------------------*
+ *- Analogous to RWSEM, only not so fair *
+ *- Readers assume control iff: *
+ * a) No other reader has a reference and no writer is writing. *
+ * OR *
+ * b) Atleast one reader (on *any* cpu) has a reference. *
+ *- Writer assumes control iff: *
+ * there are no active readers and there are no active writers. *
+ *- Writer, on completion would preferable wake up other waiting. *
+ * writers over the waiting readers. *
+ *- The *last* writer wakes up all the waiting readers. *
+ ************************************************************************/
+
+static struct {
+ unsigned int status; /* Read mostly global */
+ /* The following variables are only for the slowpath */
+ spinlock_t lock;
+ wait_queue_head_t read_queue;
+ wait_queue_head_t write_queue;
+} cpu_hotplug;
+
+DEFINE_PER_CPU(int, refcount) = {0};
+
+ /* cpu_hotplug.status can be one of the following */
+#define NO_WRITERS 0x00000000 /* Obvious from name */
+#define WRITER_WAITING 0x00000001 /* Writer present, but is waiting */
+#define WRITER_ACTIVE 0x00000002 /* Writer is performing write */
+
+#define writer_exists() cpu_hotplug.status
+
+/* Returns the number of readers in the system */
+static inline int nr_readers(void)
+{
+ int count=0, i;
+
+ for_each_possible_cpu(i)
+ count += per_cpu(refcount, i);
+
+ return count;
+}
+
#ifdef CONFIG_HOTPLUG_CPU
+void __init cpu_hotplug_init(void)
+{
+ cpu_hotplug.status = NO_WRITERS;
+ spin_lock_init(&cpu_hotplug.lock);
+ init_waitqueue_head(&cpu_hotplug.read_queue);
+ init_waitqueue_head(&cpu_hotplug.write_queue);
+}

-/* Crappy recursive lock-takers in cpufreq! Complain loudly about idiots */
-static struct task_struct *recursive;
-static int recursive_depth;
+static void slow_path_reader_lock(void);
+static void slow_path_reader_unlock(void);

+/**********************************************************************
+ MAIN CPU_HOTPLUG (LOCK/ UNLOCK/ BEGIN/ DONE) CODE
+ *********************************************************************/
+
+/* Blocks iff write operation is on-going
+* OR
+* A writer is waiting and there are no other readers.
+*/
void lock_cpu_hotplug(void)
{
- struct task_struct *tsk = current;
-
- if (tsk == recursive) {
- static int warnings = 10;
- if (warnings) {
- printk(KERN_ERR "Lukewarm IQ detected in hotplug locking\n");
- WARN_ON(1);
- warnings--;
- }
- recursive_depth++;
+ preempt_disable();
+ if (likely(!writer_exists())) {
+ per_cpu(refcount, smp_processor_id())++;
+ preempt_enable();
return;
}
- mutex_lock(&cpu_bitmask_lock);
- recursive = tsk;
+ preempt_enable();
+ slow_path_reader_lock();
}
EXPORT_SYMBOL_GPL(lock_cpu_hotplug);

void unlock_cpu_hotplug(void)
{
- WARN_ON(recursive != current);
- if (recursive_depth) {
- recursive_depth--;
+ preempt_disable();
+ if (likely(!writer_exists())) {
+ per_cpu(refcount, smp_processor_id())--;
+ preempt_enable();
return;
}
- mutex_unlock(&cpu_bitmask_lock);
- recursive = NULL;
+ preempt_enable();
+ slow_path_reader_unlock();
}
EXPORT_SYMBOL_GPL(unlock_cpu_hotplug);

+#endif /* CONFIG_HOTPLUG_CPU */
+
+static void cpu_hotplug_begin(void)
+{
+ DECLARE_WAITQUEUE(wait, current);
+
+ spin_lock(&cpu_hotplug.lock);
+ if (likely(cpu_hotplug.status == NO_WRITERS)) {
+ cpu_hotplug.status = WRITER_WAITING;
+ spin_unlock(&cpu_hotplug.lock);
+
+ /* Allow new readers to see this change in status and
+ * notify them to take the slowpath.
+ *
+ * Also allow the older readers who have not seen the status
+ * change to bump up/down their percpu refcount.
+ */
+ synchronize_sched();
+
+ spin_lock(&cpu_hotplug.lock);
+ if (!nr_readers()) {
+ cpu_hotplug.status = WRITER_ACTIVE;
+ spin_unlock(&cpu_hotplug.lock);
+ return;
+ }
+ }
+
+ add_wait_queue_exclusive(&cpu_hotplug.write_queue, &wait);
+ __set_current_state(TASK_UNINTERRUPTIBLE);
+ spin_unlock(&cpu_hotplug.lock);
+ schedule();
+ remove_wait_queue(&cpu_hotplug.write_queue, &wait);
+}
+
+static void cpu_hotplug_done(void)
+{
+ spin_lock(&cpu_hotplug.lock);
+
+ if (!list_empty(&cpu_hotplug.write_queue.task_list))
+ wake_up(&cpu_hotplug.write_queue);
+ else {
+ cpu_hotplug.status = NO_WRITERS;
+ if (!list_empty(&cpu_hotplug.read_queue.task_list))
+ wake_up_all(&cpu_hotplug.read_queue);
+ }
+
+ spin_unlock(&cpu_hotplug.lock);
+}
+
+/**********************************************************************
+ READER SLOWPATH CODE.
+ **********************************************************************/
+#ifdef CONFIG_HOTPLUG_CPU
+static void slow_path_reader_lock(void)
+{
+ DECLARE_WAITQUEUE(wait, current);
+
+ spin_lock(&cpu_hotplug.lock);
+
+ while (writer_exists()) {
+ /* This check makes the whole business unfair */
+ if (cpu_hotplug.status == WRITER_WAITING && nr_readers())
+ goto out;
+
+ add_wait_queue(&cpu_hotplug.read_queue, &wait);
+ __set_current_state(TASK_UNINTERRUPTIBLE);
+ spin_unlock(&cpu_hotplug.lock);
+ schedule();
+ remove_wait_queue(&cpu_hotplug.read_queue, &wait);
+ spin_lock(&cpu_hotplug.lock);
+ }
+out:
+ per_cpu(refcount, smp_processor_id())++;
+ spin_unlock(&cpu_hotplug.lock);
+
+}
+
+static void slow_path_reader_unlock(void)
+{
+ spin_lock(&cpu_hotplug.lock);
+ per_cpu(refcount, smp_processor_id())--;
+ if (!nr_readers() &&
+ !list_empty(&cpu_hotplug.write_queue.task_list)) {
+ cpu_hotplug.status = WRITER_ACTIVE;
+ wake_up(&cpu_hotplug.write_queue);
+ }
+ spin_unlock(&cpu_hotplug.lock);
+}
+
#endif /* CONFIG_HOTPLUG_CPU */

/* Need to know about CPUs going up/down? */
int __cpuinit register_cpu_notifier(struct notifier_block *nb)
{
int ret;
- mutex_lock(&cpu_add_remove_lock);
+ lock_cpu_hotplug();
ret = raw_notifier_chain_register(&cpu_chain, nb);
- mutex_unlock(&cpu_add_remove_lock);
+ unlock_cpu_hotplug();
return ret;
}

@@ -81,9 +236,9 @@ EXPORT_SYMBOL(register_cpu_notifier);

void unregister_cpu_notifier(struct notifier_block *nb)
{
- mutex_lock(&cpu_add_remove_lock);
+ lock_cpu_hotplug();
raw_notifier_chain_unregister(&cpu_chain, nb);
- mutex_unlock(&cpu_add_remove_lock);
+ unlock_cpu_hotplug();
}
EXPORT_SYMBOL(unregister_cpu_notifier);

@@ -146,9 +301,7 @@ static int _cpu_down(unsigned int cpu)
cpu_clear(cpu, tmp);
set_cpus_allowed(current, tmp);

- mutex_lock(&cpu_bitmask_lock);
p = __stop_machine_run(take_cpu_down, NULL, cpu);
- mutex_unlock(&cpu_bitmask_lock);

if (IS_ERR(p)) {
/* CPU didn't die: tell everyone. Can't complain. */
@@ -192,13 +345,13 @@ int cpu_down(unsigned int cpu)
{
int err = 0;

- mutex_lock(&cpu_add_remove_lock);
+ cpu_hotplug_begin();
if (cpu_hotplug_disabled)
err = -EBUSY;
else
err = _cpu_down(cpu);

- mutex_unlock(&cpu_add_remove_lock);
+ cpu_hotplug_done();
return err;
}
#endif /*CONFIG_HOTPLUG_CPU*/
@@ -221,9 +374,7 @@ static int __devinit _cpu_up(unsigned in
}

/* Arch-specific enabling code. */
- mutex_lock(&cpu_bitmask_lock);
ret = __cpu_up(cpu);
- mutex_unlock(&cpu_bitmask_lock);
if (ret != 0)
goto out_notify;
BUG_ON(!cpu_online(cpu));
@@ -243,13 +394,13 @@ int __devinit cpu_up(unsigned int cpu)
{
int err = 0;

- mutex_lock(&cpu_add_remove_lock);
+ cpu_hotplug_begin();
if (cpu_hotplug_disabled)
err = -EBUSY;
else
err = _cpu_up(cpu);

- mutex_unlock(&cpu_add_remove_lock);
+ cpu_hotplug_done();
return err;
}

@@ -260,7 +411,7 @@ int disable_nonboot_cpus(void)
{
int cpu, first_cpu, error;

- mutex_lock(&cpu_add_remove_lock);
+ cpu_hotplug_begin();
first_cpu = first_cpu(cpu_present_map);
if (!cpu_online(first_cpu)) {
error = _cpu_up(first_cpu);
@@ -301,7 +452,7 @@ int disable_nonboot_cpus(void)
printk(KERN_ERR "Non-boot CPUs are not disabled");
}
out:
- mutex_unlock(&cpu_add_remove_lock);
+ cpu_hotplug_done();
return error;
}

@@ -310,9 +461,9 @@ void enable_nonboot_cpus(void)
int cpu, error;

/* Allow everyone to use the CPU hotplug again */
- mutex_lock(&cpu_add_remove_lock);
+ lock_cpu_hotplug();
cpu_hotplug_disabled = 0;
- mutex_unlock(&cpu_add_remove_lock);
+ unlock_cpu_hotplug();

printk("Enabling non-boot CPUs ...\n");
for_each_cpu_mask(cpu, frozen_cpus) {
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

Subject: [PATCH 5/5] lock_cpu_hotplug: Redesign - Lockdep support for lightweight lock_cpu_hotplug.

Add lockdep support to the (refcount + waitqueue) implementation of
lock_cpu_hotplug.

Signed-off-by: Gautham R Shenoy <[email protected]>

---
kernel/cpu.c | 20 ++++++++++++++++++++
1 files changed, 20 insertions(+)

Index: hotplug/kernel/cpu.c
===================================================================
--- hotplug.orig/kernel/cpu.c
+++ hotplug/kernel/cpu.c
@@ -23,6 +23,7 @@
#include <linux/types.h>
#include <linux/wait.h>
#include <linux/rcupdate.h>
+#include <linux/lockdep.h>

static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain);

@@ -61,6 +62,9 @@ static struct {
spinlock_t lock;
wait_queue_head_t read_queue;
wait_queue_head_t write_queue;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
} cpu_hotplug;

DEFINE_PER_CPU(int, refcount) = {0};
@@ -83,6 +87,16 @@ static inline int nr_readers(void)
return count;
}

+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static inline void cpu_hotplug_lockdep_init(void)
+{
+ static struct lock_class_key __key;
+ lockdep_init_map(&cpu_hotplug.dep_map, "cpu-hotplug-lock", &__key, 0);
+}
+#else
+ #define cpu_hotplug_lockdep_init() do { } while (0);
+#endif
+
#ifdef CONFIG_HOTPLUG_CPU
void __init cpu_hotplug_init(void)
{
@@ -90,6 +104,7 @@ void __init cpu_hotplug_init(void)
spin_lock_init(&cpu_hotplug.lock);
init_waitqueue_head(&cpu_hotplug.read_queue);
init_waitqueue_head(&cpu_hotplug.write_queue);
+ cpu_hotplug_lockdep_init();
}

static void slow_path_reader_lock(void);
@@ -105,6 +120,7 @@ static void slow_path_reader_unlock(void
*/
void lock_cpu_hotplug(void)
{
+ rwlock_acquire_read(&cpu_hotplug.dep_map, 0, 0, _RET_IP_);
preempt_disable();
if (likely(!writer_exists())) {
per_cpu(refcount, smp_processor_id())++;
@@ -118,6 +134,7 @@ EXPORT_SYMBOL_GPL(lock_cpu_hotplug);

void unlock_cpu_hotplug(void)
{
+ rwlock_release(&cpu_hotplug.dep_map, 1, _RET_IP_);
preempt_disable();
if (likely(!writer_exists())) {
per_cpu(refcount, smp_processor_id())--;
@@ -135,6 +152,8 @@ static void cpu_hotplug_begin(void)
{
DECLARE_WAITQUEUE(wait, current);

+ rwlock_acquire(&cpu_hotplug.dep_map, 0, 0, _RET_IP_);
+
spin_lock(&cpu_hotplug.lock);
if (likely(cpu_hotplug.status == NO_WRITERS)) {
cpu_hotplug.status = WRITER_WAITING;
@@ -165,6 +184,7 @@ static void cpu_hotplug_begin(void)

static void cpu_hotplug_done(void)
{
+ rwlock_release(&cpu_hotplug.dep_map, 1, _RET_IP_);
spin_lock(&cpu_hotplug.lock);

if (!list_empty(&cpu_hotplug.write_queue.task_list))
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

From: Paul Jackson
Date: 2006-10-26 21:16:22
Subject: Re: [PATCH 4/5] lock_cpu_hotplug: Redesign - Lightweight implementation of lock_cpu_hotplug.

Gautham wrote:
+ *- Readers assume control iff: *
+ * a) No other reader has a reference and no writer is writing. *
+ * OR *
+ * b) Atleast one reader (on *any* cpu) has a reference. *

Isn't this logically equivalent to stating:

*- Readers assume control iff no writer is writing

(Or if it's not equivalent, it might be interesting to state why.)

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

Subject: Re: [PATCH 4/5] lock_cpu_hotplug: Redesign - Lightweight implementation of lock_cpu_hotplug.

On Thu, Oct 26, 2006 at 02:14:50PM -0700, Paul Jackson wrote:
> Gautham wrote:
> + *- Readers assume control iff: *
> + * a) No other reader has a reference and no writer is writing. *
> + * OR *
> + * b) Atleast one reader (on *any* cpu) has a reference. *
>
> Isn't this logically equivalent to stating:
>
> *- Readers assume control iff no writer is writing

It is logically equivalent, but...

> (Or if it's not equivalent, it might be interesting to state why.)

I think it needs to be rephrased.

Because there may be a situation where nr_readers == 0 when a writer
arrives. The writer sets the flag to WRITER_WAITING and performs
a synchronize_sched().

During this time, if a new reader arrives on the scene, it would still
go to sleep, because there are no other active readers in the system.
This is despite the fact that the writer is not yet *writing*.

Thanks for pointing that out :-)

>
> --
> I won't rest till it's the best ...
> Programmer, Linux Scalability
> Paul Jackson <[email protected]> 1.925.600.0401

Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility,
which is still a bargain, because Freedom is priceless!"