Subject: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.


Hi,

(NOTE: This is an RFD. Patches are not for inclusion)

The current CPU-hotplug infrastructure enables us to hotplug one CPU at any
given time. However, with newer machines which have multiple cores and
threads, it makes sense to change the unit of hotplug to a core or a
package. We might want to evacuate a core or a package to reduce the average
power consumption, to manage the temperature of the system, or to dynamically
provision cores/packages to a running system. However, performing a series of
individual CPU-hotplug operations to achieve this is relatively slow.

Currently on a ppc64 box with 16 CPUs, the time taken for
an individual cpu-hotplug operation is as follows.

# time echo 0 > /sys/devices/system/cpu/cpu2/online
real 0m0.025s
user 0m0.000s
sys 0m0.002s

# time echo 1 > /sys/devices/system/cpu/cpu2/online
real 0m0.021s
user 0m0.000s
sys 0m0.000s

(The online time used to be ~200ms. It has been reduced by patch 1 of this
series, which cuts the polling interval in __cpu_up() from 200ms to 1ms.)

Of this, the time taken for sending the notifications and performing
the actual cpu-hotplug operation (a detailed profile is appended at the end of
this mail) is:

12.645925 ms on the offline path.
21.019581 ms on the online path.

(The ~10ms discrepancy between the total time taken for cpu-offline
and the time accounted to the notifiers and the cpu-hotplug operation is due
to a synchronize_sched() performed after clearing the CPU's bit in
cpu_active_mask.)

Of the accounted time, a major chunk is consumed by
cpuset_track_online_cpus() while handling the CPU_DEAD and CPU_ONLINE
notifications.

11.320205 ms: cpuset_track_online_cpus : CPU_DEAD
12.767882 ms: cpuset_track_online_cpus : CPU_ONLINE

cpuset_track_online_cpus(), among other things, rebuilds the
sched_domains for every online CPU in the system.

The operations performed within cpuset_track_online_cpus()
depend only on the cpu_online_map and not on the particular CPU which has
been hotplugged. Other notifiers which behave similarly are:
- ratelimit_handler()
- vmstat_cpuup_callback()
- vmscan: cpu_callback()

Thus, if we bunch up multiple cpu-offlines/onlines, we can reduce the overall
time taken by optimizing notifiers such as these, so that they perform the
necessary work only once, after the completion of the entire bulk
CPU-hotplug operation. This would cut down the CPU-hotplug time substantially.

The whole approach requires the CPU-hotplug notifiers to work on a
cpumask_t instead of a single cpu. A similar proposal was made earlier by
Shaohua Li (http://lkml.org/lkml/2006/5/8/18).
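
To make that concrete, here is a rough sketch of the shape such a mask-aware
notifier could take. It is purely illustrative and not part of this series;
in particular, the convention that the notifier payload (v) carries a cpumask
is an assumption made up for this example.

#include <linux/kernel.h>
#include <linux/notifier.h>
#include <linux/cpumask.h>
#include <linux/cpu.h>

/*
 * Illustrative sketch only (not part of this series): a CPU-hotplug
 * notifier that is handed the cpumask of every CPU affected by a bulk
 * operation, instead of being called once per CPU.
 */
static int example_bulk_cpu_callback(struct notifier_block *nb,
				     unsigned long action, void *v)
{
	const struct cpumask *affected = v;	/* assumed payload */

	switch (action) {
	case CPU_DEAD:
	case CPU_ONLINE:
		/*
		 * Do the expensive, mask-wide work (e.g. a single
		 * sched_domain rebuild) once per batch here, rather than
		 * once for each of the hotplugged CPUs.
		 */
		pr_debug("bulk hotplug touched %d cpus\n",
			 cpumask_weight(affected));
		break;
	}
	return NOTIFY_OK;
}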

In this patch series, we extend the existing cpu online/offline
interface to enable the user to offline/online a bunch of CPUs
at the same time.

The proposed interface consists of the sysfs files:
/sys/devices/system/cpu/online
/sys/devices/system/cpu/offline

The usage is:
echo 4,6,7 > /sys/devices/system/cpu/offline
echo 5 > /sys/devices/system/cpu/offline
echo 4-7 > /sys/devices/system/cpu/online

As of now, this patch series performs no optimizations in the CPU-hotplug
core; it serially hotplugs the CPUs in the list provided by the user.

The interface provided in this patch series has been tested on a
16-way ppc64 box.


Still TODO:
- Enhance the subsystem notifiers to work on a cpumask_var_t instead of a cpu
id.

- Optimize the subsystem notifiers to reduce the time consumed while
handling CPU_[DOWN_PREPARE/DEAD/UP_PREPARE/ONLINE] events for the
cpumask_var_t.

- Define the Rollback Semantics for the notifiers which fail to handle
a CPU_* event correctly.

- Send the kobject-events for the corresponding device entries of each of the
CPUs present in the list to maintain ABI compatibility.

Any feedback is much appreciated.

---

Gautham R Shenoy (4):
cpu: measure time taken by subsystem notifiers during cpu-hotplug
cpu: Define new functions cpu_down_mask and cpu_up_mask
cpu: sysfs interface for hotplugging bunch of CPUs.
powerpc: cpu: Reduce the polling interval in __cpu_up()


arch/powerpc/kernel/smp.c | 5 +-
drivers/base/cpu.c | 76 ++++++++++++++++++++++++++++--
include/linux/cpu.h | 2 +
include/trace/notifier_trace.h | 32 ++++++++++++
kernel/cpu.c | 103 ++++++++++++++++++++++++++++++----------
kernel/notifier.c | 23 +++++++--
6 files changed, 203 insertions(+), 38 deletions(-)
create mode 100644 include/trace/notifier_trace.h

--
Thanks and Regards
gautham


****************** Cpu-Hotplug profile ********************************

=============================================================================
statistics for CPU_DOWN_PREPARE
=============================================================================
379 ns: buffer_cpu_notify : CPU_DOWN_PREPARE
457 ns: topology_cpu_callback : CPU_DOWN_PREPARE
504 ns: flow_cache_cpu : CPU_DOWN_PREPARE
517 ns: cpu_callback : CPU_DOWN_PREPARE
533 ns: hotplug_cfd : CPU_DOWN_PREPARE
546 ns: dev_cpu_callback : CPU_DOWN_PREPARE
547 ns: timer_cpu_notify : CPU_DOWN_PREPARE
562 ns: page_alloc_cpu_notify : CPU_DOWN_PREPARE
564 ns: cpuset_track_online_cpus : CPU_DOWN_PREPARE
594 ns: blk_cpu_notify : CPU_DOWN_PREPARE
623 ns: hotplug_hrtick : CPU_DOWN_PREPARE
623 ns: radix_tree_callback : CPU_DOWN_PREPARE
715 ns: remote_softirq_cpu_notify : CPU_DOWN_PREPARE
777 ns: rb_cpu_notify : CPU_DOWN_PREPARE
777 ns: sysfs_cpu_notify : CPU_DOWN_PREPARE
807 ns: rcu_cpu_notify : CPU_DOWN_PREPARE
820 ns: ratelimit_handler : CPU_DOWN_PREPARE
822 ns: pageset_cpuup_callback : CPU_DOWN_PREPARE
898 ns: cpu_callback : CPU_DOWN_PREPARE
898 ns: relay_hotcpu_callback : CPU_DOWN_PREPARE
929 ns: hrtimer_cpu_notify : CPU_DOWN_PREPARE
930 ns: cpu_callback : CPU_DOWN_PREPARE
1096 ns: cpu_numa_callback : CPU_DOWN_PREPARE
1096 ns: percpu_counter_hotcpu_callback: CPU_DOWN_PREPARE
1111 ns: slab_cpuup_callback : CPU_DOWN_PREPARE
1139 ns: update_runtime : CPU_DOWN_PREPARE
1143 ns: rcu_barrier_cpu_hotplug : CPU_DOWN_PREPARE
2725 ns: workqueue_cpu_callback : CPU_DOWN_PREPARE
2852 ns: migration_call : CPU_DOWN_PREPARE
4497 ns: vmstat_cpuup_callback : CPU_DOWN_PREPARE
=========================================================================
Total time for CPU_DOWN_PREPARE = .030481000 ms
=========================================================================
=============================================================================
statistics for CPU_DYING
=============================================================================
349 ns: cpu_callback : CPU_DYING
349 ns: hotplug_hrtick : CPU_DYING
349 ns: remote_softirq_cpu_notify : CPU_DYING
351 ns: timer_cpu_notify : CPU_DYING
363 ns: vmstat_cpuup_callback : CPU_DYING
364 ns: rb_cpu_notify : CPU_DYING
365 ns: blk_cpu_notify : CPU_DYING
365 ns: cpu_callback : CPU_DYING
365 ns: cpu_numa_callback : CPU_DYING
365 ns: cpuset_track_online_cpus : CPU_DYING
365 ns: dev_cpu_callback : CPU_DYING
365 ns: hotplug_cfd : CPU_DYING
365 ns: page_alloc_cpu_notify : CPU_DYING
365 ns: radix_tree_callback : CPU_DYING
365 ns: relay_hotcpu_callback : CPU_DYING
365 ns: topology_cpu_callback : CPU_DYING
365 ns: update_runtime : CPU_DYING
366 ns: pageset_cpuup_callback : CPU_DYING
367 ns: sysfs_cpu_notify : CPU_DYING
378 ns: flow_cache_cpu : CPU_DYING
380 ns: rcu_cpu_notify : CPU_DYING
381 ns: buffer_cpu_notify : CPU_DYING
381 ns: cpu_callback : CPU_DYING
383 ns: slab_cpuup_callback : CPU_DYING
455 ns: ratelimit_handler : CPU_DYING
502 ns: workqueue_cpu_callback : CPU_DYING
699 ns: percpu_counter_hotcpu_callback: CPU_DYING
1370 ns: rcu_barrier_cpu_hotplug : CPU_DYING
1583 ns: migration_call : CPU_DYING
2971 ns: hrtimer_cpu_notify : CPU_DYING
=========================================================================
Total time for CPU_DYING = .016356000 ms
=========================================================================
=============================================================================
statistics for CPU_DOWN_CANCELED
=============================================================================
=========================================================================
Total time for CPU_DOWN_CANCELED = 0 ms
=========================================================================
=============================================================================
statistics for __stop_machine
=============================================================================
556214 ns: __stop_machine :
=========================================================================
Total time for __stop_machine = .556214000 ms
=========================================================================
=============================================================================
statistics for CPU_DEAD
=============================================================================
352 ns: update_runtime : CPU_DEAD
363 ns: rb_cpu_notify : CPU_DEAD
364 ns: relay_hotcpu_callback : CPU_DEAD
367 ns: hotplug_cfd : CPU_DEAD
396 ns: cpu_callback : CPU_DEAD
411 ns: hotplug_hrtick : CPU_DEAD
426 ns: rcu_barrier_cpu_hotplug : CPU_DEAD
489 ns: remote_softirq_cpu_notify : CPU_DEAD
517 ns: ratelimit_handler : CPU_DEAD
533 ns: workqueue_cpu_callback : CPU_DEAD
626 ns: dev_cpu_callback : CPU_DEAD
867 ns: cpu_numa_callback : CPU_DEAD
1430 ns: rcu_cpu_notify : CPU_DEAD
1827 ns: blk_cpu_notify : CPU_DEAD
1933 ns: buffer_cpu_notify : CPU_DEAD
2194 ns: pageset_cpuup_callback : CPU_DEAD
2613 ns: vmstat_cpuup_callback : CPU_DEAD
2902 ns: radix_tree_callback : CPU_DEAD
4373 ns: hrtimer_cpu_notify : CPU_DEAD
5799 ns: timer_cpu_notify : CPU_DEAD
9468 ns: flow_cache_cpu : CPU_DEAD
12579 ns: cpu_callback : CPU_DEAD
13855 ns: cpu_callback : CPU_DEAD
25095 ns: topology_cpu_callback : CPU_DEAD
29020 ns: page_alloc_cpu_notify : CPU_DEAD
66894 ns: percpu_counter_hotcpu_callback: CPU_DEAD
118473 ns: slab_cpuup_callback : CPU_DEAD
153415 ns: sysfs_cpu_notify : CPU_DEAD
159933 ns: migration_call : CPU_DEAD
11320205 ns: cpuset_track_online_cpus : CPU_DEAD
=========================================================================
Total time for CPU_DEAD = 11.937719000 ms
=========================================================================
=============================================================================
statistics for CPU_POST_DEAD
=============================================================================
332 ns: remote_softirq_cpu_notify : CPU_POST_DEAD
334 ns: hotplug_hrtick : CPU_POST_DEAD
334 ns: hrtimer_cpu_notify : CPU_POST_DEAD
334 ns: radix_tree_callback : CPU_POST_DEAD
334 ns: relay_hotcpu_callback : CPU_POST_DEAD
334 ns: topology_cpu_callback : CPU_POST_DEAD
334 ns: update_runtime : CPU_POST_DEAD
335 ns: buffer_cpu_notify : CPU_POST_DEAD
348 ns: pageset_cpuup_callback : CPU_POST_DEAD
348 ns: slab_cpuup_callback : CPU_POST_DEAD
349 ns: rcu_barrier_cpu_hotplug : CPU_POST_DEAD
350 ns: cpu_callback : CPU_POST_DEAD
350 ns: flow_cache_cpu : CPU_POST_DEAD
350 ns: rb_cpu_notify : CPU_POST_DEAD
350 ns: sysfs_cpu_notify : CPU_POST_DEAD
350 ns: timer_cpu_notify : CPU_POST_DEAD
351 ns: page_alloc_cpu_notify : CPU_POST_DEAD
352 ns: cpuset_track_online_cpus : CPU_POST_DEAD
365 ns: hotplug_cfd : CPU_POST_DEAD
365 ns: vmstat_cpuup_callback : CPU_POST_DEAD
366 ns: cpu_callback : CPU_POST_DEAD
367 ns: cpu_numa_callback : CPU_POST_DEAD
368 ns: cpu_callback : CPU_POST_DEAD
395 ns: blk_cpu_notify : CPU_POST_DEAD
396 ns: rcu_cpu_notify : CPU_POST_DEAD
397 ns: dev_cpu_callback : CPU_POST_DEAD
442 ns: migration_call : CPU_POST_DEAD
563 ns: percpu_counter_hotcpu_callback: CPU_POST_DEAD
778 ns: ratelimit_handler : CPU_POST_DEAD
94184 ns: workqueue_cpu_callback : CPU_POST_DEAD
=========================================================================
Total time for CPU_POST_DEAD = .105155000 ms
=========================================================================
=============================================================================
statistics for CPU_UP_PREPARE
=============================================================================
334 ns: hotplug_hrtick : CPU_UP_PREPARE
336 ns: update_runtime : CPU_UP_PREPARE
350 ns: flow_cache_cpu : CPU_UP_PREPARE
350 ns: radix_tree_callback : CPU_UP_PREPARE
365 ns: cpuset_track_online_cpus : CPU_UP_PREPARE
365 ns: page_alloc_cpu_notify : CPU_UP_PREPARE
365 ns: sysfs_cpu_notify : CPU_UP_PREPARE
367 ns: hrtimer_cpu_notify : CPU_UP_PREPARE
381 ns: buffer_cpu_notify : CPU_UP_PREPARE
381 ns: rb_cpu_notify : CPU_UP_PREPARE
383 ns: cpu_callback : CPU_UP_PREPARE
410 ns: rcu_barrier_cpu_hotplug : CPU_UP_PREPARE
413 ns: remote_softirq_cpu_notify : CPU_UP_PREPARE
426 ns: blk_cpu_notify : CPU_UP_PREPARE
475 ns: vmstat_cpuup_callback : CPU_UP_PREPARE
518 ns: hotplug_cfd : CPU_UP_PREPARE
594 ns: percpu_counter_hotcpu_callback: CPU_UP_PREPARE
731 ns: ratelimit_handler : CPU_UP_PREPARE
805 ns: relay_hotcpu_callback : CPU_UP_PREPARE
1007 ns: dev_cpu_callback : CPU_UP_PREPARE
1690 ns: rcu_cpu_notify : CPU_UP_PREPARE
1875 ns: timer_cpu_notify : CPU_UP_PREPARE
2083 ns: pageset_cpuup_callback : CPU_UP_PREPARE
5016 ns: cpu_numa_callback : CPU_UP_PREPARE
6944 ns: topology_cpu_callback : CPU_UP_PREPARE
7064 ns: slab_cpuup_callback : CPU_UP_PREPARE
20964 ns: cpu_callback : CPU_UP_PREPARE
36301 ns: cpu_callback : CPU_UP_PREPARE
38337 ns: migration_call : CPU_UP_PREPARE
139963 ns: workqueue_cpu_callback : CPU_UP_PREPARE
=========================================================================
Total time for CPU_UP_PREPARE = .269593000 ms
=========================================================================
=============================================================================
statistics for CPU_UP_CANCELED
=============================================================================
=========================================================================
Total time for CPU_UP_CANCELED = 0 ms
=========================================================================
=============================================================================
statistics for __cpu_up
=============================================================================
7881152 ns: __cpu_up :
=========================================================================
Total time for __cpu_up = 7.881152000 ms
=========================================================================
=============================================================================
statistics for CPU_STARTING
=============================================================================
318 ns: cpu_callback : CPU_STARTING
334 ns: hotplug_cfd : CPU_STARTING
334 ns: hotplug_hrtick : CPU_STARTING
334 ns: hrtimer_cpu_notify : CPU_STARTING
336 ns: remote_softirq_cpu_notify : CPU_STARTING
336 ns: topology_cpu_callback : CPU_STARTING
348 ns: cpu_callback : CPU_STARTING
348 ns: flow_cache_cpu : CPU_STARTING
349 ns: cpu_callback : CPU_STARTING
349 ns: update_runtime : CPU_STARTING
350 ns: dev_cpu_callback : CPU_STARTING
350 ns: rb_cpu_notify : CPU_STARTING
351 ns: sysfs_cpu_notify : CPU_STARTING
352 ns: cpuset_track_online_cpus : CPU_STARTING
365 ns: vmstat_cpuup_callback : CPU_STARTING
381 ns: blk_cpu_notify : CPU_STARTING
393 ns: page_alloc_cpu_notify : CPU_STARTING
395 ns: timer_cpu_notify : CPU_STARTING
396 ns: relay_hotcpu_callback : CPU_STARTING
396 ns: slab_cpuup_callback : CPU_STARTING
397 ns: cpu_numa_callback : CPU_STARTING
397 ns: pageset_cpuup_callback : CPU_STARTING
397 ns: radix_tree_callback : CPU_STARTING
410 ns: buffer_cpu_notify : CPU_STARTING
410 ns: rcu_cpu_notify : CPU_STARTING
412 ns: rcu_barrier_cpu_hotplug : CPU_STARTING
426 ns: percpu_counter_hotcpu_callback: CPU_STARTING
549 ns: ratelimit_handler : CPU_STARTING
549 ns: workqueue_cpu_callback : CPU_STARTING
592 ns: migration_call : CPU_STARTING
=========================================================================
Total time for CPU_STARTING = .011654000 ms
=========================================================================
=============================================================================
statistics for CPU_ONLINE
=============================================================================
334 ns: hotplug_cfd : CPU_ONLINE
334 ns: relay_hotcpu_callback : CPU_ONLINE
334 ns: remote_softirq_cpu_notify : CPU_ONLINE
335 ns: hrtimer_cpu_notify : CPU_ONLINE
349 ns: topology_cpu_callback : CPU_ONLINE
352 ns: flow_cache_cpu : CPU_ONLINE
352 ns: slab_cpuup_callback : CPU_ONLINE
365 ns: dev_cpu_callback : CPU_ONLINE
365 ns: rb_cpu_notify : CPU_ONLINE
379 ns: pageset_cpuup_callback : CPU_ONLINE
381 ns: page_alloc_cpu_notify : CPU_ONLINE
381 ns: rcu_cpu_notify : CPU_ONLINE
381 ns: timer_cpu_notify : CPU_ONLINE
395 ns: hotplug_hrtick : CPU_ONLINE
410 ns: blk_cpu_notify : CPU_ONLINE
426 ns: rcu_barrier_cpu_hotplug : CPU_ONLINE
455 ns: cpu_numa_callback : CPU_ONLINE
459 ns: radix_tree_callback : CPU_ONLINE
473 ns: buffer_cpu_notify : CPU_ONLINE
504 ns: ratelimit_handler : CPU_ONLINE
639 ns: percpu_counter_hotcpu_callback: CPU_ONLINE
791 ns: update_runtime : CPU_ONLINE
1052 ns: cpu_callback : CPU_ONLINE
1282 ns: cpu_callback : CPU_ONLINE
1845 ns: cpu_callback : CPU_ONLINE
2502 ns: vmstat_cpuup_callback : CPU_ONLINE
4332 ns: migration_call : CPU_ONLINE
14505 ns: workqueue_cpu_callback : CPU_ONLINE
54588 ns: sysfs_cpu_notify : CPU_ONLINE
12767882 ns: cpuset_track_online_cpus : CPU_ONLINE
=========================================================================
Total time for CPU_ONLINE = 12.857182000 ms
=========================================================================


Subject: [RFD PATCH 1/4] powerpc: cpu: Reduce the polling interval in __cpu_up()

The cpu online operation on powerpc today takes on the order of 200-220ms. Of
this time, approximately 200ms is taken up by __cpu_up(). This is because
we poll every 200ms to check if the new cpu has notified its presence
through the cpu_callin_map. We poll every 200ms until the new cpu sets
the value in cpu_callin_map or 5 seconds elapse, whichever comes earlier.

However, the time taken by the new processor to indicate its presence has been
found to be less than a millisecond. Keeping this in mind, reduce the
polling interval from 200ms to 1ms while retaining the 5 second timeout.

Signed-off-by: Gautham R Shenoy <[email protected]>
---
arch/powerpc/kernel/smp.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 65484b2..00c13a1 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -411,9 +411,8 @@ int __cpuinit __cpu_up(unsigned int cpu)
* CPUs can take much longer to come up in the
* hotplug case. Wait five seconds.
*/
- for (c = 25; c && !cpu_callin_map[cpu]; c--) {
- msleep(200);
- }
+ for (c = 5000; c && !cpu_callin_map[cpu]; c--)
+ msleep(1);
#endif

if (!cpu_callin_map[cpu]) {

Subject: [RFD PATCH 2/4] cpu: sysfs interface for hotplugging bunch of CPUs.

The user can currently view the online and offline CPUs in the system through
the sysfs files named "online" and "offline" respectively, which
are present in the directory /sys/devices/system/cpu/. These files currently
have 0444 permissions.

For the purpose of evacuating a bunch of CPUs, we propose to extend this
same interface and make it 0644, so that the user can bring a bunch of CPUs
online or take a bunch of CPUs offline through the same files.

To do this, the user echoes the cpu-list which is to be hotplugged.

Eg:

echo 2,3 > /sys/devices/system/cpu/offline #Offlines CPUs 2 and 3
echo 4 > /sys/devices/system/cpu/offline #Offlines CPU 4
echo 2-4 > /sys/devices/system/cpu/online #Onlines CPU 2,3,4

This patch changes the permissions of these sysfs files from 0444 to 0644.
It provides a dummy store function, which currently parses the input
provided by the user and copies it into another debug cpumask structure,
which can be accessed using the sysfs interfaces:
/sys/devices/system/cpu/debug_offline
and
/sys/devices/system/cpu/debug_online

Thus on performing a
echo 2,3 > /sys/devices/system/cpu/offline
the operation
cat /sys/devices/system/cpu/debug_offline
should yield
2-3
as the result.

Signed-off-by: Gautham R Shenoy <[email protected]>
---
drivers/base/cpu.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..7a15e7b 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -116,18 +116,54 @@ static ssize_t print_cpus_map(char *buf, const struct cpumask *map)
return n;
}

-#define print_cpus_func(type) \
+#define show_cpus_func(type) \
static ssize_t print_cpus_##type(struct sysdev_class *class, char *buf) \
{ \
return print_cpus_map(buf, cpu_##type##_mask); \
-} \
-static struct sysdev_class_attribute attr_##type##_map = \
+}
+
+#define print_cpus_func(type) \
+show_cpus_func(type); \
+static struct sysdev_class_attribute attr_##type##_map = \
_SYSDEV_CLASS_ATTR(type, 0444, print_cpus_##type, NULL)

-print_cpus_func(online);
print_cpus_func(possible);
print_cpus_func(present);

+static struct cpumask debug_offline_mask_data;
+static struct cpumask debug_online_mask_data;
+static struct cpumask *cpu_debug_offline_mask = &debug_offline_mask_data;
+static struct cpumask *cpu_debug_online_mask = &debug_online_mask_data;
+print_cpus_func(debug_offline);
+print_cpus_func(debug_online);
+
+show_cpus_func(online);
+static ssize_t store_cpus_online(struct sysdev_class *dev_class,
+ const char *buf, size_t count)
+{
+ ssize_t ret = count;
+ cpumask_var_t store_cpus_online_mask;
+
+ if (!alloc_cpumask_var(&store_cpus_online_mask, GFP_KERNEL))
+ return count;
+
+ ret = cpulist_parse(buf, store_cpus_online_mask);
+
+ if (ret < 0)
+ goto out;
+
+ cpumask_copy(cpu_debug_online_mask, store_cpus_online_mask);
+
+out:
+ free_cpumask_var(store_cpus_online_mask);
+ if (ret >= 0)
+ ret = count;
+ return ret;
+}
+static struct sysdev_class_attribute attr_online_map =
+ _SYSDEV_CLASS_ATTR(online, 0644, print_cpus_online,
+ store_cpus_online);
+
/*
* Print values for NR_CPUS and offlined cpus
*/
@@ -168,7 +204,31 @@ static ssize_t print_cpus_offline(struct sysdev_class *class, char *buf)
n += snprintf(&buf[n], len - n, "\n");
return n;
}
-static SYSDEV_CLASS_ATTR(offline, 0444, print_cpus_offline, NULL);
+
+static ssize_t store_cpus_offline(struct sysdev_class *dev_class,
+ const char *buf, size_t count)
+{
+ ssize_t ret = count;
+ cpumask_var_t store_cpus_offline_mask;
+
+ if (!alloc_cpumask_var(&store_cpus_offline_mask, GFP_KERNEL))
+ return count;
+
+ ret = cpulist_parse(buf, store_cpus_offline_mask);
+
+ if (ret < 0)
+ goto out;
+
+ cpumask_copy(cpu_debug_offline_mask, store_cpus_offline_mask);
+
+out:
+ free_cpumask_var(store_cpus_offline_mask);
+ if (ret >= 0)
+ ret = count;
+ return ret;
+}
+static SYSDEV_CLASS_ATTR(offline, 0644, print_cpus_offline,
+ store_cpus_offline);

static struct sysdev_class_attribute *cpu_state_attr[] = {
&attr_online_map,
@@ -176,6 +236,8 @@ static struct sysdev_class_attribute *cpu_state_attr[] = {
&attr_present_map,
&attr_kernel_max,
&attr_offline,
+ &attr_debug_online_map,
+ &attr_debug_offline_map,
};

static int cpu_states_init(void)

Subject: [RFD PATCH 3/4] cpu: Define new functions cpu_down_mask and cpu_up_mask

Currently, a cpu-hotplug operation is carried out on a single processor at any
given time. We create two functions which enable us to offline/online
multiple CPUs in a single go.

These functions are:
int cpu_down_mask(cpumask_var_t cpus_to_offline);
int cpu_up_mask(cpumask_var_t cpus_to_online);

In this patch, these functions serially invoke the _cpu_down() and _cpu_up()
functions for each of the CPUs in the cpumask.

The idea is to make the CPU-hotplug notifiers work on cpumasks so that they
can optimize for hotplugging multiple CPUs.
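
For illustration, here is a hedged sketch of an in-kernel caller of the new
interface. The helper name is made up for the example; the cpumask handling
mirrors what the cpu_down() wrapper in this patch does, except that two CPUs
are placed in the mask.

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/gfp.h>

/*
 * Illustrative only: offline CPUs 2 and 3 with a single cpu_down_mask()
 * call instead of two separate cpu_down() calls.
 */
static int example_offline_two_cpus(void)
{
	cpumask_var_t mask;
	int err;

	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
		return -ENOMEM;

	cpumask_clear(mask);
	cpumask_set_cpu(2, mask);
	cpumask_set_cpu(3, mask);

	/* In this version of the series, each CPU in the mask is
	 * offlined serially. */
	err = cpu_down_mask(mask);

	free_cpumask_var(mask);
	return err;
}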

Signed-off-by: Gautham R Shenoy <[email protected]>
---
drivers/base/cpu.c | 4 ++
include/linux/cpu.h | 2 +
kernel/cpu.c | 92 +++++++++++++++++++++++++++++++++++++--------------
3 files changed, 73 insertions(+), 25 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 7a15e7b..1a382da 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -154,6 +154,8 @@ static ssize_t store_cpus_online(struct sysdev_class *dev_class,

cpumask_copy(cpu_debug_online_mask, store_cpus_online_mask);

+ ret = cpu_up_mask(store_cpus_online_mask);
+
out:
free_cpumask_var(store_cpus_online_mask);
if (ret >= 0)
@@ -221,6 +223,8 @@ static ssize_t store_cpus_offline(struct sysdev_class *dev_class,

cpumask_copy(cpu_debug_offline_mask, store_cpus_offline_mask);

+ ret = cpu_down_mask(store_cpus_offline_mask);
+
out:
free_cpumask_var(store_cpus_offline_mask);
if (ret >= 0)
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 2643d84..4769ff6 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -68,6 +68,7 @@ static inline void unregister_cpu_notifier(struct notifier_block *nb)
#endif

int cpu_up(unsigned int cpu);
+int cpu_up_mask(const cpumask_var_t cpus_to_online);
void notify_cpu_starting(unsigned int cpu);
extern void cpu_hotplug_init(void);
extern void cpu_maps_update_begin(void);
@@ -112,6 +113,7 @@ extern void put_online_cpus(void);
#define register_hotcpu_notifier(nb) register_cpu_notifier(nb)
#define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb)
int cpu_down(unsigned int cpu);
+int cpu_down_mask(const cpumask_var_t cpus_to_offline);

#else /* CONFIG_HOTPLUG_CPU */

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 395b697..2b5d4e0 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -267,9 +267,10 @@ out_release:
return err;
}

-int __ref cpu_down(unsigned int cpu)
+int __ref cpu_down_mask(const cpumask_var_t cpus_to_offline)
{
int err;
+ unsigned int cpu;

err = stop_machine_create();
if (err)
@@ -281,28 +282,48 @@ int __ref cpu_down(unsigned int cpu)
goto out;
}

- set_cpu_active(cpu, false);
+ for_each_cpu(cpu, cpus_to_offline) {
+ set_cpu_active(cpu, false);

- /*
- * Make sure the all cpus did the reschedule and are not
- * using stale version of the cpu_active_mask.
- * This is not strictly necessary becuase stop_machine()
- * that we run down the line already provides the required
- * synchronization. But it's really a side effect and we do not
- * want to depend on the innards of the stop_machine here.
- */
- synchronize_sched();
+ /*
+ * Make sure the all cpus did the reschedule and are not
+ * using stale version of the cpu_active_mask.
+ * This is not strictly necessary becuase stop_machine()
+ * that we run down the line already provides the required
+ * synchronization. But it's really a side effect and we do not
+ * want to depend on the innards of the stop_machine here.
+ */
+ synchronize_sched();

- err = _cpu_down(cpu, 0);
+ err = _cpu_down(cpu, 0);

- if (cpu_online(cpu))
- set_cpu_active(cpu, true);
+ if (cpu_online(cpu))
+ set_cpu_active(cpu, true);
+ }

out:
cpu_maps_update_done();
stop_machine_destroy();
return err;
}
+
+int __ref cpu_down(unsigned int cpu)
+{
+ int err;
+ cpumask_var_t cpus_to_offline;
+
+ if (!alloc_cpumask_var(&cpus_to_offline, GFP_KERNEL))
+ return -ENOMEM;
+
+ cpumask_clear(cpus_to_offline);
+ cpumask_set_cpu(cpu, cpus_to_offline);
+
+ err = cpu_down_mask(cpus_to_offline);
+
+ free_cpumask_var(cpus_to_offline);
+
+ return err;
+}
EXPORT_SYMBOL(cpu_down);
#endif /*CONFIG_HOTPLUG_CPU*/

@@ -347,33 +368,54 @@ out_notify:
return ret;
}

-int __cpuinit cpu_up(unsigned int cpu)
+int __cpuinit cpu_up_mask(const cpumask_var_t cpus_to_online)
{
int err = 0;
- if (!cpu_possible(cpu)) {
- printk(KERN_ERR "can't online cpu %d because it is not "
- "configured as may-hotadd at boot time\n", cpu);
+ unsigned int cpu;
+
+ cpu_maps_update_begin();
+ for_each_cpu(cpu, cpus_to_online) {
+ if (!cpu_possible(cpu)) {
+ printk(KERN_ERR "can't online cpu %d because it is not"
+ " configured as may-hotadd at boot time\n", cpu);
#if defined(CONFIG_IA64) || defined(CONFIG_X86_64)
- printk(KERN_ERR "please check additional_cpus= boot "
- "parameter\n");
+ printk(KERN_ERR "please check additional_cpus= boot "
+ "parameter\n");
#endif
- return -EINVAL;
+ err = -EINVAL;
+ goto out;
+ }
}

- cpu_maps_update_begin();
-
if (cpu_hotplug_disabled) {
err = -EBUSY;
goto out;
}
-
- err = _cpu_up(cpu, 0);
+ for_each_cpu(cpu, cpus_to_online)
+ err = _cpu_up(cpu, 0);

out:
cpu_maps_update_done();
return err;
}

+int __cpuinit cpu_up(unsigned int cpu)
+{
+ int err = 0;
+ cpumask_var_t cpus_to_online;
+
+ if (!alloc_cpumask_var(&cpus_to_online, GFP_KERNEL))
+ return -ENOMEM;
+
+ cpumask_clear(cpus_to_online);
+ cpumask_set_cpu(cpu, cpus_to_online);
+
+ err = cpu_up_mask(cpus_to_online);
+
+ free_cpumask_var(cpus_to_online);
+
+ return err;
+}
#ifdef CONFIG_PM_SLEEP_SMP
static cpumask_var_t frozen_cpus;

Subject: [RFD PATCH 4/4] cpu: measure time taken by subsystem notifiers during cpu-hotplug

Place tracepoints at appropriate points to profile the time consumed by the
notifiers and the core cpu-hotplug operations.

Change the notifier-chain API to pass private data which can be used for
filtering the trace results.
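
For reference, a minimal sketch of a probe module that could consume these
tracepoints is given below. It assumes the DECLARE_TRACE-generated
register_trace_<name>() interface and uses sched_clock() for timestamps; the
module and probe names are illustrative and not part of this patch.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched.h>		/* sched_clock() */
#include <trace/notifier_trace.h>

/*
 * Illustrative probes: measure the time spent in each notifier call.
 * Calls on the cpu_chain are serialized under cpu_add_remove_lock, so a
 * single saved timestamp is good enough for this sketch.
 */
static unsigned long long probe_t_start;

static void probe_notifier_start(void *notifier_call, unsigned int val,
				 void *chain_head)
{
	probe_t_start = sched_clock();
}

static void probe_notifier_stop(void *notifier_call, unsigned int val,
				void *chain_head)
{
	/* chain_head can be used to filter out uninteresting chains. */
	printk(KERN_DEBUG "notifier %p event %u took %llu ns\n",
	       notifier_call, val, sched_clock() - probe_t_start);
}

static int __init notifier_probe_init(void)
{
	int ret;

	ret = register_trace_hotplug_notifier_event_start(probe_notifier_start);
	if (ret)
		return ret;

	ret = register_trace_hotplug_notifier_event_stop(probe_notifier_stop);
	if (ret)
		unregister_trace_hotplug_notifier_event_start(probe_notifier_start);

	return ret;
}

static void __exit notifier_probe_exit(void)
{
	unregister_trace_hotplug_notifier_event_stop(probe_notifier_stop);
	unregister_trace_hotplug_notifier_event_start(probe_notifier_start);
}

module_init(notifier_probe_init);
module_exit(notifier_probe_exit);
MODULE_LICENSE("GPL");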

Signed-off-by: Gautham R Shenoy <[email protected]>
---
include/trace/notifier_trace.h | 32 ++++++++++++++++++++++++++++++++
kernel/cpu.c | 11 +++++++++++
kernel/notifier.c | 23 ++++++++++++++++++-----
3 files changed, 61 insertions(+), 5 deletions(-)
create mode 100644 include/trace/notifier_trace.h

diff --git a/include/trace/notifier_trace.h b/include/trace/notifier_trace.h
new file mode 100644
index 0000000..1591a40
--- /dev/null
+++ b/include/trace/notifier_trace.h
@@ -0,0 +1,32 @@
+#ifndef _HOTPLUG_CPU_TRACE_H_
+#define _HOTPLUG_CPU_TRACE_H_
+
+#include <linux/tracepoint.h>
+#include <linux/notifier.h>
+
+DECLARE_TRACE(hotplug_notifier_event_start,
+ TP_PROTO(void *notifier_call, unsigned int val,
+ void *chain_head),
+ TP_ARGS(notifier_call, val, chain_head));
+
+DECLARE_TRACE(hotplug_notifier_event_stop,
+ TP_PROTO(void *notifier_call, unsigned int val,
+ void *chain_head),
+ TP_ARGS(notifier_call, val, chain_head));
+
+DECLARE_TRACE(stop_machine_event_start,
+ TP_PROTO(void),
+ TP_ARGS());
+
+DECLARE_TRACE(stop_machine_event_stop,
+ TP_PROTO(void),
+ TP_ARGS());
+
+DECLARE_TRACE(cpu_up_event_start,
+ TP_PROTO(void),
+ TP_ARGS());
+
+DECLARE_TRACE(cpu_up_event_stop,
+ TP_PROTO(void),
+ TP_ARGS());
+#endif
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 2b5d4e0..256a3e4 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -14,6 +14,7 @@
#include <linux/kthread.h>
#include <linux/stop_machine.h>
#include <linux/mutex.h>
+#include <trace/notifier_trace.h>

#ifdef CONFIG_SMP
/* Serializes the updates to cpu_online_mask, cpu_present_mask */
@@ -190,6 +191,9 @@ static int __ref take_cpu_down(void *_param)
return 0;
}

+DEFINE_TRACE(stop_machine_event_start);
+DEFINE_TRACE(stop_machine_event_stop);
+
/* Requires cpu_add_remove_lock to be held */
static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
{
@@ -229,7 +233,9 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
set_cpus_allowed_ptr(current,
cpumask_of(cpumask_any_but(cpu_online_mask, cpu)));

+ trace_stop_machine_event_start();
err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
+ trace_stop_machine_event_stop();
if (err) {
/* CPU didn't die: tell everyone. Can't complain. */
if (raw_notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED | mod,
@@ -327,6 +333,9 @@ int __ref cpu_down(unsigned int cpu)
EXPORT_SYMBOL(cpu_down);
#endif /*CONFIG_HOTPLUG_CPU*/

+DEFINE_TRACE(cpu_up_event_start);
+DEFINE_TRACE(cpu_up_event_stop);
+
/* Requires cpu_add_remove_lock to be held */
static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen)
{
@@ -349,7 +358,9 @@ static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen)
}

/* Arch-specific enabling code. */
+ trace_cpu_up_event_start();
ret = __cpu_up(cpu);
+ trace_cpu_up_event_stop();
if (ret != 0)
goto out_notify;
BUG_ON(!cpu_online(cpu));
diff --git a/kernel/notifier.c b/kernel/notifier.c
index 61d5aa5..5729035 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -5,6 +5,7 @@
#include <linux/rcupdate.h>
#include <linux/vmalloc.h>
#include <linux/reboot.h>
+#include <trace/notifier_trace.h>

/*
* Notifier list for kernel code which wants to be called
@@ -59,6 +60,9 @@ static int notifier_chain_unregister(struct notifier_block **nl,
return -ENOENT;
}

+DEFINE_TRACE(hotplug_notifier_event_start);
+DEFINE_TRACE(hotplug_notifier_event_stop);
+
/**
* notifier_call_chain - Informs the registered notifiers about an event.
* @nl: Pointer to head of the blocking notifier chain
@@ -68,12 +72,16 @@ static int notifier_chain_unregister(struct notifier_block **nl,
* value of this parameter is -1.
* @nr_calls: Records the number of notifications sent. Don't care
* value of this field is NULL.
+ * @chain_head: Pointer to the head of the notifier chain. We cast it as
+ * void * to allow different kinds of notifier chains to
+ * pass the value of their chain heads.
* @returns: notifier_call_chain returns the value returned by the
* last notifier function called.
*/
static int __kprobes notifier_call_chain(struct notifier_block **nl,
unsigned long val, void *v,
- int nr_to_call, int *nr_calls)
+ int nr_to_call, int *nr_calls,
+ void *chain_head)
{
int ret = NOTIFY_DONE;
struct notifier_block *nb, *next_nb;
@@ -90,7 +98,11 @@ static int __kprobes notifier_call_chain(struct notifier_block **nl,
continue;
}
#endif
+ trace_hotplug_notifier_event_start((void *)(nb->notifier_call),
+ val, (void *)chain_head);
ret = nb->notifier_call(nb, val, v);
+ trace_hotplug_notifier_event_stop((void *)(nb->notifier_call),
+ val, (void *) chain_head);

if (nr_calls)
(*nr_calls)++;
@@ -179,7 +191,7 @@ int __kprobes __atomic_notifier_call_chain(struct atomic_notifier_head *nh,
int ret;

rcu_read_lock();
- ret = notifier_call_chain(&nh->head, val, v, nr_to_call, nr_calls);
+ ret = notifier_call_chain(&nh->head, val, v, nr_to_call, nr_calls, nh);
rcu_read_unlock();
return ret;
}
@@ -312,7 +324,7 @@ int __blocking_notifier_call_chain(struct blocking_notifier_head *nh,
if (rcu_dereference(nh->head)) {
down_read(&nh->rwsem);
ret = notifier_call_chain(&nh->head, val, v, nr_to_call,
- nr_calls);
+ nr_calls, nh);
up_read(&nh->rwsem);
}
return ret;
@@ -388,7 +400,8 @@ int __raw_notifier_call_chain(struct raw_notifier_head *nh,
unsigned long val, void *v,
int nr_to_call, int *nr_calls)
{
- return notifier_call_chain(&nh->head, val, v, nr_to_call, nr_calls);
+ return notifier_call_chain(&nh->head, val, v, nr_to_call, nr_calls,
+ nh);
}
EXPORT_SYMBOL_GPL(__raw_notifier_call_chain);

@@ -491,7 +504,7 @@ int __srcu_notifier_call_chain(struct srcu_notifier_head *nh,
int idx;

idx = srcu_read_lock(&nh->srcu);
- ret = notifier_call_chain(&nh->head, val, v, nr_to_call, nr_calls);
+ ret = notifier_call_chain(&nh->head, val, v, nr_to_call, nr_calls, nh);
srcu_read_unlock(&nh->srcu, idx);
return ret;
}

2009-06-16 06:24:23

by Andrew Morton

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.

On Tue, 16 Jun 2009 11:08:39 +0530 Gautham R Shenoy <[email protected]> wrote:

> Currently on a ppc64 box with 16 CPUs, the time taken for
> a individual cpu-hotplug operation is as follows.
>
> # time echo 0 > /sys/devices/system/cpu/cpu2/online
> real 0m0.025s
> user 0m0.000s
> sys 0m0.002s
>
> # time echo 1 > /sys/devices/system/cpu/cpu2/online
> real 0m0.021s
> user 0m0.000s
> sys 0m0.000s

Surprised. Do people really online and offline CPUs frequently enough
for this to be a problem?

2009-06-16 08:07:59

by Vaidyanathan Srinivasan

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.

* Andrew Morton <[email protected]> [2009-06-15 23:23:18]:

> On Tue, 16 Jun 2009 11:08:39 +0530 Gautham R Shenoy <[email protected]> wrote:
>
> > Currently on a ppc64 box with 16 CPUs, the time taken for
> > a individual cpu-hotplug operation is as follows.
> >
> > # time echo 0 > /sys/devices/system/cpu/cpu2/online
> > real 0m0.025s
> > user 0m0.000s
> > sys 0m0.002s
> >
> > # time echo 1 > /sys/devices/system/cpu/cpu2/online
> > real 0m0.021s
> > user 0m0.000s
> > sys 0m0.000s
>
> Surprised. Do people really online and offline CPUs frequently enough
> for this to be a problem?

Certainly not for hardware faults or hardware replacement, but
cpu-hotplug interface is useful for changing system configuration to
meet different objectives like

* Reduce system capacity to reduce average power and reduce heat

* Increasing number of cores and threads in a CPU package is leading
to multiple cpu offline/online operations for any perceivable effect

* Dynamically change CPU configurations in virtualized environments

Ref:

[1] Saving power by cpu evacuation sched_max_capacity_pct=n
http://lkml.org/lkml/2009/5/13/173

[2] Make offline cpus to go to deepest idle state using
http://lkml.org/lkml/2009/5/22/431

[3] cpuset: add new API to change cpuset top group's cpus
http://lkml.org/lkml/2009/5/19/54

For getting stuff off a certain CPU, cpu-hotplug framework seems to do
the right thing. Identifying bottlenecks in the framework can
significantly help other use cases.

--Vaidy

2009-06-16 16:07:20

by Nathan Lynch

Subject: Re: [RFD PATCH 1/4] powerpc: cpu: Reduce the polling interval in __cpu_up()

Please cc linuxppc-dev if you want the powerpc maintainer to pick this
up.

Gautham R Shenoy <[email protected]> writes:
> The cpu online operation on a powerpc today takes order of 200-220ms. Of
> this time, approximately 200ms is taken up by __cpu_up(). This is because
> we poll every 200ms to check if the new cpu has notified it's presence
> through the cpu_callin_map. We poll every 200ms until the new cpu sets
> the value in cpu_callin_map or 5 seconds elapse, whichever comes earlier.
>
> However, the time taken by the new processor to indicate it's presence has
> found to be less than a millisecond

Only with your particular configuration (which is not identified). It
can take much longer than 1ms on others.

> Keeping this in mind, reduce the
> polling interval from 200ms to 1ms while retaining the 5 second
> timeout.

Ack on the patch, but the changelog needs work. I assume your
observations are from a pseries system -- please state this in the
changelog ("powerpc" is too broad), along with the processor model and
whether the LPAR's processors were configured in dedicated or shared
mode.

2009-06-16 16:23:21

by Nathan Lynch

Subject: Re: [RFD PATCH 2/4] cpu: sysfs interface for hotplugging bunch of CPUs.

Gautham R Shenoy <[email protected]> writes:
> echo 2,3 > /sys/devices/system/cpu/offline #Offlines CPUs 2 and 3
> echo 4 > /sys/devices/sytem/cpu/offline #Offlines CPU 4
> echo 2-4 > /sys/devices/system/cpu/online #Onlines CPU 2,3,4
>
> This patch changes the permissions of these sysfs files from 0444 to 0644.
> It provides a dummy store function, which currently parses the input
> provided by the user and copies them to another debug cpumask structure,
> which can be accessed using the sysfs interfaces:
> /sys/devices/system/cpu/debug_offline
> and
> /sys/devices/system/cpu/debug_online
>
> Thus on performing a
> echo 2,3 > /sys/devices/system/cpu/offline
> the operation
> cat /sys/devices/system/cpu/debug_offline
> should yield
> 2-3
> as the result.

These debug_(on|off)line attributes aren't intended to be in the final
result, are they? They don't seem useful beyond the development phase
of this feature...

Subject: Re: [RFD PATCH 2/4] cpu: sysfs interface for hotplugging bunch of CPUs.

On Tue, Jun 16, 2009 at 11:22:56AM -0500, Nathan Lynch wrote:
> Gautham R Shenoy <[email protected]> writes:
> > echo 2,3 > /sys/devices/system/cpu/offline #Offlines CPUs 2 and 3
> > echo 4 > /sys/devices/sytem/cpu/offline #Offlines CPU 4
> > echo 2-4 > /sys/devices/system/cpu/online #Onlines CPU 2,3,4
> >
> > This patch changes the permissions of these sysfs files from 0444 to 0644.
> > It provides a dummy store function, which currently parses the input
> > provided by the user and copies them to another debug cpumask structure,
> > which can be accessed using the sysfs interfaces:
> > /sys/devices/system/cpu/debug_offline
> > and
> > /sys/devices/system/cpu/debug_online
> >
> > Thus on performing a
> > echo 2,3 > /sys/devices/system/cpu/offline
> > the operation
> > cat /sys/devices/system/cpu/debug_offline
> > should yield
> > 2-3
> > as the result.
>
> These debug_(on|off)line attributes aren't intended to be in the final
> result, are they? They don't seem useful beyond the development phase
> of this feature...

No, they aren't intended to be in the final result.

--
Thanks and Regards
gautham

Subject: Re: [RFD PATCH 1/4] powerpc: cpu: Reduce the polling interval in __cpu_up()

On Tue, Jun 16, 2009 at 11:06:45AM -0500, Nathan Lynch wrote:
> Please cc linuxppc-dev if you want the powerpc maintainer to pick this
> up.

Will do it. I still need to test this patch across the different
configurations. I posted it here just so that we get a rough idea
regarding what we're looking at.

Thanks for taking a look at this one!
>
> Gautham R Shenoy <[email protected]> writes:
> > The cpu online operation on a powerpc today takes order of 200-220ms. Of
> > this time, approximately 200ms is taken up by __cpu_up(). This is because
> > we poll every 200ms to check if the new cpu has notified it's presence
> > through the cpu_callin_map. We poll every 200ms until the new cpu sets
> > the value in cpu_callin_map or 5 seconds elapse, whichever comes earlier.
> >
> > However, the time taken by the new processor to indicate it's presence has
> > found to be less than a millisecond
>
> Only with your particular configuration (which is not identified). It
> can take much longer than 1ms on others.
>
> > Keeping this in mind, reduce the
> > polling interval from 200ms to 1ms while retaining the 5 second
> > timeout.
>
> Ack on the patch, but the changelog needs work. I assume your
> observations are from a pseries system -- please state this in the
> changelog ("powerpc" is too broad), along with the processor model and
> whether the LPAR's processors were configured in dedicated or shared
> mode.

Will send these details with the patch separately, Cc'ing the linuxppc-dev list.

--
Thanks and Regards
gautham

2009-06-16 21:01:12

by Paul E. McKenney

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.

On Tue, Jun 16, 2009 at 01:37:15PM +0530, Vaidyanathan Srinivasan wrote:
> * Andrew Morton <[email protected]> [2009-06-15 23:23:18]:
>
> > On Tue, 16 Jun 2009 11:08:39 +0530 Gautham R Shenoy <[email protected]> wrote:
> >
> > > Currently on a ppc64 box with 16 CPUs, the time taken for
> > > a individual cpu-hotplug operation is as follows.
> > >
> > > # time echo 0 > /sys/devices/system/cpu/cpu2/online
> > > real 0m0.025s
> > > user 0m0.000s
> > > sys 0m0.002s
> > >
> > > # time echo 1 > /sys/devices/system/cpu/cpu2/online
> > > real 0m0.021s
> > > user 0m0.000s
> > > sys 0m0.000s
> >
> > Surprised. Do people really online and offline CPUs frequently enough
> > for this to be a problem?
>
> Certainly not for hardware faults or hardware replacement, but
> cpu-hotplug interface is useful for changing system configuration to
> meet different objectives like
>
> * Reduce system capacity to reduce average power and reduce heat
>
> * Increasing number of cores and threads in a CPU package is leading
> to multiple cpu offline/online operations for any perceivable effect
>
> * Dynamically change CPU configurations in virtualized environments

Perhaps also reducing boot-up time? If I am correctly interpreting the
above numbers, an eight-CPU system would be consuming 175 milliseconds
bringing up the seven non-boot CPUs. Reducing this by 150 milliseconds
might be of interest to some people. ;-)

Thanx, Paul

> Ref:
>
> [1] Saving power by cpu evacuation sched_max_capacity_pct=n
> http://lkml.org/lkml/2009/5/13/173
>
> [2] Make offline cpus to go to deepest idle state using
> http://lkml.org/lkml/2009/5/22/431
>
> [3] cpuset: add new API to change cpuset top group's cpus
> http://lkml.org/lkml/2009/5/19/54
>
> For getting stuff off a certain CPU, cpu-hotplug framework seems to do
> the right thing. Identifying bottlenecks in the framework can
> significantly help other use cases.
>
> --Vaidy
>

2009-06-17 07:33:06

by Peter Zijlstra

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.

On Tue, 2009-06-16 at 13:37 +0530, Vaidyanathan Srinivasan wrote:
> * Andrew Morton <[email protected]> [2009-06-15 23:23:18]:
>
> > On Tue, 16 Jun 2009 11:08:39 +0530 Gautham R Shenoy <[email protected]> wrote:
> >
> > > Currently on a ppc64 box with 16 CPUs, the time taken for
> > > a individual cpu-hotplug operation is as follows.
> > >
> > > # time echo 0 > /sys/devices/system/cpu/cpu2/online
> > > real 0m0.025s
> > > user 0m0.000s
> > > sys 0m0.002s
> > >
> > > # time echo 1 > /sys/devices/system/cpu/cpu2/online
> > > real 0m0.021s
> > > user 0m0.000s
> > > sys 0m0.000s
> >
> > Surprised. Do people really online and offline CPUs frequently enough
> > for this to be a problem?
>
> Certainly not for hardware faults or hardware replacement, but
> cpu-hotplug interface is useful for changing system configuration to
> meet different objectives like
>
> * Reduce system capacity to reduce average power and reduce heat
>
> * Increasing number of cores and threads in a CPU package is leading
> to multiple cpu offline/online operations for any perceivable effect
>
> * Dynamically change CPU configurations in virtualized environments

I tend to agree with Andrew, if any of those things are done frequent
enough that the hotplug performance matter you're doing something mighty
odd.

2009-06-17 07:40:47

by Balbir Singh

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.

* Peter Zijlstra <[email protected]> [2009-06-17 09:32:57]:

> On Tue, 2009-06-16 at 13:37 +0530, Vaidyanathan Srinivasan wrote:
> > * Andrew Morton <[email protected]> [2009-06-15 23:23:18]:
> >
> > > On Tue, 16 Jun 2009 11:08:39 +0530 Gautham R Shenoy <[email protected]> wrote:
> > >
> > > > Currently on a ppc64 box with 16 CPUs, the time taken for
> > > > a individual cpu-hotplug operation is as follows.
> > > >
> > > > # time echo 0 > /sys/devices/system/cpu/cpu2/online
> > > > real 0m0.025s
> > > > user 0m0.000s
> > > > sys 0m0.002s
> > > >
> > > > # time echo 1 > /sys/devices/system/cpu/cpu2/online
> > > > real 0m0.021s
> > > > user 0m0.000s
> > > > sys 0m0.000s
> > >
> > > Surprised. Do people really online and offline CPUs frequently enough
> > > for this to be a problem?
> >
> > Certainly not for hardware faults or hardware replacement, but
> > cpu-hotplug interface is useful for changing system configuration to
> > meet different objectives like
> >
> > * Reduce system capacity to reduce average power and reduce heat
> >
> > * Increasing number of cores and threads in a CPU package is leading
> > to multiple cpu offline/online operations for any perceivable effect
> >
> > * Dynamically change CPU configurations in virtualized environments
>
> I tend to agree with Andrew, if any of those things are done frequent
> enough that the hotplug performance matter you're doing something mighty
> odd.
>

Peter, what Vaidy mentioned are very useful cases. To add to that:

Consider, for example, the need to turn off all threads belonging to a
package or the system. I can basically give out the cpu ids of all
threads and hotplug them out at once depending on the workload, in
effect turning off hyper-threading on the package.

Doing it all together provides (not complete, but better) control
over rollback, apart from the speed benefit.

The speedup benefit mentioned by Paul is very useful as well on large
systems, and for the boot-up time of virtual machines.


--
Balbir

2009-06-17 13:53:18

by Suresh Siddha

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.

On Mon, 2009-06-15 at 22:38 -0700, Gautham R Shenoy wrote:
> So, of the accounted time, a major chunk of time is consumed by
> cpuset_track_online_cpus() while handling CPU_DEAD and CPU_ONLINE
> notifications.
>
> 11.320205 ms: cpuset_track_online_cpus : CPU_DEAD
> 12.767882 ms: cpuset_track_online_cpus : CPU_ONLINE
>
> cpuset_trace_online_cpus() among other things performs the task of rebuilding
> the sched_domains for every online CPU in the system.

Are the above numbers with CONFIG_SCHED_DEBUG turned on/off?

thanks,
suresh

2009-06-17 14:38:20

by Paul E. McKenney

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.

On Wed, Jun 17, 2009 at 09:32:57AM +0200, Peter Zijlstra wrote:
> On Tue, 2009-06-16 at 13:37 +0530, Vaidyanathan Srinivasan wrote:
> > * Andrew Morton <[email protected]> [2009-06-15 23:23:18]:
> >
> > > On Tue, 16 Jun 2009 11:08:39 +0530 Gautham R Shenoy <[email protected]> wrote:
> > >
> > > > Currently on a ppc64 box with 16 CPUs, the time taken for
> > > > a individual cpu-hotplug operation is as follows.
> > > >
> > > > # time echo 0 > /sys/devices/system/cpu/cpu2/online
> > > > real 0m0.025s
> > > > user 0m0.000s
> > > > sys 0m0.002s
> > > >
> > > > # time echo 1 > /sys/devices/system/cpu/cpu2/online
> > > > real 0m0.021s
> > > > user 0m0.000s
> > > > sys 0m0.000s
> > >
> > > Surprised. Do people really online and offline CPUs frequently enough
> > > for this to be a problem?
> >
> > Certainly not for hardware faults or hardware replacement, but
> > cpu-hotplug interface is useful for changing system configuration to
> > meet different objectives like
> >
> > * Reduce system capacity to reduce average power and reduce heat
> >
> > * Increasing number of cores and threads in a CPU package is leading
> > to multiple cpu offline/online operations for any perceivable effect
> >
> > * Dynamically change CPU configurations in virtualized environments
>
> I tend to agree with Andrew, if any of those things are done frequent
> enough that the hotplug performance matter you're doing something mighty
> odd.

Boot speedup?

Thanx, Paul

2009-06-17 15:07:58

by Ingo Molnar

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.


* Paul E. McKenney <[email protected]> wrote:

> On Wed, Jun 17, 2009 at 09:32:57AM +0200, Peter Zijlstra wrote:
> > On Tue, 2009-06-16 at 13:37 +0530, Vaidyanathan Srinivasan wrote:
> > > * Andrew Morton <[email protected]> [2009-06-15 23:23:18]:
> > >
> > > > On Tue, 16 Jun 2009 11:08:39 +0530 Gautham R Shenoy <[email protected]> wrote:
> > > >
> > > > > Currently on a ppc64 box with 16 CPUs, the time taken for
> > > > > a individual cpu-hotplug operation is as follows.
> > > > >
> > > > > # time echo 0 > /sys/devices/system/cpu/cpu2/online
> > > > > real 0m0.025s
> > > > > user 0m0.000s
> > > > > sys 0m0.002s
> > > > >
> > > > > # time echo 1 > /sys/devices/system/cpu/cpu2/online
> > > > > real 0m0.021s
> > > > > user 0m0.000s
> > > > > sys 0m0.000s
> > > >
> > > > Surprised. Do people really online and offline CPUs frequently enough
> > > > for this to be a problem?
> > >
> > > Certainly not for hardware faults or hardware replacement, but
> > > cpu-hotplug interface is useful for changing system configuration to
> > > meet different objectives like
> > >
> > > * Reduce system capacity to reduce average power and reduce heat
> > >
> > > * Increasing number of cores and threads in a CPU package is leading
> > > to multiple cpu offline/online operations for any perceivable effect
> > >
> > > * Dynamically change CPU configurations in virtualized environments
> >
> > I tend to agree with Andrew, if any of those things are done
> > frequent enough that the hotplug performance matter you're doing
> > something mighty odd.
>
> Boot speedup?

Also, if it brings more attention (and more stability and more
bugfixes) to CPU hotplug that's only good.

Ingo

2009-06-17 20:26:59

by Peter Zijlstra

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.

On Wed, 2009-06-17 at 17:07 +0200, Ingo Molnar wrote:
> * Paul E. McKenney <[email protected]> wrote:
>
> > On Wed, Jun 17, 2009 at 09:32:57AM +0200, Peter Zijlstra wrote:
> > > On Tue, 2009-06-16 at 13:37 +0530, Vaidyanathan Srinivasan wrote:
> > > > * Andrew Morton <[email protected]> [2009-06-15 23:23:18]:
> > > >
> > > > > On Tue, 16 Jun 2009 11:08:39 +0530 Gautham R Shenoy <[email protected]> wrote:
> > > > >
> > > > > > Currently on a ppc64 box with 16 CPUs, the time taken for
> > > > > > a individual cpu-hotplug operation is as follows.
> > > > > >
> > > > > > # time echo 0 > /sys/devices/system/cpu/cpu2/online
> > > > > > real 0m0.025s
> > > > > > user 0m0.000s
> > > > > > sys 0m0.002s
> > > > > >
> > > > > > # time echo 1 > /sys/devices/system/cpu/cpu2/online
> > > > > > real 0m0.021s
> > > > > > user 0m0.000s
> > > > > > sys 0m0.000s
> > > > >
> > > > > Surprised. Do people really online and offline CPUs frequently enough
> > > > > for this to be a problem?
> > > >
> > > > Certainly not for hardware faults or hardware replacement, but
> > > > cpu-hotplug interface is useful for changing system configuration to
> > > > meet different objectives like
> > > >
> > > > * Reduce system capacity to reduce average power and reduce heat
> > > >
> > > > * Increasing number of cores and threads in a CPU package is leading
> > > > to multiple cpu offline/online operations for any perceivable effect
> > > >
> > > > * Dynamically change CPU configurations in virtualized environments
> > >
> > > I tend to agree with Andrew, if any of those things are done
> > > frequent enough that the hotplug performance matter you're doing
> > > something mighty odd.
> >
> > Boot speedup?
>
> Also, if it brings more attention (and more stability and more
> bugfixes) to CPU hotplug that's only good.

Sure, but do we need the extra complexity?

I mean, sure bootup speed might be nice, but any of the scenarios given
should simply not require cpu hotplug actions of a frequent enough
nature that any performance matters.

If you want to switch off all SMT siblings you don't do that 50 times a
second, you do that once per bootup or something.

Furthermore, we already established that cpu hotplug is not the proper
interface for thermal management, and dynamically changing virtualized
muck isn't something you do at 100Hz either.

So what worries me is the justification for this work. It might be good
and nice, but if the reasons are wrong it still worries me.

So again, why? -- the bootup thing is the only sane answer so far.

2009-06-20 15:35:31

by Ingo Molnar

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.


* Peter Zijlstra <[email protected]> wrote:

> On Wed, 2009-06-17 at 17:07 +0200, Ingo Molnar wrote:
> > * Paul E. McKenney <[email protected]> wrote:
> >
> > > On Wed, Jun 17, 2009 at 09:32:57AM +0200, Peter Zijlstra wrote:
> > > > On Tue, 2009-06-16 at 13:37 +0530, Vaidyanathan Srinivasan wrote:
> > > > > * Andrew Morton <[email protected]> [2009-06-15 23:23:18]:
> > > > >
> > > > > > On Tue, 16 Jun 2009 11:08:39 +0530 Gautham R Shenoy <[email protected]> wrote:
> > > > > >
> > > > > > > Currently on a ppc64 box with 16 CPUs, the time taken for
> > > > > > > a individual cpu-hotplug operation is as follows.
> > > > > > >
> > > > > > > # time echo 0 > /sys/devices/system/cpu/cpu2/online
> > > > > > > real 0m0.025s
> > > > > > > user 0m0.000s
> > > > > > > sys 0m0.002s
> > > > > > >
> > > > > > > # time echo 1 > /sys/devices/system/cpu/cpu2/online
> > > > > > > real 0m0.021s
> > > > > > > user 0m0.000s
> > > > > > > sys 0m0.000s
> > > > > >
> > > > > > Surprised. Do people really online and offline CPUs frequently enough
> > > > > > for this to be a problem?
> > > > >
> > > > > Certainly not for hardware faults or hardware replacement, but
> > > > > cpu-hotplug interface is useful for changing system configuration to
> > > > > meet different objectives like
> > > > >
> > > > > * Reduce system capacity to reduce average power and reduce heat
> > > > >
> > > > > * Increasing number of cores and threads in a CPU package is leading
> > > > > to multiple cpu offline/online operations for any perceivable effect
> > > > >
> > > > > * Dynamically change CPU configurations in virtualized environments
> > > >
> > > > I tend to agree with Andrew, if any of those things are done
> > > > frequent enough that the hotplug performance matter you're doing
> > > > something mighty odd.
> > >
> > > Boot speedup?
> >
> > Also, if it brings more attention (and more stability and more
> > bugfixes) to CPU hotplug that's only good.
>
> Sure, but do we need the extra complexity?
>
> I mean, sure bootup speed might be nice, but any of the scenarios
> given should simply not require cpu hotplug actions of a frequent
> enough nature that any performance matters.

Well, the fact that the patches exist show that there's people
caring about the speedup here. The speedup itself is non-trivial.

If the patches are technically correct, and if any existing
uncleanlinesses in the affected code are fixed first (please list
any TODO items in the CPU hotplug code you might know about), then
there's no reason not to pursue these patches - unless the
complexity increase is so huge that it makes the patches technically
wrong.

The diffstat doesn't look _that_ awful IMO - 50 lines of code and I
suspect the patches come with a promise to properly handle all prior
and later bugs in this area? :)

Ingo

2009-06-22 06:09:19

by Nathan Lynch

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.

Ingo Molnar <[email protected]> writes:

> * Peter Zijlstra <[email protected]> wrote:
>
>> On Wed, 2009-06-17 at 17:07 +0200, Ingo Molnar wrote:
>> > * Paul E. McKenney <[email protected]> wrote:
>> >
>> > > On Wed, Jun 17, 2009 at 09:32:57AM +0200, Peter Zijlstra wrote:
>> > > > On Tue, 2009-06-16 at 13:37 +0530, Vaidyanathan Srinivasan wrote:
>> > > > > * Andrew Morton <[email protected]> [2009-06-15 23:23:18]:
>> > > > >
>> > > > > > On Tue, 16 Jun 2009 11:08:39 +0530 Gautham R Shenoy <[email protected]> wrote:
>> > > > > >
>> > > > > > > Currently on a ppc64 box with 16 CPUs, the time taken for
>> > > > > > > a individual cpu-hotplug operation is as follows.
>> > > > > > >
>> > > > > > > # time echo 0 > /sys/devices/system/cpu/cpu2/online
>> > > > > > > real 0m0.025s
>> > > > > > > user 0m0.000s
>> > > > > > > sys 0m0.002s
>> > > > > > >
>> > > > > > > # time echo 1 > /sys/devices/system/cpu/cpu2/online
>> > > > > > > real 0m0.021s
>> > > > > > > user 0m0.000s
>> > > > > > > sys 0m0.000s
>> > > > > >
>> > > > > > Surprised. Do people really online and offline CPUs frequently enough
>> > > > > > for this to be a problem?
>> > > > >
>> > > > > Certainly not for hardware faults or hardware replacement, but
>> > > > > cpu-hotplug interface is useful for changing system configuration to
>> > > > > meet different objectives like
>> > > > >
>> > > > > * Reduce system capacity to reduce average power and reduce heat
>> > > > >
>> > > > > * Increasing number of cores and threads in a CPU package is leading
>> > > > > to multiple cpu offline/online operations for any perceivable effect
>> > > > >
>> > > > > * Dynamically change CPU configurations in virtualized environments
>> > > >
>> > > > I tend to agree with Andrew, if any of those things are done
>> > > > frequent enough that the hotplug performance matter you're doing
>> > > > something mighty odd.
>> > >
>> > > Boot speedup?
>> >
>> > Also, if it brings more attention (and more stability and more
>> > bugfixes) to CPU hotplug that's only good.
>>
>> Sure, but do we need the extra complexity?
>>
>> I mean, sure bootup speed might be nice, but any of the scenarios
>> given should simply not require cpu hotplug actions of a frequent
>> enough nature that any performance matters.
>
> Well, the fact that the patches exist show that there's people
> caring about the speedup here. The speedup itself is non-trivial.

If I correctly understand the behavior of the patch set as posted, there
is no speedup beyond eliminating the overhead of multiple writes to
/sys/devices/system/cpu/cpu*/online. To obtain non-trivial speedups via
bulk hotplug, one or both of the following items from the TODO list need
to be completed:

- Enhance the subsystem notifiers to work on a cpumask_var_t instead of a cpu
id.

- Optimize the subsystem notifiers to reduce the time consumed while
handling CPU_[DOWN_PREPARE/DEAD/UP_PREPARE/ONLINE] events for the
cpumask_var_t.

Right?

(The powerpc-specific patch at the beginning of the series improves
hot-online time for a single cpu in some circumstances and is basically
unrelated to the aim of the patch set -- it should go upstream through
the powerpc tree independently.)

2009-06-27 11:27:27

by Pavel Machek

Subject: Re: [RFD PATCH 0/4] cpu: Bulk CPU Hotplug support.

On Tue 2009-06-16 14:00:59, Paul E. McKenney wrote:
> On Tue, Jun 16, 2009 at 01:37:15PM +0530, Vaidyanathan Srinivasan wrote:
> > * Andrew Morton <[email protected]> [2009-06-15 23:23:18]:
> >
> > > On Tue, 16 Jun 2009 11:08:39 +0530 Gautham R Shenoy <[email protected]> wrote:
> > >
> > > > Currently on a ppc64 box with 16 CPUs, the time taken for
> > > > a individual cpu-hotplug operation is as follows.
> > > >
> > > > # time echo 0 > /sys/devices/system/cpu/cpu2/online
> > > > real 0m0.025s
> > > > user 0m0.000s
> > > > sys 0m0.002s
> > > >
> > > > # time echo 1 > /sys/devices/system/cpu/cpu2/online
> > > > real 0m0.021s
> > > > user 0m0.000s
> > > > sys 0m0.000s
> > >
> > > Surprised. Do people really online and offline CPUs frequently enough
> > > for this to be a problem?
> >
> > Certainly not for hardware faults or hardware replacement, but
> > cpu-hotplug interface is useful for changing system configuration to
> > meet different objectives like
> >
> > * Reduce system capacity to reduce average power and reduce heat
> >
> > * Increasing number of cores and threads in a CPU package is leading
> > to multiple cpu offline/online operations for any perceivable effect
> >
> > * Dynamically change CPU configurations in virtualized environments
>
> Perhaps also reducing boot-up time? If I am correctly interpreting the
> above numbers, an eight-CPU system would be consuming 175 milliseconds
> bringing up the seven non-boot CPUs. Reducing this by 150 milliseconds
> might be of interest to some people. ;-)

...also it should save 300msec from the s2ram cycle. Actually, maybe the
suspend code should be modified first, as it can demonstrate the
changes without changing the kernel <-> user interface?

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html