2007-05-24 12:10:41

by Avi Kivity

Subject: [PATCH 0/7] KVM: Suspend and cpu hotplug fixes

The following patchset makes kvm more robust wrt cpu hotunplug, and
makes suspend-to-ram actually work. Suspend-to-disk benefits from
the cpu hotunplug improvements as well.

The major issue is that KVM wants to disable the virtualization
extensions at a point in time when no user processes are schedulable
on the victim cpu. No current notifier exists, so a new one, CPU_DYING,
is added for the purpose.

Should there be no objections, I will submit this patchset for inclusion
in 2.6.22, and backport it to 2.6.21.stable.


2007-05-24 12:10:28

by Avi Kivity

Subject: [PATCH 2/7] HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING

CPU_DYING is called in atomic context, so don't try to take any locks.

Signed-off-by: Avi Kivity <[email protected]>
---
kernel/cpuset.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index f57854b..d4ab1c6 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2138,6 +2138,9 @@ static void common_cpu_mem_hotplug_unplug(void)
static int cpuset_handle_cpuhp(struct notifier_block *nb,
unsigned long phase, void *cpu)
{
+ if (phase == CPU_DYING)
+ return NOTIFY_DONE;
+
common_cpu_mem_hotplug_unplug();
return 0;
}
--
1.5.0.6
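A minimal userspace sketch (not kernel code) of the pattern in this patch: a hotplug callback returns early for phases delivered in atomic context, before it reaches anything that sleeps. lock_taking_update() is a hypothetical stand-in for common_cpu_mem_hotplug_unplug(). CPU_ONLINE (0x0002) and CPU_TASKS_FROZEN (0x0010) are assumed values from the 2.6.22-era notifier.h and are not shown in this thread. The sketch also treats CPU_DYING_FROZEN as atomic, because take_cpu_down() in patch 1/7 sends CPU_DYING | mod.

```c
#include <assert.h>
#include <stdbool.h>

#define NOTIFY_DONE       0x0000  /* kernel value */
#define CPU_ONLINE        0x0002  /* assumed 2.6.22-era value */
#define CPU_DYING         0x000A  /* from the notifier.h hunk in patch 1/7 */
#define CPU_TASKS_FROZEN  0x0010  /* assumed 2.6.22-era value */
#define CPU_DYING_FROZEN  (CPU_DYING | CPU_TASKS_FROZEN)

static int slow_path_calls;  /* counts entries into the lock-taking path */

/* Hypothetical stand-in for common_cpu_mem_hotplug_unplug(), which takes
 * sleeping locks and therefore must not run in atomic context. */
static void lock_taking_update(void)
{
    slow_path_calls++;
}

/* True for the phases that are delivered from stop_machine context. */
static bool phase_is_atomic(unsigned long phase)
{
    return phase == CPU_DYING || phase == CPU_DYING_FROZEN;
}

static int cpuset_handle_cpuhp_model(unsigned long phase)
{
    if (phase_is_atomic(phase))
        return NOTIFY_DONE;   /* bail out before touching any lock */
    lock_taking_update();
    return 0;
}
```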

2007-05-24 12:10:53

by Avi Kivity

Subject: [PATCH 3/7] HOTPLUG: Adapt thermal throttle to CPU_DYING

CPU_DYING is notified in atomic context, so no taking mutexes here.

Signed-off-by: Avi Kivity <[email protected]>
---
arch/i386/kernel/cpu/mcheck/therm_throt.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/i386/kernel/cpu/mcheck/therm_throt.c b/arch/i386/kernel/cpu/mcheck/therm_throt.c
index 7ba7c3a..1203dc5 100644
--- a/arch/i386/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/i386/kernel/cpu/mcheck/therm_throt.c
@@ -134,19 +134,21 @@ static __cpuinit int thermal_throttle_cpu_callback(struct notifier_block *nfb,
int err;

sys_dev = get_cpu_sysdev(cpu);
- mutex_lock(&therm_cpu_lock);
switch (action) {
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
+ mutex_lock(&therm_cpu_lock);
err = thermal_throttle_add_dev(sys_dev);
+ mutex_unlock(&therm_cpu_lock);
WARN_ON(err);
break;
case CPU_DEAD:
case CPU_DEAD_FROZEN:
+ mutex_lock(&therm_cpu_lock);
thermal_throttle_remove_dev(sys_dev);
+ mutex_unlock(&therm_cpu_lock);
break;
}
- mutex_unlock(&therm_cpu_lock);
return NOTIFY_OK;
}

--
1.5.0.6
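A userspace sketch of the same restructuring, with a pthread mutex standing in for the kernel mutex: the lock is taken only inside the branches that need it, so an action such as CPU_DYING falls through the switch without touching it. The counters and the "devices" stand-in for the sysfs entries are hypothetical; CPU_ONLINE (0x0002) and NOTIFY_OK (0x0001) are assumed kernel values.

```c
#include <assert.h>
#include <pthread.h>

#define NOTIFY_OK   0x0001  /* assumed kernel value */
#define CPU_ONLINE  0x0002  /* assumed 2.6.22-era value */
#define CPU_DEAD    0x0007  /* from notifier.h as quoted in patch 1/7 */
#define CPU_DYING   0x000A  /* added by patch 1/7 */

static pthread_mutex_t therm_cpu_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_acquisitions;  /* how many times the mutex was taken */
static int devices;            /* hypothetical stand-in for per-cpu entries */

static int thermal_callback_model(unsigned long action)
{
    switch (action) {
    case CPU_ONLINE:
        pthread_mutex_lock(&therm_cpu_lock);
        lock_acquisitions++;
        devices++;
        pthread_mutex_unlock(&therm_cpu_lock);
        break;
    case CPU_DEAD:
        pthread_mutex_lock(&therm_cpu_lock);
        lock_acquisitions++;
        assert(devices > 0);
        devices--;
        pthread_mutex_unlock(&therm_cpu_lock);
        break;
    }
    /* Any other action, including CPU_DYING, never touches the mutex. */
    return NOTIFY_OK;
}
```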

2007-05-24 12:11:19

by Avi Kivity

Subject: [PATCH 5/7] KVM: Keep track of which cpus have virtualization enabled

By keeping track of which cpus have virtualization enabled, we
prevent double-enable or double-disable during hotplug, either of
which causes a fatal oops.

Signed-off-by: Avi Kivity <[email protected]>
---
drivers/kvm/kvm_main.c | 47 +++++++++++++++++++++++++++++++++++------------
1 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 0d89260..9738d51 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -40,6 +40,8 @@
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/mount.h>
+#include <linux/sched.h>
+#include <linux/cpumask.h>

#include "x86_emulate.h"
#include "segment_descriptor.h"
@@ -50,8 +52,12 @@ MODULE_LICENSE("GPL");
static DEFINE_SPINLOCK(kvm_lock);
static LIST_HEAD(vm_list);

+static cpumask_t cpus_hardware_enabled;
+
struct kvm_arch_ops *kvm_arch_ops;

+static void hardware_disable(void *ignored);
+
#define STAT_OFFSET(x) offsetof(struct kvm_vcpu, stat.x)

static struct kvm_stats_debugfs_item {
@@ -2839,7 +2845,7 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
* in vmx root mode.
*/
printk(KERN_INFO "kvm: exiting hardware virtualization\n");
- on_each_cpu(kvm_arch_ops->hardware_disable, NULL, 0, 1);
+ on_each_cpu(hardware_disable, NULL, 0, 1);
}
return NOTIFY_OK;
}
@@ -2882,6 +2888,27 @@ static void decache_vcpus_on_cpu(int cpu)
spin_unlock(&kvm_lock);
}

+static void hardware_enable(void *junk)
+{
+ int cpu = raw_smp_processor_id();
+
+ if (cpu_isset(cpu, cpus_hardware_enabled))
+ return;
+ cpu_set(cpu, cpus_hardware_enabled);
+ kvm_arch_ops->hardware_enable(NULL);
+}
+
+static void hardware_disable(void *junk)
+{
+ int cpu = raw_smp_processor_id();
+
+ if (!cpu_isset(cpu, cpus_hardware_enabled))
+ return;
+ cpu_clear(cpu, cpus_hardware_enabled);
+ decache_vcpus_on_cpu(cpu);
+ kvm_arch_ops->hardware_disable(NULL);
+}
+
static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
void *v)
{
@@ -2894,16 +2921,13 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
case CPU_UP_CANCELED_FROZEN:
printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
cpu);
- decache_vcpus_on_cpu(cpu);
- smp_call_function_single(cpu, kvm_arch_ops->hardware_disable,
- NULL, 0, 1);
+ smp_call_function_single(cpu, hardware_disable, NULL, 0, 1);
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
cpu);
- smp_call_function_single(cpu, kvm_arch_ops->hardware_enable,
- NULL, 0, 1);
+ smp_call_function_single(cpu, hardware_enable, NULL, 0, 1);
break;
}
return NOTIFY_OK;
@@ -2960,14 +2984,13 @@ static void kvm_exit_debug(void)

static int kvm_suspend(struct sys_device *dev, pm_message_t state)
{
- decache_vcpus_on_cpu(raw_smp_processor_id());
- on_each_cpu(kvm_arch_ops->hardware_disable, NULL, 0, 1);
+ on_each_cpu(hardware_disable, NULL, 0, 0);
return 0;
}

static int kvm_resume(struct sys_device *dev)
{
- on_each_cpu(kvm_arch_ops->hardware_enable, NULL, 0, 1);
+ on_each_cpu(hardware_disable, NULL, 0, 0);
return 0;
}

@@ -3020,7 +3043,7 @@ int kvm_init_arch(struct kvm_arch_ops *ops, struct module *module)
if (r < 0)
goto out;

- on_each_cpu(kvm_arch_ops->hardware_enable, NULL, 0, 1);
+ on_each_cpu(hardware_enable, NULL, 0, 1);
r = register_cpu_notifier(&kvm_cpu_notifier);
if (r)
goto out_free_1;
@@ -3052,7 +3075,7 @@ out_free_2:
unregister_reboot_notifier(&kvm_reboot_notifier);
unregister_cpu_notifier(&kvm_cpu_notifier);
out_free_1:
- on_each_cpu(kvm_arch_ops->hardware_disable, NULL, 0, 1);
+ on_each_cpu(hardware_disable, NULL, 0, 1);
kvm_arch_ops->hardware_unsetup();
out:
kvm_arch_ops = NULL;
@@ -3066,7 +3089,7 @@ void kvm_exit_arch(void)
sysdev_class_unregister(&kvm_sysdev_class);
unregister_reboot_notifier(&kvm_reboot_notifier);
unregister_cpu_notifier(&kvm_cpu_notifier);
- on_each_cpu(kvm_arch_ops->hardware_disable, NULL, 0, 1);
+ on_each_cpu(hardware_disable, NULL, 0, 1);
kvm_arch_ops->hardware_unsetup();
kvm_arch_ops = NULL;
}
--
1.5.0.6
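A userspace sketch of the bookkeeping in hardware_enable()/hardware_disable(): a bitmask records which cpus are enabled, so a repeated enable or disable on the same cpu becomes a no-op. A plain unsigned long replaces cpumask_t (so the sketch assumes fewer cpus than bits in a long), and the two counters are hypothetical stand-ins for kvm_arch_ops->hardware_enable/disable. The real hardware_disable() also calls decache_vcpus_on_cpu() before disabling.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for the arch enable/disable hooks. */
static int arch_enable_calls, arch_disable_calls;

/* Bit n set => virtualization enabled on cpu n. */
static unsigned long cpus_hardware_enabled;

static void hardware_enable_on(int cpu)
{
    assert(cpu >= 0 && cpu < (int)(sizeof(unsigned long) * CHAR_BIT));
    unsigned long bit = 1UL << cpu;

    if (cpus_hardware_enabled & bit)
        return;                 /* already enabled: skip the double-enable */
    cpus_hardware_enabled |= bit;
    arch_enable_calls++;
}

static void hardware_disable_on(int cpu)
{
    assert(cpu >= 0 && cpu < (int)(sizeof(unsigned long) * CHAR_BIT));
    unsigned long bit = 1UL << cpu;

    if (!(cpus_hardware_enabled & bit))
        return;                 /* already disabled: skip the double-disable */
    cpus_hardware_enabled &= ~bit;
    arch_disable_calls++;
}
```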

2007-05-24 12:11:35

by Avi Kivity

Subject: [PATCH 1/7] HOTPLUG: Add CPU_DYING notifier

KVM wants a notification when a cpu is about to die, so it can disable
the hardware virtualization extensions, but at a time when user processes
can no longer be scheduled on that cpu, so that nothing tries to use the
extensions after they have been disabled.

This adds a CPU_DYING notification. The notification is called in atomic
context on the doomed cpu.

Signed-off-by: Avi Kivity <[email protected]>
---
include/linux/notifier.h | 3 +++
kernel/cpu.c | 16 ++++++++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 9431101..576f2bb 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -196,6 +196,8 @@ extern int __srcu_notifier_call_chain(struct srcu_notifier_head *nh,
#define CPU_DEAD 0x0007 /* CPU (unsigned)v dead */
#define CPU_LOCK_ACQUIRE 0x0008 /* Acquire all hotcpu locks */
#define CPU_LOCK_RELEASE 0x0009 /* Release all hotcpu locks */
+#define CPU_DYING 0x000A /* CPU (unsigned)v not running any task,
+ * not handling interrupts, soon dead */

/* Used for CPU hotplug events occuring while tasks are frozen due to a suspend
* operation in progress
@@ -208,6 +210,7 @@ extern int __srcu_notifier_call_chain(struct srcu_notifier_head *nh,
#define CPU_DOWN_PREPARE_FROZEN (CPU_DOWN_PREPARE | CPU_TASKS_FROZEN)
#define CPU_DOWN_FAILED_FROZEN (CPU_DOWN_FAILED | CPU_TASKS_FROZEN)
#define CPU_DEAD_FROZEN (CPU_DEAD | CPU_TASKS_FROZEN)
+#define CPU_DYING_FROZEN (CPU_DYING | CPU_TASKS_FROZEN)

#endif /* __KERNEL__ */
#endif /* _LINUX_NOTIFIER_H */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 208cf34..181ae70 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -103,11 +103,19 @@ static inline void check_for_tasks(int cpu)
write_unlock_irq(&tasklist_lock);
}

+struct take_cpu_down_param {
+ unsigned long mod;
+ void *hcpu;
+};
+
/* Take this CPU down. */
-static int take_cpu_down(void *unused)
+static int take_cpu_down(void *_param)
{
+ struct take_cpu_down_param *param = _param;
int err;

+ raw_notifier_call_chain(&cpu_chain, CPU_DYING | param->mod,
+ param->hcpu);
/* Ensure this CPU doesn't handle any more interrupts. */
err = __cpu_disable();
if (err < 0)
@@ -127,6 +135,10 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen)
cpumask_t old_allowed, tmp;
void *hcpu = (void *)(long)cpu;
unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
+ struct take_cpu_down_param tcd_param = {
+ .mod = mod,
+ .hcpu = hcpu,
+ };

if (num_online_cpus() == 1)
return -EBUSY;
@@ -153,7 +165,7 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen)
set_cpus_allowed(current, tmp);

mutex_lock(&cpu_bitmask_lock);
- p = __stop_machine_run(take_cpu_down, NULL, cpu);
+ p = __stop_machine_run(take_cpu_down, &tcd_param, cpu);
mutex_unlock(&cpu_bitmask_lock);

if (IS_ERR(p) || cpu_online(cpu)) {
--
1.5.0.6
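A userspace sketch of how take_cpu_down() forwards the frozen modifier: the CPU_DYING event is ORed with param->mod, so a callback sees CPU_DYING_FROZEN during suspend and can strip the modifier to recover the base phase. CPU_TASKS_FROZEN = 0x0010 is an assumed value from the 2.6.22-era header, and notify_model() is a hypothetical single-callback stand-in for raw_notifier_call_chain().

```c
#include <assert.h>

#define CPU_DYING         0x000A  /* from the notifier.h hunk above */
#define CPU_TASKS_FROZEN  0x0010  /* assumed 2.6.22-era value */
#define CPU_DYING_FROZEN  (CPU_DYING | CPU_TASKS_FROZEN)

static unsigned long last_phase;  /* base phase seen by the callback */
static int saw_frozen;            /* whether the FROZEN bit was set */

/* Hypothetical callback: records the phase with the FROZEN bit stripped. */
static int notify_model(unsigned long action, void *hcpu)
{
    (void)hcpu;
    saw_frozen = (action & CPU_TASKS_FROZEN) != 0;
    last_phase = action & ~(unsigned long)CPU_TASKS_FROZEN;
    return 0;
}

/* Same layout as struct take_cpu_down_param in the patch. */
struct take_cpu_down_param {
    unsigned long mod;
    void *hcpu;
};

/* Model of take_cpu_down(): announce CPU_DYING (plus the modifier)
 * before the cpu is actually disabled. */
static int take_cpu_down_model(void *_param)
{
    struct take_cpu_down_param *param = _param;

    return notify_model(CPU_DYING | param->mod, param->hcpu);
}
```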

2007-05-24 12:11:52

by Avi Kivity

Subject: [PATCH 6/7] KVM: Tune hotplug/suspend IPIs

The hotplug IPIs can target the cpu we are currently running on, so
use on_one_cpu(). Similarly, drop on_each_cpu() for the suspend/resume
callbacks, as we're in atomic context here and only one cpu is up
anyway.

Signed-off-by: Avi Kivity <[email protected]>
---
drivers/kvm/kvm_main.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 9738d51..a632c8d 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -2921,13 +2921,13 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
case CPU_UP_CANCELED_FROZEN:
printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
cpu);
- smp_call_function_single(cpu, hardware_disable, NULL, 0, 1);
+ on_one_cpu(cpu, hardware_disable, NULL, 0, 1);
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
cpu);
- smp_call_function_single(cpu, hardware_enable, NULL, 0, 1);
+ on_one_cpu(cpu, hardware_enable, NULL, 0, 1);
break;
}
return NOTIFY_OK;
@@ -2984,13 +2984,13 @@ static void kvm_exit_debug(void)

static int kvm_suspend(struct sys_device *dev, pm_message_t state)
{
- on_each_cpu(hardware_disable, NULL, 0, 0);
+ hardware_disable(NULL);
return 0;
}

static int kvm_resume(struct sys_device *dev)
{
- on_each_cpu(hardware_disable, NULL, 0, 0);
+ hardware_enable(NULL);
return 0;
}

--
1.5.0.6

2007-05-24 12:12:16

by Avi Kivity

Subject: [PATCH 7/7] KVM: Use CPU_DYING for disabling virtualization

Only at the CPU_DYING stage can we be sure that no user process will
be scheduled onto the cpu and oops when trying to use virtualization
extensions.

Signed-off-by: Avi Kivity <[email protected]>
---
drivers/kvm/kvm_main.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index a632c8d..e9aa86d 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -2915,8 +2915,8 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
int cpu = (long)v;

switch (val) {
- case CPU_DOWN_PREPARE:
- case CPU_DOWN_PREPARE_FROZEN:
+ case CPU_DYING:
+ case CPU_DYING_FROZEN:
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
printk(KERN_INFO "kvm: disabling virtualization on CPU%d\n",
--
1.5.0.6
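A userspace sketch of the phase-to-action mapping in kvm_cpu_hotplug() after this patch. Unlike the patch, which lists each _FROZEN case explicitly, the sketch masks off CPU_TASKS_FROZEN before switching. The enum and function names are hypothetical, and the numeric values other than CPU_DYING are assumed from the 2.6.22-era notifier.h.

```c
#include <assert.h>

#define CPU_ONLINE        0x0002  /* assumed 2.6.22-era value */
#define CPU_UP_CANCELED   0x0004  /* assumed 2.6.22-era value */
#define CPU_DOWN_PREPARE  0x0005  /* assumed 2.6.22-era value */
#define CPU_DYING         0x000A  /* from patch 1/7 */
#define CPU_TASKS_FROZEN  0x0010  /* assumed 2.6.22-era value */

enum kvm_hp_action { KVM_HP_NONE, KVM_HP_DISABLE, KVM_HP_ENABLE };

/* Which virtualization action a hotplug event triggers. */
static enum kvm_hp_action kvm_hotplug_action(unsigned long val)
{
    switch (val & ~(unsigned long)CPU_TASKS_FROZEN) {
    case CPU_DYING:          /* no user task can run here any more */
    case CPU_UP_CANCELED:
        return KVM_HP_DISABLE;
    case CPU_ONLINE:
        return KVM_HP_ENABLE;
    default:                 /* CPU_DOWN_PREPARE no longer disables */
        return KVM_HP_NONE;
    }
}
```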

2007-05-24 12:12:36

by Avi Kivity

Subject: [PATCH 4/7] SMP: Implement on_one_cpu()

This defines on_one_cpu() which is similar to smp_call_function_single()
except that it works if cpu happens to be the current cpu. Can also be
seen as a complement to on_each_cpu() (which also doesn't treat the
current cpu specially).

Signed-off-by: Avi Kivity <[email protected]>
---
include/linux/smp.h | 15 +++++++++++++++
kernel/softirq.c | 24 ++++++++++++++++++++++++
2 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 3f70149..4ff8d68 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -60,6 +60,11 @@ int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
* Call a function on all processors
*/
int on_each_cpu(void (*func) (void *info), void *info, int retry, int wait);
+/*
+ * Call a function on one processor
+ */
+int on_one_cpu(int cpu, void (*func)(void *info), void *info,
+ int retry, int wait);

#define MSG_ALL_BUT_SELF 0x8000 /* Assume <32768 CPU's */
#define MSG_ALL 0x8001
@@ -95,6 +100,16 @@ static inline int up_smp_call_function(void)
local_irq_enable(); \
0; \
})
+
+static inline int on_one_cpu(int cpu, void (*func)(void *info), void *info,
+ int retry, int wait)
+{
+ local_irq_disable();
+ func(info);
+ local_irq_enable();
+ return 0;
+}
+
static inline void smp_send_reschedule(int cpu) { }
#define num_booting_cpus() 1
#define smp_prepare_boot_cpu() do {} while (0)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 0b9886a..b1a3284 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -658,4 +658,28 @@ int on_each_cpu(void (*func) (void *info), void *info, int retry, int wait)
return ret;
}
EXPORT_SYMBOL(on_each_cpu);
+
+/*
+ * Call a function on one processor, which might be the currently executing
+ * processor.
+ */
+int on_one_cpu(int cpu, void (*func) (void *info), void *info,
+ int retry, int wait)
+{
+ int ret;
+ int this_cpu;
+
+ this_cpu = get_cpu();
+ if (this_cpu == cpu) {
+ local_irq_disable();
+ func(info);
+ local_irq_enable();
+ ret = 0;
+ } else
+ ret = smp_call_function_single(cpu, func, info, retry, wait);
+ put_cpu();
+ return ret;
+}
+EXPORT_SYMBOL(on_one_cpu);
+
#endif
--
1.5.0.6
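A userspace model of the dispatch in on_one_cpu(): call the function directly when the target is the current cpu, otherwise hand it to the remote cross-call. The this_cpu_id variable and remote_call_model() are hypothetical. Having the remote stand-in reject the calling cpu with -EBUSY is an assumption made to show why the local branch is needed, and it simply calls func instead of sending an IPI.

```c
#include <assert.h>
#include <errno.h>

static int this_cpu_id;               /* hypothetical: the cpu we run on */
static int remote_calls, local_calls;

/* Model of a remote-only cross-call that refuses the calling cpu. */
static int remote_call_model(int cpu, void (*func)(void *), void *info)
{
    if (cpu == this_cpu_id)
        return -EBUSY;
    remote_calls++;
    func(info);          /* stands in for running func on the other cpu */
    return 0;
}

/* Same shape as on_one_cpu(): run locally when the target is us. */
static int on_one_cpu_model(int cpu, void (*func)(void *), void *info)
{
    if (cpu == this_cpu_id) {
        local_calls++;   /* the kernel version also disables local irqs */
        func(info);
        return 0;
    }
    return remote_call_model(cpu, func, info);
}

/* Example payload: increments the int that info points to. */
static void bump(void *info)
{
    ++*(int *)info;
}
```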

2007-05-24 12:53:40

by Avi Kivity

Subject: Re: [kvm-devel] [PATCH 0/7] KVM: Suspend and cpu hotplug fixes

Avi Kivity wrote:
> The following patchset makes kvm more robust wrt cpu hotunplug, and
> makes suspend-to-ram actually work. Suspend-to-disk benefits from
> the cpu hotunplug improvements as well.
>
>

Here's the patchset diffstat in case anyone's interested:

arch/i386/kernel/cpu/mcheck/therm_throt.c | 6 ++-
drivers/kvm/kvm_main.c | 51 +++++++++++++++++++++--------
include/linux/notifier.h | 3 ++
include/linux/smp.h | 15 ++++++++
kernel/cpu.c | 16 ++++++++-
kernel/cpuset.c | 3 ++
kernel/softirq.c | 24 +++++++++++++
7 files changed, 100 insertions(+), 18 deletions(-)


--
error compiling committee.c: too many arguments to function

2007-05-24 13:37:18

by Heiko Carstens

Subject: Re: [kvm-devel] [PATCH 4/7] SMP: Implement on_one_cpu()

On Thu, May 24, 2007 at 03:10:12PM +0300, Avi Kivity wrote:
> This defines on_one_cpu() which is similar to smp_call_function_single()
> except that it works if cpu happens to be the current cpu. Can also be
> seen as a complement to on_each_cpu() (which also doesn't treat the
> current cpu specially).
>
> Signed-off-by: Avi Kivity <[email protected]>
> ---
> include/linux/smp.h | 15 +++++++++++++++
> kernel/softirq.c | 24 ++++++++++++++++++++++++
> 2 files changed, 39 insertions(+), 0 deletions(-)
>
> +/*
> + * Call a function on one processor
> + */
> +int on_one_cpu(int cpu, void (*func)(void *info), void *info,
> + int retry, int wait);
>

Would you mind renaming that one to simply 'on_cpu'? It's even shorter and
clearly everybody will know what its purpose is. Also I doubt we will ever
have something like 'on_two_cpus'.

2007-05-24 13:42:26

by Avi Kivity

Subject: Re: [kvm-devel] [PATCH 4/7] SMP: Implement on_one_cpu()

Heiko Carstens wrote:
> On Thu, May 24, 2007 at 03:10:12PM +0300, Avi Kivity wrote:
>
>> This defines on_one_cpu() which is similar to smp_call_function_single()
>> except that it works if cpu happens to be the current cpu. Can also be
>> seen as a complement to on_each_cpu() (which also doesn't treat the
>> current cpu specially).
>>
>> Signed-off-by: Avi Kivity <[email protected]>
>> ---
>> include/linux/smp.h | 15 +++++++++++++++
>> kernel/softirq.c | 24 ++++++++++++++++++++++++
>> 2 files changed, 39 insertions(+), 0 deletions(-)
>>
>> +/*
>> + * Call a function on one processor
>> + */
>> +int on_one_cpu(int cpu, void (*func)(void *info), void *info,
>> + int retry, int wait);
>>
>>
>
> Would you mind renaming that one to simply 'on_cpu'? It's even shorter and
> clearly everybody will know what its purpose is. Also I doubt we will ever
> have something like 'on_two_cpus'.
>

That was my first choice, but then I went for symmetry with
on_each_cpu(). I'll rename it to on_cpu() unless there are objections.


--
error compiling committee.c: too many arguments to function

2007-05-24 13:43:48

by Roland Dreier

Subject: Re: [PATCH 4/7] SMP: Implement on_one_cpu()

I don't see any documented restrictions about preemption being
disabled when this function is called, but...

> +int on_one_cpu(int cpu, void (*func) (void *info), void *info,
> + int retry, int wait)
> +{
> + int ret;
> + int this_cpu;
> +
> + this_cpu = get_cpu();

what if a preempt and reschedule to a different CPU happens right
here, after this_cpu is set?

> + if (this_cpu == cpu) {

2007-05-24 13:48:17

by Avi Kivity

Subject: Re: [PATCH 4/7] SMP: Implement on_one_cpu()

Roland Dreier wrote:
> I don't see any documented restrictions about preemption being
> disabled when this function is called, but...
>
> > +int on_one_cpu(int cpu, void (*func) (void *info), void *info,
> > + int retry, int wait)
> > +{
> > + int ret;
> > + int this_cpu;
> > +
> > + this_cpu = get_cpu();
>
> what if a preempt and reschedule to a different CPU happens right
> here, after this_cpu is set?
>
> > + if (this_cpu == cpu) {
>

get_cpu() disables preemption (the return value would be meaningless
otherwise).


--
error compiling committee.c: too many arguments to function
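A toy, single-threaded model of why the value returned by get_cpu() stays valid: get_cpu() raises a preemption count, and in this model the "scheduler" refuses to migrate the task while that count is non-zero. Everything here (preempt_count, cpu_of_task, maybe_migrate()) is a hypothetical simplification, not the kernel implementation.

```c
#include <assert.h>

static int preempt_count;   /* > 0 means preemption is disabled */
static int cpu_of_task;     /* cpu the modeled task is running on */

/* The modeled scheduler may only move the task when preemptible. */
static void maybe_migrate(int new_cpu)
{
    if (preempt_count == 0)
        cpu_of_task = new_cpu;
}

/* Like get_cpu(): disable preemption, then read the current cpu. */
static int get_cpu_model(void)
{
    preempt_count++;
    return cpu_of_task;
}

/* Like put_cpu(): re-enable preemption. */
static void put_cpu_model(void)
{
    assert(preempt_count > 0);
    preempt_count--;
}
```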

2007-05-25 01:14:39

by Shaohua Li

Subject: Re: [PATCH 0/7] KVM: Suspend and cpu hotplug fixes

On Thu, 2007-05-24 at 20:10 +0800, Avi Kivity wrote:
> The following patchset makes kvm more robust wrt cpu hotunplug, and
> makes suspend-to-ram actually work. Suspend-to-disk benefits from
> the cpu hotunplug improvements as well.
>
> The major issue is that KVM wants to disable the virtualization
> extensions at a point in time when no user processes are schedulable
> on the victim cpu. No current notifier exists, so a new one, CPU_DYING,
> is added for the purpose.
>
> Should there be no objections, I will submit this patchset for inclusion
> in 2.6.22, and backport it to 2.6.21.stable.
Is it possible to disable kvm at the beginning of play_dead?
take_cpu_down is designed to run fast.

Thanks,
Shaohua

2007-05-25 08:28:29

by Avi Kivity

Subject: Re: [PATCH 0/7] KVM: Suspend and cpu hotplug fixes

Shaohua Li wrote:
> On Thu, 2007-05-24 at 20:10 +0800, Avi Kivity wrote:
>
>> The following patchset makes kvm more robust wrt cpu hotunplug, and
>> makes suspend-to-ram actually work. Suspend-to-disk benefits from
>> the cpu hotunplug improvements as well.
>>
>> The major issue is that KVM wants to disable the virtualization
>> extensions at a point in time when no user processes are schedulable
>> on the victim cpu. No current notifier exists, so a new one, CPU_DYING,
>> is added for the purpose.
>>
>> Should there be no objections, I will submit this patchset for inclusion
>> in 2.6.22, and backport it to 2.6.21.stable.
>>
> Is it possible to disable kvm at the beginning of play_dead?
> take_cpu_down is designed to run fast.
>
>

It's possible, but I have issues with play_dead():

- it is arch specific, so we need to modify i386, x86_64, and ia64 (when
we have an ia64 kvm port)
- there is no hook available here to call modules like the hotplug notifier

I estimate that take_cpu_down will run for about a millisecond if
there are a few hundred vcpus which have last run on the dying cpu (and
that's an extreme case, which is not expected in normal operation). If
that's too much, it can be reduced as follows:

- add a per-cpu list of vcpus that have last run on a cpu, maintained at
runtime
- on CPU_DOWN_PREPARE, walk the list and vmclear any vcpus that last ran
on the dying cpu
- on CPU_DYING (take_cpu_down), walk the list again and vmclear any vcpus
that managed to get scheduled onto the dying cpu again. The list should
not have more than 1-2 entries in normal operation.


--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
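A rough userspace model of the two-pass scheme proposed above, assuming a singly linked per-cpu list of vcpus that last ran on each cpu. The structure, list helpers, and NR_CPUS_MODEL are hypothetical, and "vmclear" is modeled as clearing a flag. The first flush stands for the CPU_DOWN_PREPARE pass; the second, which should find at most a straggler or two, stands for the CPU_DYING pass.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4   /* hypothetical cpu count */

struct vcpu_model {
    int last_cpu;                /* -1 when not on any list */
    int loaded;                  /* 1 if a vmclear would still be needed */
    struct vcpu_model *next;     /* per-cpu list link */
};

static struct vcpu_model *per_cpu_list[NR_CPUS_MODEL];
static int vmclears;

/* Unlink v from the list of the cpu it last ran on, if any. */
static void list_del_model(struct vcpu_model *v)
{
    if (v->last_cpu < 0)
        return;
    struct vcpu_model **pp = &per_cpu_list[v->last_cpu];
    while (*pp && *pp != v)
        pp = &(*pp)->next;
    if (*pp)
        *pp = v->next;
    v->next = NULL;
}

/* Called when v is loaded on cpu: move it to that cpu's list. */
static void vcpu_load_model(struct vcpu_model *v, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    list_del_model(v);
    v->last_cpu = cpu;
    v->loaded = 1;
    v->next = per_cpu_list[cpu];
    per_cpu_list[cpu] = v;
}

/* vmclear every vcpu on cpu's list and empty it; returns the count. */
static int flush_cpu_model(int cpu)
{
    int n = 0;
    struct vcpu_model *v = per_cpu_list[cpu];

    while (v) {
        struct vcpu_model *next = v->next;
        v->loaded = 0;           /* stands in for vmclear */
        v->last_cpu = -1;
        v->next = NULL;
        vmclears++;
        n++;
        v = next;
    }
    per_cpu_list[cpu] = NULL;
    return n;
}
```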

2007-05-27 10:21:01

by Avi Kivity

Subject: Re: [PATCH 0/7] KVM: Suspend and cpu hotplug fixes

Avi Kivity wrote:
> I estimate that take_cpu_down will run for about a millisecond if
> there are a few hundred vcpus which have last run on the dying cpu (and
> that's an extreme case, which is not expected in normal operation).

I measured vmclear time on an uncached vmcs (which would be all except
for a handful which are cached on the cpu core) at 144 cycles. Assuming
a couple of cache misses for walking the list and accessing the vmcs,
we're at about 500 cycles per vcpu, or 0.25us @ 2GHz, so even 1000 vcpus
would take only about 250us. So the worst case is significantly less
than 1 ms.

Is this acceptable for take_cpu_down()?

--
error compiling committee.c: too many arguments to function
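The estimate above as a small calculation, assuming (as the message does) about 500 cycles per vcpu and a 2GHz clock; the function name is illustrative. At 2GHz a microsecond is 2000 cycles, so one vcpu costs 0.25us and 1000 vcpus about 250us.

```c
#include <assert.h>

/* Time in microseconds to vmclear nvcpus vcpus at cycles_per_vcpu
 * cycles each, on a cpu running at ghz GHz (ghz * 1000 cycles per us). */
static double vmclear_cost_us(unsigned nvcpus, double cycles_per_vcpu,
                              double ghz)
{
    assert(ghz > 0.0);
    return nvcpus * cycles_per_vcpu / (ghz * 1000.0);
}
```

For example, vmclear_cost_us(500, 500.0, 2.0) gives 125us for "a few hundred" vcpus, well under the 1 ms first estimated.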