Hi Avi, Marcelo,
here is the patch-set to implement the TSC-scaling feature of upcoming
AMD CPUs. When this feature is supported, the CPU provides a new MSR
which holds a multiplier for the hardware TSC; the multiplier is applied
to the value returned by rdtsc[p] and to reads of MSR 0x10. This feature
can be used to emulate a given tsc frequency for the guest.
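As a rough illustration (not part of the patches themselves), the
semantics boil down to the sketch below, assuming the 8.32 fixed-point
ratio format used by the SVM patches in this series (integer part in
bits 39:32, fraction in bits 31:0):

#include <stdint.h>

/*
 * Illustration only: the effective guest TSC under TSC scaling.
 * Uses the GCC __int128 extension for clarity; the in-kernel code
 * below avoids 128-bit arithmetic.
 */
static uint64_t scaled_tsc(uint64_t raw_tsc, uint64_t ratio)
{
	return (uint64_t)(((unsigned __int128)raw_tsc * ratio) >> 32);
}

A ratio of 0x180000000 (1.5 in this format), for example, makes the
guest TSC advance at 1.5 times the host rate.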
Patch 1 is not directly related to this patch-set; it only fixes
a bug which prevented me from testing these patches. In fact it fixes
the same bug Andre sent a patch for, but after the discussion about his
patch he told me to just post mine, so here it is.
Thanks,
Joerg
Diff-stat:
arch/x86/include/asm/kvm_host.h | 4 ++
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kvm/svm.c | 91 +++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx.c | 12 +++++
arch/x86/kvm/x86.c | 60 ++++++++++++++++++++++---
include/linux/kvm.h | 4 ++
6 files changed, 164 insertions(+), 8 deletions(-)
Shortlog:
Joerg Roedel (6):
KVM: SVM: Advance instruction pointer in dr_intercept
KVM: SVM: Implement infrastructure for TSC_RATE_MSR
KVM: X86: Let kvm-clock report the right tsc frequency
KVM: SVM: Propagate requested TSC frequency on vcpu init
KVM: X86: Delegate tsc-offset calculation to architecture code
KVM: X86: Implement userspace interface to set virtual_tsc_khz
In the dr_interception function, support for a new CPU feature
called decode-assists is implemented and used when available.
This code path does not advance the guest RIP, causing the guest
to loop endlessly over mov-dr instructions. This patch fixes that.
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/kvm/svm.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 73a8f1d..bfb4948 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2777,6 +2777,8 @@ static int dr_interception(struct vcpu_svm *svm)
kvm_register_write(&svm->vcpu, reg, val);
}
+ skip_emulated_instruction(&svm->vcpu);
+
return 1;
}
--
1.7.1
This patch implements two new vm-ioctls to get and set the
virtual_tsc_khz if the machine supports tsc-scaling. Setting
the tsc-frequency is only possible before userspace creates
any vcpu.
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 3 +++
arch/x86/kvm/x86.c | 38 ++++++++++++++++++++++++++++++++++++++
include/linux/kvm.h | 4 ++++
4 files changed, 46 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8c40425..dfdd0aa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -633,6 +633,7 @@ int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
extern bool tdp_enabled;
+extern int kvm_has_tsc_control;
enum emulation_result {
EMULATE_DONE, /* no further processing */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f938585..0b8f4f7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -797,6 +797,9 @@ static __init int svm_hardware_setup(void)
if (boot_cpu_has(X86_FEATURE_FXSR_OPT))
kvm_enable_efer_bits(EFER_FFXSR);
+ if (boot_cpu_has(X86_FEATURE_TSCRATEMSR))
+ kvm_has_tsc_control = 1;
+
if (nested) {
printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6caaf4b..1ac94cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -99,6 +99,9 @@ EXPORT_SYMBOL_GPL(kvm_x86_ops);
int ignore_msrs = 0;
module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO | S_IWUSR);
+int kvm_has_tsc_control;
+EXPORT_SYMBOL_GPL(kvm_has_tsc_control);
+
#define KVM_NR_SHARED_MSRS 16
struct kvm_shared_msrs_global {
@@ -2024,6 +2027,9 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_XCRS:
r = cpu_has_xsave;
break;
+ case KVM_CAP_TSC_CONTROL:
+ r = kvm_has_tsc_control;
+ break;
default:
r = 0;
break;
@@ -3575,6 +3581,38 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
break;
}
+ case KVM_SET_TSC_KHZ: {
+ u32 user_tsc_khz;
+
+ if (!kvm_has_tsc_control)
+ break;
+
+ /*
+ * We force the tsc frequency to be set before any
+ * vcpu is created
+ */
+ if (atomic_read(&kvm->online_vcpus) > 0)
+ goto out;
+
+ user_tsc_khz = arg;
+
+ kvm_arch_set_tsc_khz(kvm, user_tsc_khz);
+
+ r = 0;
+ goto out;
+ }
+ case KVM_GET_TSC_KHZ: {
+
+ if (!kvm_has_tsc_control)
+ break;
+
+ r = -EFAULT;
+ if (copy_to_user(argp, &kvm->arch.virtual_tsc_khz, sizeof(__u32)))
+ goto out;
+
+ r = 0;
+ goto out;
+ }
default:
;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..a96ff92 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -541,6 +541,7 @@ struct kvm_ppc_pvinfo {
#define KVM_CAP_PPC_GET_PVINFO 57
#define KVM_CAP_PPC_IRQ_LEVEL 58
#define KVM_CAP_ASYNC_PF 59
+#define KVM_CAP_TSC_CONTROL 60
#ifdef KVM_CAP_IRQ_ROUTING
@@ -677,6 +678,9 @@ struct kvm_clock_data {
#define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2)
/* Available with KVM_CAP_PPC_GET_PVINFO */
#define KVM_PPC_GET_PVINFO _IOW(KVMIO, 0xa1, struct kvm_ppc_pvinfo)
+/* Available with KVM_CAP_TSC_CONTROL */
+#define KVM_SET_TSC_KHZ _IOW(KVMIO, 0xa2, __u32)
+#define KVM_GET_TSC_KHZ _IOR(KVMIO, 0xa3, __u32)
/*
* ioctls for vcpu fds
--
1.7.1
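For reference, a hypothetical userspace caller of the two ioctls could
look like the sketch below; vm_fd is assumed to be an open VM file
descriptor and the helper names are made up. Note that in this version
KVM_SET_TSC_KHZ passes the frequency directly in the ioctl argument,
while KVM_GET_TSC_KHZ copies it out through a pointer (see Avi's
review comment further down about the mixed conventions):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Must be called before the first vcpu is created. */
static int set_guest_tsc_khz(int vm_fd, __u32 khz)
{
	return ioctl(vm_fd, KVM_SET_TSC_KHZ, khz);
}

static int get_guest_tsc_khz(int vm_fd, __u32 *khz)
{
	return ioctl(vm_fd, KVM_GET_TSC_KHZ, khz);
}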
This patch changes the kvm_guest_time_update function to use the
TSC frequency the guest actually has for updating its clock.
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/svm.c | 8 ++++++++
arch/x86/kvm/vmx.c | 6 ++++++
arch/x86/kvm/x86.c | 12 ++++++++++--
4 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ffd7f8d..9686950 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -592,6 +592,8 @@ struct kvm_x86_ops {
void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
+ bool (*use_virtual_tsc_khz)(struct kvm_vcpu *vcpu);
+
void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2);
const struct trace_print_flags *exit_reasons_str;
};
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c96c0a6..f51f757 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -905,6 +905,13 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment)
mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
}
+static bool svm_use_virtual_tsc_khz(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ return svm->tsc_scale.enabled;
+}
+
static void init_vmcb(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3976,6 +3983,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.write_tsc_offset = svm_write_tsc_offset,
.adjust_tsc_offset = svm_adjust_tsc_offset,
+ .use_virtual_tsc_khz = svm_use_virtual_tsc_khz,
.set_tdp_cr3 = set_tdp_cr3,
};
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae4f02d..c227a6b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1164,6 +1164,11 @@ static void vmx_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment)
vmcs_write64(TSC_OFFSET, offset + adjustment);
}
+static bool vmx_use_virtual_tsc_khz(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+
/*
* Reads an msr value (of 'msr_index') into 'pdata'.
* Returns 0 on success, non-0 otherwise.
@@ -4443,6 +4448,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.write_tsc_offset = vmx_write_tsc_offset,
.adjust_tsc_offset = vmx_adjust_tsc_offset,
+ .use_virtual_tsc_khz = vmx_use_virtual_tsc_khz,
.set_tdp_cr3 = vmx_set_cr3,
};
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8575d85..597abc8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -979,6 +979,14 @@ static inline int kvm_tsc_changes_freq(void)
return ret;
}
+static u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu)
+{
+ if (kvm_x86_ops->use_virtual_tsc_khz(vcpu))
+ return vcpu->kvm->arch.virtual_tsc_khz;
+ else
+ return __this_cpu_read(cpu_tsc_khz);
+}
+
static inline u64 nsec_to_cycles(u64 nsec)
{
u64 ret;
@@ -1010,6 +1018,7 @@ static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
return tsc;
}
+
void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
{
struct kvm *kvm = vcpu->kvm;
@@ -1072,8 +1081,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
local_irq_save(flags);
kvm_get_msr(v, MSR_IA32_TSC, &tsc_timestamp);
kernel_ns = get_kernel_ns();
- this_tsc_khz = __this_cpu_read(cpu_tsc_khz);
-
+ this_tsc_khz = vcpu_tsc_khz(v);
if (unlikely(this_tsc_khz == 0)) {
local_irq_restore(flags);
kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
--
1.7.1
With TSC scaling in SVM the tsc-offset needs to be
calculated differently. This patch propagates this
calculation into the architecture specific modules so that
this complexity can be handled there.
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 11 ++++++++++-
arch/x86/kvm/vmx.c | 6 ++++++
arch/x86/kvm/x86.c | 10 +++++-----
4 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9686950..8c40425 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -593,6 +593,7 @@ struct kvm_x86_ops {
void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
bool (*use_virtual_tsc_khz)(struct kvm_vcpu *vcpu);
+ u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc);
void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2);
const struct trace_print_flags *exit_reasons_str;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 29833a7..f938585 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -881,7 +881,6 @@ static u64 svm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
static bool svm_vcpu_init_tsc(struct kvm *kvm, struct vcpu_svm *svm)
{
- u64 raw_tsc, tsc, new_tsc;
u64 ratio;
u64 khz;
@@ -941,6 +940,15 @@ static bool svm_use_virtual_tsc_khz(struct kvm_vcpu *vcpu)
return svm->tsc_scale.enabled;
}
+static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+{
+ u64 tsc;
+
+ tsc = svm_scale_tsc(vcpu, native_read_tsc());
+
+ return target_tsc - tsc;
+}
+
static void init_vmcb(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -4016,6 +4024,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.write_tsc_offset = svm_write_tsc_offset,
.adjust_tsc_offset = svm_adjust_tsc_offset,
.use_virtual_tsc_khz = svm_use_virtual_tsc_khz,
+ .compute_tsc_offset = svm_compute_tsc_offset,
.set_tdp_cr3 = set_tdp_cr3,
};
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c227a6b..9bbdf1f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1169,6 +1169,11 @@ static bool vmx_use_virtual_tsc_khz(struct kvm_vcpu *vcpu)
return false;
}
+static u64 vmx_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+{
+ return target_tsc - native_read_tsc();
+}
+
/*
* Reads an msr value (of 'msr_index') into 'pdata'.
* Returns 0 on success, non-0 otherwise.
@@ -4449,6 +4454,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.write_tsc_offset = vmx_write_tsc_offset,
.adjust_tsc_offset = vmx_adjust_tsc_offset,
.use_virtual_tsc_khz = vmx_use_virtual_tsc_khz,
+ .compute_tsc_offset = vmx_compute_tsc_offset,
.set_tdp_cr3 = vmx_set_cr3,
};
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 597abc8..6caaf4b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -987,7 +987,7 @@ static u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu)
return __this_cpu_read(cpu_tsc_khz);
}
-static inline u64 nsec_to_cycles(u64 nsec)
+static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
{
u64 ret;
@@ -995,7 +995,7 @@ static inline u64 nsec_to_cycles(u64 nsec)
if (kvm_tsc_changes_freq())
printk_once(KERN_WARNING
"kvm: unreliable cycle conversion on adjustable rate TSC\n");
- ret = nsec * __this_cpu_read(cpu_tsc_khz);
+ ret = nsec * vcpu_tsc_khz(vcpu);
do_div(ret, USEC_PER_SEC);
return ret;
}
@@ -1027,7 +1027,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
s64 sdiff;
spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
- offset = data - native_read_tsc();
+ offset = kvm_x86_ops->compute_tsc_offset(vcpu, data);
ns = get_kernel_ns();
elapsed = ns - kvm->arch.last_tsc_nsec;
sdiff = data - kvm->arch.last_tsc_write;
@@ -1043,13 +1043,13 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
* In that case, for a reliable TSC, we can match TSC offsets,
* or make a best guess using elapsed value.
*/
- if (sdiff < nsec_to_cycles(5ULL * NSEC_PER_SEC) &&
+ if (sdiff < nsec_to_cycles(vcpu, 5ULL * NSEC_PER_SEC) &&
elapsed < 5ULL * NSEC_PER_SEC) {
if (!check_tsc_unstable()) {
offset = kvm->arch.last_tsc_offset;
pr_debug("kvm: matched tsc offset for %llu\n", data);
} else {
- u64 delta = nsec_to_cycles(elapsed);
+ u64 delta = nsec_to_cycles(vcpu, elapsed);
offset += delta;
pr_debug("kvm: adjusted tsc offset by %llu\n", delta);
}
--
1.7.1
This patch propagates the VM's virtual_tsc_khz into each
vcpu data structure to enable the tsc-scaling feature.
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/kvm/svm.c | 32 ++++++++++++++++++++++++++++++++
1 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f51f757..29833a7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -879,6 +879,35 @@ static u64 svm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
return _tsc;
}
+static bool svm_vcpu_init_tsc(struct kvm *kvm, struct vcpu_svm *svm)
+{
+ u64 raw_tsc, tsc, new_tsc;
+ u64 ratio;
+ u64 khz;
+
+ /* TSC scaling supported? */
+ if (!boot_cpu_has(X86_FEATURE_TSCRATEMSR))
+ goto out;
+
+ /* Guest tsc same frequency as host tsc? */
+ if (kvm->arch.virtual_tsc_khz == tsc_khz)
+ goto out;
+
+ khz = kvm->arch.virtual_tsc_khz;
+
+ /* TSC scaling required - calculate ratio */
+ ratio = khz << 32;
+ do_div(ratio, tsc_khz);
+ if (ratio == 0 || ratio & TSC_RATIO_RSVD)
+ return false;
+
+ svm->tsc_scale.ratio = ratio;
+ svm->tsc_scale.enabled = true;
+
+out:
+ return true;
+}
+
static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -1084,6 +1113,9 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
if (err)
goto free_svm;
+ if (!svm_vcpu_init_tsc(kvm, svm))
+ goto uninit;
+
err = -ENOMEM;
page = alloc_page(GFP_KERNEL);
if (!page)
--
1.7.1
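To make the ratio calculation above concrete, here is a small
standalone check (illustration only) of what svm_vcpu_init_tsc()
computes:

#include <stdint.h>
#include <assert.h>

/* Mirrors the ratio calculation in svm_vcpu_init_tsc(). */
static uint64_t tsc_ratio(uint64_t guest_khz, uint64_t host_khz)
{
	return (guest_khz << 32) / host_khz;	/* 8.32 fixed point */
}

int main(void)
{
	/* 1 GHz guest on a 2 GHz host: ratio 0.5 == 0x80000000 */
	assert(tsc_ratio(1000000, 2000000) == 0x80000000ULL);
	/* 3 GHz guest on a 2 GHz host: ratio 1.5 == 0x180000000 */
	assert(tsc_ratio(3000000, 2000000) == 0x180000000ULL);
	return 0;
}

A ratio of 0, or one with bits above the 8-bit integer part set
(TSC_RATIO_RSVD), is rejected and vcpu creation fails.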
This patch enhances the kvm_amd module with functions to
support the TSC_RATE_MSR, which can be used to set a given
tsc frequency for the guest vcpu.
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kvm/svm.c | 37 ++++++++++++++++++++++++++++++++++++-
2 files changed, 37 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 5bfafb6..fdac548 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -106,6 +106,7 @@
complete list. */
#define MSR_AMD64_PATCH_LEVEL 0x0000008b
+#define MSR_AMD64_TSC_RATIO 0xc0000104
#define MSR_AMD64_NB_CFG 0xc001001f
#define MSR_AMD64_PATCH_LOADER 0xc0010020
#define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bfb4948..c96c0a6 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -63,6 +63,8 @@ MODULE_LICENSE("GPL");
#define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
+#define TSC_RATIO_RSVD 0xffffff0000000000ULL
+
static bool erratum_383_found __read_mostly;
static const u32 host_save_user_msrs[] = {
@@ -142,6 +144,12 @@ struct vcpu_svm {
unsigned int3_injected;
unsigned long int3_rip;
u32 apf_reason;
+
+ struct {
+ bool enabled;
+ u64 ratio;
+ } tsc_scale;
+
};
#define MSR_INVALID 0xffffffffU
@@ -852,6 +860,25 @@ static void init_sys_seg(struct vmcb_seg *seg, uint32_t type)
seg->base = 0;
}
+static u64 svm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ u64 _tsc = tsc;
+
+ if (svm->tsc_scale.enabled) {
+ u64 mult, frac;
+
+ mult = svm->tsc_scale.ratio >> 32;
+ frac = svm->tsc_scale.ratio & ((1ULL << 32) - 1);
+
+ _tsc *= mult;
+ _tsc += (tsc >> 32) * frac;
+ _tsc += ((tsc & ((1ULL << 32) - 1)) * frac) >> 32;
+ }
+
+ return _tsc;
+}
+
static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -2808,7 +2835,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
case MSR_IA32_TSC: {
struct vmcb *vmcb = get_host_vmcb(svm);
- *data = vmcb->control.tsc_offset + native_read_tsc();
+ *data = vmcb->control.tsc_offset +
+ svm_scale_tsc(vcpu, native_read_tsc());
+
break;
}
case MSR_STAR:
@@ -3564,6 +3593,9 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
clgi();
+ if (static_cpu_has(X86_FEATURE_TSCRATEMSR) && svm->tsc_scale.enabled)
+ wrmsrl(MSR_AMD64_TSC_RATIO, svm->tsc_scale.ratio);
+
local_irq_enable();
asm volatile (
@@ -3647,6 +3679,9 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
local_irq_disable();
+ if (static_cpu_has(X86_FEATURE_TSCRATEMSR) && svm->tsc_scale.enabled)
+ wrmsr(MSR_AMD64_TSC_RATIO, 0, 1);
+
vcpu->arch.cr2 = svm->vmcb->save.cr2;
vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
--
1.7.1
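The three-term decomposition in svm_scale_tsc() above is just
(tsc * ratio) >> 32 computed without 128-bit arithmetic; a small
standalone check (illustration only):

#include <stdint.h>
#include <assert.h>

static uint64_t scale_like_svm(uint64_t tsc, uint64_t ratio)
{
	uint64_t mult = ratio >> 32;
	uint64_t frac = ratio & ((1ULL << 32) - 1);

	/* tsc * ratio / 2^32, split into 64-bit partial products */
	return tsc * mult
	       + (tsc >> 32) * frac
	       + (((tsc & ((1ULL << 32) - 1)) * frac) >> 32);
}

int main(void)
{
	uint64_t tsc   = 0x0123456789abcdefULL;
	uint64_t ratio = 0x180000000ULL;	/* 1.5 in 8.32 fixed point */

	assert(scale_like_svm(tsc, ratio) ==
	       (uint64_t)(((unsigned __int128)tsc * ratio) >> 32));
	return 0;
}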
On 02/09/2011 12:29 PM, Joerg Roedel wrote:
> With TSC scaling in SVM the tsc-offset needs to be
> calculated differently. This patch propagates this
> calculation into the architecture specific modules so that
> this complexity can be handled there.
>
> Signed-off-by: Joerg Roedel<[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/svm.c | 11 ++++++++++-
> arch/x86/kvm/vmx.c | 6 ++++++
> arch/x86/kvm/x86.c | 10 +++++-----
> 4 files changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9686950..8c40425 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -593,6 +593,7 @@ struct kvm_x86_ops {
> void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
>
> bool (*use_virtual_tsc_khz)(struct kvm_vcpu *vcpu);
> + u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc);
>
So I've gone over this series and the only issue I see so far is with
this patch, and it doesn't have to do with upstream code, rather with
code I was about to send.
Logically, the compensation done by adjust_tsc_offset should also be
scaled; currently, this happens only for two reasons, both of which are
meant to deal with unstable TSCs. Since TSC scaling won't happen on
those processors with unstable TSCs, we don't need to worry about it there.
I have an upcoming patch which complicates things a bit; it deals
with host suspend. In that case, the host TSC goes backwards and the
offsets need to be recomputed. However, there is no convenient time to
set the offset again; on VMX, the hardware will not yet be set up, so
we can't easily write the offset field in the VMCS. We also can't put a
synchronization barrier on all the VCPUs to write the offset before they
start running without getting into a difficult situation with locking.
So instead, the approach I took was to re-use the adjust_tsc_offset
function and accumulate offsets to apply.
For SVM with TSC scaling, the offset to apply as an adjustment in this
case needs to be scaled. Setting the new guest TSC (gtsc') equal to
the old guest TSC (gtsc), we have:
gtsc  = htsc  * mult + offset
gtsc' = htsc' * mult + offset'
gtsc' = gtsc
offset' = htsc * mult + offset - htsc' * mult
offset' = (htsc - htsc') * mult + offset
so the delta offset needs to be (htsc - htsc') * mult.
We will instead be passing (htsc - htsc') as the adjustment value; the
solution seems simple: we have to scale it up as well in the
adjust_tsc_offset function.
However, the problem is that we need a new architecture-specific
function or an API change, because not all call sites of
adjust_tsc_offset want their adjustments scaled - the call site dealing
with tsc_catchup is already working in guest cycles, so it should not be
scaled again.
We could have separate functions to adjust TSC cycles by either guest or
host cycle amounts, or pass a flag to adjust_tsc_offset indicating
whether the adjustment is to be applied in guest cycles or host cycles.
The resulting API will be slightly asymmetric, as compute_tsc_offset
lets the generic code compute in terms of hardware offsets, but in the
adjustment case, there isn't an easy way to expose the ability to
compute in hardware offset terms.
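A rough sketch of the flag variant for the SVM side (all names here
are hypothetical, not from any posted patch): host-cycle adjustments
get scaled through svm_scale_tsc() while guest-cycle adjustments are
applied as-is.

enum tsc_adjust_unit {
	TSC_ADJUST_GUEST_CYCLES,	/* e.g. the tsc_catchup call site */
	TSC_ADJUST_HOST_CYCLES,		/* e.g. resume after host suspend */
};

static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment,
				  enum tsc_adjust_unit unit)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/*
	 * A host-cycle delta must be converted to guest cycles before
	 * it is added to the (post-scaling) tsc_offset.  This sketch
	 * assumes a non-negative delta, as in the suspend case above
	 * where the host TSC went backwards.
	 */
	if (unit == TSC_ADJUST_HOST_CYCLES && svm->tsc_scale.enabled)
		adjustment = svm_scale_tsc(vcpu, adjustment);

	svm->vmcb->control.tsc_offset += adjustment;
	mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
}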
One slight pity is that we won't be able to reuse
svm_compute_tsc_offset, as the applied delta won't be based off a read
of the tsc. I can't really find a better API though; in case offsets
are computed differently on different hardware (such as multiplying
after the offset), we need a function to convert guest cycles back
to hardware cycles.
As usual, with the TSC code, it is going to require a lot of commenting
to explain this.
Your code in general looks good.
Cheers,
Zach
On 02/09/2011 07:29 PM, Joerg Roedel wrote:
> This patch implements two new vm-ioctls to get and set the
> virtual_tsc_khz if the machine supports tsc-scaling. Setting
> the tsc-frequency is only possible before userspace creates
> any vcpu.
>
> Signed-off-by: Joerg Roedel<[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/svm.c | 3 +++
> arch/x86/kvm/x86.c | 38 ++++++++++++++++++++++++++++++++++++++
> include/linux/kvm.h | 4 ++++
> 4 files changed, 46 insertions(+), 0 deletions(-)
>
Documentation/kvm/api.txt +++++++++++++
> @@ -633,6 +633,7 @@ int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
> u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
>
> extern bool tdp_enabled;
> +extern int kvm_has_tsc_control;
bool
> + case KVM_SET_TSC_KHZ: {
> + u32 user_tsc_khz;
> +
> + if (!kvm_has_tsc_control)
> + break;
> +
> + /*
> + * We force the tsc frequency to be set before any
> + * vcpu is created
> + */
> + if (atomic_read(&kvm->online_vcpus)> 0)
> + goto out;
What if a vcpu is created here? No locking AFAICS.
> +
> + user_tsc_khz = arg;
> +
> + kvm_arch_set_tsc_khz(kvm, user_tsc_khz);
> +
Error check for impossible values (0, values the tsc multiplier can't
reach)?
> + r = 0;
> + goto out;
> + }
> + case KVM_GET_TSC_KHZ: {
> +
> + if (!kvm_has_tsc_control)
> + break;
> +
> + r = -EFAULT;
> + if (copy_to_user(argp,&kvm->arch.virtual_tsc_khz, sizeof(__u32)))
> + goto out;
Should be the return value, no?
> +
> + r = 0;
> + goto out;
> + }
>
> default:
> ;
>
>
> @@ -677,6 +678,9 @@ struct kvm_clock_data {
> #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2)
> /* Available with KVM_CAP_PPC_GET_PVINFO */
> #define KVM_PPC_GET_PVINFO _IOW(KVMIO, 0xa1, struct kvm_ppc_pvinfo)
> +/* Available with KVM_CAP_TSC_CONTROL */
> +#define KVM_SET_TSC_KHZ _IOW(KVMIO, 0xa2, __u32)
> +#define KVM_GET_TSC_KHZ _IOR(KVMIO, 0xa3, __u32)
>
_IO() - use arg or return value
_IOW/_IOR - copy_to/from_user()
pick one, but don't mix.
--
error compiling committee.c: too many arguments to function
On 02/09/2011 07:29 PM, Joerg Roedel wrote:
> Hi Avi, Marcelo,
>
> here is the patch-set to implement the TSC-scaling feature of upcoming
> AMD CPUs. When this feature is supported, the CPU provides a new MSR
> which holds a multiplier for the hardware TSC; the multiplier is applied
> to the value returned by rdtsc[p] and to reads of MSR 0x10. This feature
> can be used to emulate a given tsc frequency for the guest.
> Patch 1 is not directly related to this patch-set; it only fixes
> a bug which prevented me from testing these patches. In fact it fixes
> the same bug Andre sent a patch for, but after the discussion about his
> patch he told me to just post mine, so here it is.
>
Questions:
- the tsc multiplier really is a multiplier, right? Not an addend that
is added every cycle.
So
wrmsr(TSC, 1e9)
wrmsr(TSC_MULT, 2.0000)
t = rdtsc()
will return about 2e9, not 1e9 + 2*(time to execute the code snippet) ?
- what's the cost of wrmsr(TSC_MULT)?
There are really two ways to implement this feature. One is fully
generic, like you did. The other is to implement it at the host level -
have a sysfs file and/or kernel parameter for the desired tsc frequency,
write it once, and forget about it. Trust management to set the host
tsc frequency to the same value on all hosts in a migration cluster.
The advantages of the simpler implementation are, well, that it's
simpler, and second, that it avoids two wrmsrs per exit. We could
combine both implementations, and have
if (guest_mult != host_mult)
wrmsr(TSC_MULT, guest_mult)
etc. But I'd like to understand if there are additional motivations for
per-guest tsc frequency.
--
error compiling committee.c: too many arguments to function
(Sorry for the delay, I had to spend some days sick at home :-( )
On Fri, Feb 11, 2011 at 05:12:29PM -0500, Zachary Amsden wrote:
> On 02/09/2011 12:29 PM, Joerg Roedel wrote:
> So I've gone over this series and the only issue I see so far is with
> this patch, and it doesn't have to do with upstream code, rather with
> code I was about to send.
>
> Logically, the compensation done by adjust_tsc_offset should also be
> scaled; currently, this happens only for two reasons, both of which are
> meant to deal with unstable TSCs. Since TSC scaling won't happen on
> those processors with unstable TSCs, we don't need to worry about it there.
The tsc_offset is applied after the TSC is scaled so there is no good
way to scale the offset with the TSC value itself.
What we can do is to use guest-tsc values only when we calculate an
adjustment. So any tsc-offset adjustment made with adjust_tsc_offset()
needs to be a function of guest-tsc values. One call site of the
function already does this and the other one can be converted easily.
I'll do that in the next version of this patch-set.
From what I understand of your upcoming patch, the accumulation of
tsc-offsets could also be calculated from guest-tsc values instead of
native_read_tsc() values, no?
Regards,
Joerg
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
On Sun, Feb 13, 2011 at 10:12:14AM -0500, Avi Kivity wrote:
> On 02/09/2011 07:29 PM, Joerg Roedel wrote:
> > This patch implements two new vm-ioctls to get and set the
> > virtual_tsc_khz if the machine supports tsc-scaling. Setting
> > the tsc-frequency is only possible before userspace creates
> > any vcpu.
> >
> > Signed-off-by: Joerg Roedel<[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 1 +
> > arch/x86/kvm/svm.c | 3 +++
> > arch/x86/kvm/x86.c | 38 ++++++++++++++++++++++++++++++++++++++
> > include/linux/kvm.h | 4 ++++
> > 4 files changed, 46 insertions(+), 0 deletions(-)
> >
>
> Documentation/kvm/api.txt +++++++++++++
>
> > @@ -633,6 +633,7 @@ int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
> > u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
> >
> > extern bool tdp_enabled;
> > +extern int kvm_has_tsc_control;
>
> bool
>
>
> > + case KVM_SET_TSC_KHZ: {
> > + u32 user_tsc_khz;
> > +
> > + if (!kvm_has_tsc_control)
> > + break;
> > +
> > + /*
> > + * We force the tsc frequency to be set before any
> > + * vcpu is created
> > + */
> > + if (atomic_read(&kvm->online_vcpus)> 0)
> > + goto out;
>
> What if a vcpu is created here? No locking AFAICS.
>
> > +
> > + user_tsc_khz = arg;
> > +
> > + kvm_arch_set_tsc_khz(kvm, user_tsc_khz);
> > +
>
> Error check for impossible values (0, values the tsc multiplier can't
> reach)?
>
> > + r = 0;
> > + goto out;
> > + }
> > + case KVM_GET_TSC_KHZ: {
> > +
> > + if (!kvm_has_tsc_control)
> > + break;
> > +
> > + r = -EFAULT;
> > + if (copy_to_user(argp,&kvm->arch.virtual_tsc_khz, sizeof(__u32)))
> > + goto out;
>
> Should be the return value, no?
>
> > +
> > + r = 0;
> > + goto out;
> > + }
> >
> > default:
> > ;
> >
> >
> > @@ -677,6 +678,9 @@ struct kvm_clock_data {
> > #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2)
> > /* Available with KVM_CAP_PPC_GET_PVINFO */
> > #define KVM_PPC_GET_PVINFO _IOW(KVMIO, 0xa1, struct kvm_ppc_pvinfo)
> > +/* Available with KVM_CAP_TSC_CONTROL */
> > +#define KVM_SET_TSC_KHZ _IOW(KVMIO, 0xa2, __u32)
> > +#define KVM_GET_TSC_KHZ _IOR(KVMIO, 0xa3, __u32)
> >
>
> _IO() - use arg or return value
> _IOW/_IOR - copy_to/from_user()
>
> pick one, but don't mix.
Thanks, I'll fix these issued in the next version.
Regards,
Joerg
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
On Sun, Feb 13, 2011 at 10:19:19AM -0500, Avi Kivity wrote:
> On 02/09/2011 07:29 PM, Joerg Roedel wrote:
> > Hi Avi, Marcelo,
> >
> > here is the patch-set to implement the TSC-scaling feature of upcoming
> > AMD CPUs. When this feature is supported, the CPU provides a new MSR
> > which holds a multiplier for the hardware TSC; the multiplier is applied
> > to the value returned by rdtsc[p] and to reads of MSR 0x10. This feature
> > can be used to emulate a given tsc frequency for the guest.
> > Patch 1 is not directly related to this patch-set; it only fixes
> > a bug which prevented me from testing these patches. In fact it fixes
> > the same bug Andre sent a patch for, but after the discussion about his
> > patch he told me to just post mine, so here it is.
> >
>
> Questions:
> - the tsc multiplier really is a multiplier, right? Not an addend that
> is added every cycle.
Yes, it is a real multiplier. But writes to the TSC-MSR will change the
unscaled TSC value.
>
> So
>
> wrmsr(TSC, 1e9)
> wrmsr(TSC_MULT, 2.0000)
> t = rdtsc()
>
> will return about 2e9, not 1e9 + 2*(time to execute the code snippet) ?
Right. And if you exchange the two wrmsr calls it will still give you
the same result.
> - what's the cost of wrmsr(TSC_MULT)?
Hard to tell by now because I only have numbers for pre-production
hardware.
> There are really two ways to implement this feature. One is fully
> generic, like you did. The other is to implement it at the host level -
> have a sysfs file and/or kernel parameter for the desired tsc frequency,
> write it once, and forget about it. Trust management to set the host
> tsc frequency to the same value on all hosts in a migration cluster.
The motivation here is mostly flexibility. Scaling the TSC for the
whole migration cluster only makes sense if all hosts there support the
feature. But the most likely scenario is that existing migration
clusters will be extended by new machines and guests will be migrated
there. And these guests should be able to see the same TSC frequency on
the new host as they had on the old one. The older machines in the
cluster may even have different TSC frequencies. With this flexible
implementation those scenarios are possible. A host-wide setting for the
scaling would make the feature useless in those (common) scenarios.
Regards,
Joerg
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
On 02/21/2011 12:28 PM, Roedel, Joerg wrote:
> On Sun, Feb 13, 2011 at 10:19:19AM -0500, Avi Kivity wrote:
>
>> On 02/09/2011 07:29 PM, Joerg Roedel wrote:
>>
>>> Hi Avi, Marcelo,
>>>
>>> here is the patch-set to implement the TSC-scaling feature of upcoming
>>> AMD CPUs. When this feature is supported, the CPU provides a new MSR
>>> which holds a multiplier for the hardware TSC; the multiplier is applied
>>> to the value returned by rdtsc[p] and to reads of MSR 0x10. This feature
>>> can be used to emulate a given tsc frequency for the guest.
>>> Patch 1 is not directly related to this patch-set; it only fixes
>>> a bug which prevented me from testing these patches. In fact it fixes
>>> the same bug Andre sent a patch for, but after the discussion about his
>>> patch he told me to just post mine, so here it is.
>>>
>>>
>> Questions:
>> - the tsc multiplier really is a multiplier, right? Not an addend that
>> is added every cycle.
>>
> Yes, it is a real multiplier. But writes to the TSC-MSR will change the
> unscaled TSC value.
>
>
>> So
>>
>> wrmsr(TSC, 1e9)
>> wrmsr(TSC_MULT, 2.0000)
>> t = rdtsc()
>>
>> will return about 2e9, not 1e9 + 2*(time to execute the code snippet) ?
>>
> Right. And if you exchange the two wrmsr calls it will still give you
> the same result.
>
>
>> - what's the cost of wrmsr(TSC_MULT)?
>>
> Hard to tell by now because I only have numbers for pre-production
> hardware.
>
>
>> There are really two ways to implement this feature. One is fully
>> generic, like you did. The other is to implement it at the host level -
>> have a sysfs file and/or kernel parameter for the desired tsc frequency,
>> write it once, and forget about it. Trust management to set the host
>> tsc frequency to the same value on all hosts in a migration cluster.
>>
> The motivation here is mostly flexibility. Scaling the TSC for the
> whole migration cluster only makes sense if all hosts there support the
> feature. But the most likely scenario is that existing migration
> clusters will be extended by new machines and guests will be migrated
> there. And these guests should be able to see the same TSC frequency on
> the new host as they had on the old one. The older machines in the
> cluster may even have different TSC frequencies. With this flexible
> implementation those scenarios are possible. A host-wide setting for the
> scaling would make the feature useless in those (common) scenarios.
>
It's also possible to scale the TSCs of the cluster to match outside
the framework of KVM. In that case, the VCPU client (qemu)
simply needs to be smart enough to not request the TSC rate be scaled.
That approach is completely compatible with this implementation.
If you do indeed want to have mixed speed VMs running on a single host,
that can also be done with the approach here.
Combining the two - supporting a standard cluster rate via host scaling,
plus a variable rate for martian VMs (those not conforming to the
standard cluster rate) would require some more work, as the multiplier
written back on exit from a martian would not be 1.0, rather something
else. Everything else should work as long as tsc_khz still expresses
the natural rate of the TSC, even when scaled to a standard cluster
rate. In that case, you can also pursue Avi's suggestion of skipping
the MSR loads for VMs where the rate matches the host rate.
Adding an export to the kernel indicating the currently applied scaling
rate may not be a bad idea if you want to support such an implementation
in the future.
I did have one slight concern about scaling in general. What happens
when the CPU khz rate is not uniformly detected across machines or
clusters? In general, it does vary a bit, I see differences out to the
5th digit of precision on the same machine. This is close enough to be
within the range of NTP correction (500 ppm), but also small enough to
represent real clock differences (and of course, there is some
measurement error).
If you are within the threshold where NTP can correct the time, you may
not want to apply a multiplier to the TSC at all. Again, this decision
can be made in the userspace component, but it's an important
consideration to bring up for the qemu patches that will be required to
support this.
Zach
On 02/21/2011 07:28 PM, Roedel, Joerg wrote:
> > - what's the cost of wrmsr(TSC_MULT)?
>
> Hard to tell by now because I only have numbers for pre-production
> hardware.
Can you ask your hardware people what the cost will likely be? msrs are
often expensive, and here we have two in the lightweight exit path.
> > There are really two ways to implement this feature. One is fully
> > generic, like you did. The other is to implement it at the host level -
> > have a sysfs file and/or kernel parameter for the desired tsc frequency,
> > write it once, and forget about it. Trust management to set the host
> > tsc frequency to the same value on all hosts in a migration cluster.
>
> The motivation here is mostly flexibility. Scaling the TSC for the
> whole migration cluster only makes sense if all hosts there support the
> feature. But the most likely scenario is that existing migration
> clusters will be extended by new machines and guests will be migrated
> there. And these guests should be able to see the same TSC frequency on
> the new host as they had on the old one. The older machines in the
> cluster may even have different TSC frequencies. With this flexible
> implementation those scenarios are possible. A host-wide setting for the
> scaling would make the feature useless in those (common) scenarios.
This doesn't really work, since we don't know on what host the TSC
calibration loop ran:
- start guest on host H1
- migrate it around, now it's on host H2
- guest reboots, reruns calibration loop
- migrate it around some more, now it's on host H3
- migrate to host with tsc multiplier Hnew
So, what should we set the multiplier to? H1, H2, or H3's tsc rate?
--
error compiling committee.c: too many arguments to function
On 02/22/2011 12:35 PM, Roedel, Joerg wrote:
> > This doesn't really work, since we don't know on what host the TSC
> > calibration loop ran:
> >
> > - start guest on host H1
> > - migrate it around, now it's on host H2
> > - guest reboots, reruns calibration loop
> > - migrate it around some more, now it's on host H3
> > - migrate to host with tsc multiplier Hnew
> >
> > So, what should we set the multiplier to? H1, H2, or H3's tsc rate?
>
> This scenario doesn't matter. If the guest already detected its TSC to
> be unstable there is nothing we can do and it doesn't really matter what
> we set the tsc frequency to. Therefore software will always set the
> guest tsc frequency to the same value it had on the last host.
Ok, so your scenario is
- boot on host H1
- no intervening migrations
- migrate to host Hnew
- all succeeding migrations are only to new hosts or back to H1
This is somewhat artificial, and not very different from an all-new cluster.
[the whole thing is kind of sad; we went through a huge effort to make
clocks work on virtual machines in spite of the tsc issues; then we have
a hardware solution, but can't use it because of old hardware. Same
thing happens with the effort put into shadow in the pre-npt days]
--
error compiling committee.c: too many arguments to function
On Tue, Feb 22, 2011 at 05:11:42AM -0500, Avi Kivity wrote:
> On 02/21/2011 07:28 PM, Roedel, Joerg wrote:
> > > - what's the cost of wrmsr(TSC_MULT)?
> >
> > Hard to tell by now because I only have numbers for pre-production
> > hardware.
>
> Can you ask your hardware people what the cost will likely be? msrs are
> often expensive, and here we have two in the lightweight exit path.
Will do.
> This doesn't really work, since we don't know on what host the TSC
> calibration loop ran:
>
> - start guest on host H1
> - migrate it around, now it's on host H2
> - guest reboots, reruns calibration loop
> - migrate it around some more, now it's on host H3
> - migrate to host with tsc multiplier Hnew
>
> So, what should we set the multiplier to? H1, H2, or H3's tsc rate?
This scenario doesn't matter. If the guest already detected its TSC to
be unstable there is nothing we can do and it doesn't really matter what
we set the tsc frequency to. Therefore software will always set the
guest tsc frequency to the same value it had on the last host.
In the above scenario this would be the host tsc frequency of H3. If
the guest is migrated further from the host with the TSC multiplier,
this frequency is passed on. Software can read the guest tsc
frequency using the ioctl.
Joerg
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
On Tue, Feb 22, 2011 at 05:41:53AM -0500, Avi Kivity wrote:
> On 02/22/2011 12:35 PM, Roedel, Joerg wrote:
> > > This doesn't really work, since we don't know on what host the TSC
> > > calibration loop ran:
> > >
> > > - start guest on host H1
> > > - migrate it around, now it's on host H2
> > > - guest reboots, reruns calibration loop
> > > - migrate it around some more, now it's on host H3
> > > - migrate to host with tsc multiplier Hnew
> > >
> > > So, what should we set the multiplier to? H1, H2, or H3's tsc rate?
> >
> > This scenario doesn't matter. If the guest already detected its TSC to
> > be unstable there is nothing we can do and it doesn't really matter what
> > we set the tsc frequency to. Therefore software will always set the
> > guest tsc frequency to the same value it had on the last host.
>
> Ok, so your scenario is
>
> - boot on host H1
> - no intervening migrations
> - migrate to host Hnew
> - all succeeding migrations are only to new hosts or back to H1
>
> This is somewhat artificial, and not very different from an all-new cluster.
This is at least the scenario where the new hardware feature will make
sense. It's clear that migrating a guest between hosts without
tsc-scaling will make the tsc appear unstable to the guest. This is
basically the same situation as we have today.
In fact, for older hosts the feature can be emulated in software by
trapping tsc accesses from the guest. Isn't this what Zachary has been
working on? During my implementation I understood tsc-scaling as a
hardware-supported way to do this, and that's the reason I implemented it
the way it is.
> [the whole thing is kind of sad; we went through a huge effort to make
> clocks work on virtual machines in spite of the tsc issues; then we have
> a hardware solution, but can't use it because of old hardware. Same
> thing happens with the effort put into shadow in the pre-npt days]
The shadow code has a revival, as it is required for emulating
nested-npt and nested-ept, so the effort still has value :)
Joerg
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
On Wed, Feb 09, 2011 at 12:29:39PM -0500, Joerg Roedel wrote:
> In the dr_interception function, support for a new CPU feature
> called decode-assists is implemented and used when available.
> This code path does not advance the guest RIP, causing the guest
> to loop endlessly over mov-dr instructions. This patch fixes that.
>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> arch/x86/kvm/svm.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 73a8f1d..bfb4948 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2777,6 +2777,8 @@ static int dr_interception(struct vcpu_svm *svm)
> kvm_register_write(&svm->vcpu, reg, val);
> }
>
> + skip_emulated_instruction(&svm->vcpu);
> +
> return 1;
> }
Btw. Can you meanwhile apply this patch? It fixes a bug which sends the
guest into an endless loop when decode assists is available on the host.
Joerg
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
On 02/22/2011 01:14 PM, Roedel, Joerg wrote:
> On Wed, Feb 09, 2011 at 12:29:39PM -0500, Joerg Roedel wrote:
> > In the dr_interception function, support for a new CPU feature
> > called decode-assists is implemented and used when available.
> > This code path does not advance the guest RIP, causing the guest
> > to loop endlessly over mov-dr instructions. This patch fixes that.
> >
>
> Btw. Can you meanwhile apply this patch? It fixes a bug which sends the
> guest into an endless loop when decode assists is available on the host.
Yes of course - should have done that myself. Applied now, and queued
for 2.6.38.
--
error compiling committee.c: too many arguments to function
On 02/22/2011 01:11 PM, Roedel, Joerg wrote:
> >
> > Ok, so your scenario is
> >
> > - boot on host H1
> > - no intervening migrations
> > - migrate to host Hnew
> > - all succeeding migrations are only to new hosts or back to H1
> >
> > This is somewhat artificial, and not very different from an all-new cluster.
>
> This is at least the scenario where the new hardware feature will make
> sense. It's clear that migrating a guest between hosts without
> tsc-scaling will make the tsc appear unstable to the guest. This is
> basically the same situation as we have today.
> In fact, for older hosts the feature can be emulated in software by
> trapping tsc accesses from the guest. Isn't this what Zachary has been
> working on?
Yes. It's of dubious value though, you get a stable tsc but it's
incredibly slow.
> During my implementation I understood tsc-scaling as a
> hardware-supported way to do this, and that's the reason I implemented it
> the way it is.
Right. The only question is what the added guest-switch cost is. If it's
expensive (say, >= 100 cycles) then we need a mode where we can drop
this cost by applying the same multiplier to all guests and the host
(can be done as an add-on optimization patch). If however we end up
always recommending that all hosts use the same virtual tsc rate, why
should we support individual rates for guests?
It does make sense from a generality point of view, we provide
mechanism, not policy, just make sure that the policies we like are
optimized as far as they can go.
> > [the whole thing is kind of sad; we went through a huge effort to make
> > clocks work on virtual machines in spite of the tsc issues; then we have
> > a hardware solution, but can't use it because of old hardware. Same
> > thing happens with the effort put into shadow in the pre-npt days]
>
> The shadow code has a revival, as it is required for emulating
> nested-npt and nested-ept, so the effort still has value :)
Yes. Some of it though is unused (unsync pages). And it's hard for me
to see nested svm itself used in production due to the huge performance
hit for I/O. Maybe an emulated iommu (so we can do virtio device
assignment, or even real device assignment all the way from the host)
will help, or even more hardware support a la s390.
--
error compiling committee.c: too many arguments to function
On Tue, Feb 22, 2011 at 09:01:51AM -0500, Avi Kivity wrote:
> On 02/22/2011 01:14 PM, Roedel, Joerg wrote:
> > On Wed, Feb 09, 2011 at 12:29:39PM -0500, Joerg Roedel wrote:
> > > In the dr_interception function, support for a new CPU feature
> > > called decode-assists is implemented and used when available.
> > > This code path does not advance the guest RIP, causing the guest
> > > to loop endlessly over mov-dr instructions. This patch fixes that.
> > >
> >
> > Btw. Can you meanwhile apply this patch? It fixes a bug which sends the
> > guest into an endless loop when decode assists is available on the host.
>
> Yes of course - should have done that myself. Applied now, and queued
> for 2.6.38.
Great, thanks.
--
AMD Operating System Research Center
Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632