2012-10-17 02:24:13

by Michael Wolf

Subject: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

On a system running in a capped or overcommitted environment, the user
may see steal time reported in accounting tools such as top or vmstat,
which can cause confusion for the end user. To ease the confusion this
patch set adds the idea of consigned (expected steal) time. The host
will separate the consigned time from the steal time. The consignment
limit passed to the host is the amount of steal time expected within a
fixed period of time. Any other steal time accruing during that period
will show as traditional steal time.

TODO:
* Change native_clock to take params and not return a value
* Change update_rq_clock_task

Changes from V1:
* Removed the steal time allowed percentage from the guest
* Moved the separation of consigned (expected steal) and steal time to the
host.
* No longer include a sysctl interface.

---

Michael Wolf (5):
Alter the amount of steal time reported by the guest.
Expand the steal time msr to also contain the consigned time.
Add the code to send the consigned time from the host to the guest
Add a timer to allow the separation of consigned from steal time.
Add an ioctl to communicate the consign limit to the host.


arch/x86/include/asm/kvm_host.h | 11 +++++++++
arch/x86/include/asm/kvm_para.h | 3 ++
arch/x86/include/asm/paravirt.h | 4 ++-
arch/x86/kernel/kvm.c | 8 ++----
arch/x86/kvm/x86.c | 50 ++++++++++++++++++++++++++++++++++++++-
fs/proc/stat.c | 9 +++++--
include/linux/kernel_stat.h | 2 ++
include/linux/kvm.h | 2 ++
include/linux/kvm_host.h | 2 ++
kernel/sched/cputime.c | 21 +++++++++++++++-
kernel/sched/sched.h | 2 ++
virt/kvm/kvm_main.c | 7 +++++
12 files changed, 108 insertions(+), 13 deletions(-)


2012-10-17 02:24:19

by Michael Wolf

Subject: [PATCH RFC V2 1/5] Alter the amount of steal time reported by the guest.

Modify the amount of steal time that the kernel reports via the /proc
interface. Steal time will now be broken down into steal_time and
consigned_time. consigned_time represents the amount of time that is
expected to be lost due to overcommitment of the physical cpu or to cpu
capping. The amount of consigned_time will be passed in using an ioctl
and is expressed as the number of nanoseconds expected to be lost
during the fixed period. The fixed period is currently 1/10th of a
second.

Signed-off-by: Michael Wolf <[email protected]>
---
fs/proc/stat.c | 9 +++++++--
include/linux/kernel_stat.h | 1 +
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index e296572..cb7fe80 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -82,7 +82,7 @@ static int show_stat(struct seq_file *p, void *v)
int i, j;
unsigned long jif;
u64 user, nice, system, idle, iowait, irq, softirq, steal;
- u64 guest, guest_nice;
+ u64 guest, guest_nice, consign;
u64 sum = 0;
u64 sum_softirq = 0;
unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
@@ -90,10 +90,11 @@ static int show_stat(struct seq_file *p, void *v)

user = nice = system = idle = iowait =
irq = softirq = steal = 0;
- guest = guest_nice = 0;
+ guest = guest_nice = consign = 0;
getboottime(&boottime);
jif = boottime.tv_sec;

+
for_each_possible_cpu(i) {
user += kcpustat_cpu(i).cpustat[CPUTIME_USER];
nice += kcpustat_cpu(i).cpustat[CPUTIME_NICE];
@@ -105,6 +106,7 @@ static int show_stat(struct seq_file *p, void *v)
steal += kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
guest += kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
guest_nice += kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
+ consign += kcpustat_cpu(i).cpustat[CPUTIME_CONSIGN];
sum += kstat_cpu_irqs_sum(i);
sum += arch_irq_stat_cpu(i);

@@ -128,6 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+ seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(consign));
seq_putc(p, '\n');

for_each_online_cpu(i) {
@@ -142,6 +145,7 @@ static int show_stat(struct seq_file *p, void *v)
steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
+ consign = kcpustat_cpu(i).cpustat[CPUTIME_CONSIGN];
seq_printf(p, "cpu%d", i);
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice));
@@ -153,6 +157,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+ seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(consign));
seq_putc(p, '\n');
}
seq_printf(p, "intr %llu", (unsigned long long)sum);
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 36d12f0..c0b0095 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -27,6 +27,7 @@ enum cpu_usage_stat {
CPUTIME_STEAL,
CPUTIME_GUEST,
CPUTIME_GUEST_NICE,
+ CPUTIME_CONSIGN,
NR_STATS,
};

2012-10-17 02:24:31

by Michael Wolf

Subject: [PATCH RFC V2 2/5] Expand the steal time msr to also contain the consigned time.

Add a consigned field to the steal time structure. This field will hold
the time lost due to capping or overcommit. The rest of the time will
still show up in the steal field.

Signed-off-by: Michael Wolf <[email protected]>
---
arch/x86/include/asm/paravirt.h | 4 ++--
arch/x86/kernel/kvm.c | 7 ++-----
kernel/sched/cputime.c | 2 +-
3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a0facf3..a5f9f30 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
extern struct static_key paravirt_steal_enabled;
extern struct static_key paravirt_steal_rq_enabled;

-static inline u64 paravirt_steal_clock(int cpu)
+static inline u64 paravirt_steal_clock(int cpu, u64 *steal)
{
- return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+ PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
}

static inline unsigned long long paravirt_read_pmc(int counter)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index c1d61ee..91b3b2a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -368,9 +368,8 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
};

-static u64 kvm_steal_clock(int cpu)
+static u64 kvm_steal_clock(int cpu, u64 *steal)
{
- u64 steal;
struct kvm_steal_time *src;
int version;

@@ -378,11 +377,9 @@ static u64 kvm_steal_clock(int cpu)
do {
version = src->version;
rmb();
- steal = src->steal;
+ *steal = src->steal;
rmb();
} while ((version & 1) || (version != src->version));
-
- return steal;
}

void kvm_disable_steal_time(void)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 81b763b..dd3fd46 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -275,7 +275,7 @@ static __always_inline bool steal_account_process_tick(void)
if (static_key_false(&paravirt_steal_enabled)) {
u64 steal, st = 0;

- steal = paravirt_steal_clock(smp_processor_id());
+ paravirt_steal_clock(smp_processor_id(), &steal);
steal -= this_rq()->prev_steal_time;

st = steal_ticks(steal);

2012-10-17 02:24:42

by Michael Wolf

Subject: [PATCH RFC V2 3/5] Add the code to send the consigned time from the host to the guest

Add the code to send the consigned time from the host to the guest

Signed-off-by: Michael Wolf <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/kvm_para.h | 3 ++-
arch/x86/include/asm/paravirt.h | 4 ++--
arch/x86/kernel/kvm.c | 3 ++-
arch/x86/kvm/x86.c | 2 ++
include/linux/kernel_stat.h | 1 +
kernel/sched/cputime.c | 21 +++++++++++++++++++--
kernel/sched/sched.h | 2 ++
8 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1eaa6b0..bd4e412 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -409,6 +409,7 @@ struct kvm_vcpu_arch {
u64 msr_val;
u64 last_steal;
u64 accum_steal;
+ u64 accum_consigned;
struct gfn_to_hva_cache stime;
struct kvm_steal_time steal;
} st;
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 2f7712e..debe72e 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -42,9 +42,10 @@

struct kvm_steal_time {
__u64 steal;
+ __u64 consigned;
__u32 version;
__u32 flags;
- __u32 pad[12];
+ __u32 pad[10];
};

#define KVM_STEAL_ALIGNMENT_BITS 5
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a5f9f30..d39e8d0 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
extern struct static_key paravirt_steal_enabled;
extern struct static_key paravirt_steal_rq_enabled;

-static inline u64 paravirt_steal_clock(int cpu, u64 *steal)
+static inline u64 paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
{
- PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
+ PVOP_VCALL3(pv_time_ops.steal_clock, cpu, steal, consigned);
}

static inline unsigned long long paravirt_read_pmc(int counter)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 91b3b2a..4e5582a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -368,7 +368,7 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
};

-static u64 kvm_steal_clock(int cpu, u64 *steal)
+static u64 kvm_steal_clock(int cpu, u64 *steal, u64 *consigned)
{
struct kvm_steal_time *src;
int version;
@@ -378,6 +378,7 @@ static u64 kvm_steal_clock(int cpu, u64 *steal)
version = src->version;
rmb();
*steal = src->steal;
+ *consigned = src->consigned;
rmb();
} while ((version & 1) || (version != src->version));
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f09552..801cfa8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1554,8 +1554,10 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
return;

vcpu->arch.st.steal.steal += vcpu->arch.st.accum_steal;
+ vcpu->arch.st.steal.consigned += vcpu->arch.st.accum_consigned;
vcpu->arch.st.steal.version += 2;
vcpu->arch.st.accum_steal = 0;
+ vcpu->arch.st.accum_consigned = 0;

kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index c0b0095..253fdce 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -125,6 +125,7 @@ extern unsigned long long task_delta_exec(struct task_struct *);
extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
extern void account_system_time(struct task_struct *, int, cputime_t, cputime_t);
extern void account_steal_time(cputime_t);
+extern void account_consigned_time(cputime_t);
extern void account_idle_time(cputime_t);

extern void account_process_tick(struct task_struct *, int user);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index dd3fd46..bf2025a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -244,6 +244,18 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
}

/*
+ * This accounts for the time that is split out of steal time.
+ * Consigned time represents the amount of time that the cpu was
+ * expected to be somewhere else.
+ */
+void account_consigned_time(cputime_t cputime)
+{
+ u64 *cpustat = kcpustat_this_cpu->cpustat;
+
+ cpustat[CPUTIME_CONSIGN] += (__force u64) cputime;
+}
+
+/*
* Account for involuntary wait time.
* @cputime: the cpu time spent in involuntary wait
*/
@@ -274,15 +286,20 @@ static __always_inline bool steal_account_process_tick(void)
#ifdef CONFIG_PARAVIRT
if (static_key_false(&paravirt_steal_enabled)) {
u64 steal, st = 0;
+ u64 consigned, cs = 0;

- paravirt_steal_clock(smp_processor_id(), &steal);
+ paravirt_steal_clock(smp_processor_id(), &steal, &consigned);
steal -= this_rq()->prev_steal_time;
+ consigned -= this_rq()->prev_consigned_time;

st = steal_ticks(steal);
+ cs = steal_ticks(consigned);
this_rq()->prev_steal_time += st * TICK_NSEC;
+ this_rq()->prev_consigned_time += cs * TICK_NSEC;

account_steal_time(st);
- return st;
+ account_consigned_time(cs);
+ return st || cs;
}
#endif
return false;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3060136..64e4cf9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -436,9 +436,11 @@ struct rq {
#endif
#ifdef CONFIG_PARAVIRT
u64 prev_steal_time;
+ u64 prev_consigned_time;
#endif
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
u64 prev_steal_time_rq;
+ u64 prev_consigned_time_rq;
#endif

/* calc_load related fields */

2012-10-17 02:24:57

by Michael Wolf

Subject: [PATCH RFC V2 4/5] Add a timer to allow the separation of consigned from steal time.

Add a timer to the host. This will define the period. During a period
the first n ticks will go into the consigned bucket. Any other ticks
that occur within the period will be placed in the steal time bucket.

Signed-off-by: Michael Wolf <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 10 +++++++++
arch/x86/include/asm/paravirt.h | 2 +-
arch/x86/kvm/x86.c | 42 ++++++++++++++++++++++++++++++++++++++-
3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bd4e412..d700850 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -41,6 +41,8 @@
#define KVM_PIO_PAGE_OFFSET 1
#define KVM_COALESCED_MMIO_PAGE_OFFSET 2

+#define KVM_STEAL_TIMER_DELAY 100000000UL
+
#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
| X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
@@ -339,6 +341,14 @@ struct kvm_vcpu_arch {
bool tpr_access_reporting;

/*
+ * timer used to determine if the time should be counted as
+ * steal time or consigned time.
+ */
+ struct hrtimer steal_timer;
+ u64 current_consigned;
+ u64 consigned_limit;
+
+ /*
* Paging state of the vcpu
*
* If the vcpu runs in guest mode with two level paging this still saves
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d39e8d0..6db79f9 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,7 +196,7 @@ struct static_key;
extern struct static_key paravirt_steal_enabled;
extern struct static_key paravirt_steal_rq_enabled;

-static inline u64 paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
+static inline void paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
{
PVOP_VCALL3(pv_time_ops.steal_clock, cpu, steal, consigned);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 801cfa8..469e748 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1535,13 +1535,32 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
static void accumulate_steal_time(struct kvm_vcpu *vcpu)
{
u64 delta;
+ u64 steal_delta;
+ u64 consigned_delta;

if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;

delta = current->sched_info.run_delay - vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
- vcpu->arch.st.accum_steal = delta;
+
+ /* split the delta into steal and consigned */
+ if (vcpu->arch.current_consigned < vcpu->arch.consigned_limit) {
+ vcpu->arch.current_consigned += delta;
+ if (vcpu->arch.current_consigned > vcpu->arch.consigned_limit) {
+ steal_delta = vcpu->arch.current_consigned
+ - vcpu->arch.consigned_limit;
+ consigned_delta = delta - steal_delta;
+ } else {
+ consigned_delta = delta;
+ steal_delta = 0;
+ }
+ } else {
+ consigned_delta = 0;
+ steal_delta = delta;
+ }
+ vcpu->arch.st.accum_steal = steal_delta;
+ vcpu->arch.st.accum_consigned = consigned_delta;
}

static void record_steal_time(struct kvm_vcpu *vcpu)
@@ -6187,11 +6206,25 @@ bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
return irqchip_in_kernel(vcpu->kvm) == (vcpu->arch.apic != NULL);
}

+enum hrtimer_restart steal_timer_fn(struct hrtimer *data)
+{
+ struct kvm_vcpu *vcpu;
+ ktime_t now;
+
+ vcpu = container_of(data, struct kvm_vcpu, arch.steal_timer);
+ vcpu->arch.current_consigned = 0;
+ now = ktime_get();
+ hrtimer_forward(&vcpu->arch.steal_timer, now,
+ ktime_set(0, KVM_STEAL_TIMER_DELAY));
+ return HRTIMER_RESTART;
+}
+
int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
{
struct page *page;
struct kvm *kvm;
int r;
+ ktime_t ktime;

BUG_ON(vcpu->kvm == NULL);
kvm = vcpu->kvm;
@@ -6234,6 +6267,12 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)

kvm_async_pf_hash_reset(vcpu);
kvm_pmu_init(vcpu);
+ /* Initialize and start a timer to capture steal and consigned time */
+ hrtimer_init(&vcpu->arch.steal_timer, CLOCK_MONOTONIC,
+ HRTIMER_MODE_REL);
+ vcpu->arch.steal_timer.function = &steal_timer_fn;
+ ktime = ktime_set(0, KVM_STEAL_TIMER_DELAY);
+ hrtimer_start(&vcpu->arch.steal_timer, ktime, HRTIMER_MODE_REL);

return 0;
fail_free_mce_banks:
@@ -6252,6 +6291,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
{
int idx;

+ hrtimer_cancel(&vcpu->arch.steal_timer);
kvm_pmu_destroy(vcpu);
kfree(vcpu->arch.mce_banks);
kvm_free_lapic(vcpu);

2012-10-17 02:25:12

by Michael Wolf

Subject: [PATCH RFC V2 5/5] Add an ioctl to communicate the consign limit to the host.

Add an ioctl to communicate the consign limit to the host.

Signed-off-by: Michael Wolf <[email protected]>
---
arch/x86/kvm/x86.c | 6 ++++++
include/linux/kvm.h | 2 ++
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_main.c | 7 +++++++
4 files changed, 17 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 469e748..5a4b8db 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5928,6 +5928,12 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
return 0;
}

+int kvm_arch_vcpu_ioctl_set_entitlement(struct kvm_vcpu *vcpu, long entitlement)
+{
+ vcpu->arch.consigned_limit = entitlement;
+ return 0;
+}
+
int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
struct i387_fxsave_struct *fxsave =
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..6b211fb 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -904,6 +904,8 @@ struct kvm_s390_ucas_mapping {
#define KVM_SET_ONE_REG _IOW(KVMIO, 0xac, struct kvm_one_reg)
/* VM is being stopped by host */
#define KVM_KVMCLOCK_CTRL _IO(KVMIO, 0xad)
+/* Set the consignment limit which will be used to separate steal time */
+#define KVM_SET_ENTITLEMENT _IOW(KVMIO, 0xae, unsigned long)

#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8a59e0a..9a0a791 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -538,6 +538,8 @@ void kvm_arch_hardware_unsetup(void);
void kvm_arch_check_processor_compat(void *rtn);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
+int kvm_arch_vcpu_ioctl_set_entitlement(struct kvm_vcpu *vcpu,
+ long entitlement);

void kvm_free_physmem(struct kvm *kvm);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d617f69..ccf0aec 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1936,6 +1936,13 @@ out_free2:
r = 0;
break;
}
+ case KVM_SET_ENTITLEMENT: {
+ r = kvm_arch_vcpu_ioctl_set_entitlement(vcpu, arg);
+ if (r)
+ goto out;
+ r = 0;
+ break;
+ }
default:
r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
}

2012-10-17 09:14:47

by Glauber Costa

Subject: Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

On 10/17/2012 06:23 AM, Michael Wolf wrote:
> In the case of where you have a system that is running in a
> capped or overcommitted environment the user may see steal time
> being reported in accounting tools such as top or vmstat. This can
> cause confusion for the end user. To ease the confusion this patch set
> adds the idea of consigned (expected steal) time. The host will separate
> the consigned time from the steal time. The consignment limit passed to the
> host will be the amount of steal time expected within a fixed period of
> time. Any other steal time accruing during that period will show as the
> traditional steal time.
>
> TODO:
> * Change native_clock to take params and not return a value
> * Change update_rq_clock_task
>
> Changes from V1:
> * Removed the steal time allowed percentage from the guest
> * Moved the separation of consigned (expected steal) and steal time to the
> host.
> * No longer include a sysctl interface.
>

You are showing this in the guest somewhere, but tools like top will
still not show it. So for quite a while, it achieves nothing.

Of course this is a barrier that any new statistic has to go through. So
while annoying, this is per se ultimately not a blocker.

What I still fail to see, is how this is useful information to be shown
in the guest. Honestly, if I'm in a guest VM or container, any time
during which I am not running is time I lost. It doesn't matter if this
was expected or not. This still seems to me as a host-side problem, to
be solved entirely by tooling.

2012-10-17 15:14:13

by Michael Wolf

Subject: Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

On Wed, 2012-10-17 at 21:14 +0400, Glauber Costa wrote:
> On 10/17/2012 06:23 AM, Michael Wolf wrote:
> > In the case of where you have a system that is running in a
> > capped or overcommitted environment the user may see steal time
> > being reported in accounting tools such as top or vmstat. This can
> > cause confusion for the end user. To ease the confusion this patch set
> > adds the idea of consigned (expected steal) time. The host will separate
> > the consigned time from the steal time. The consignment limit passed to the
> > host will be the amount of steal time expected within a fixed period of
> > time. Any other steal time accruing during that period will show as the
> > traditional steal time.
> >
> > TODO:
> > * Change native_clock to take params and not return a value
> > * Change update_rq_clock_task
> >
> > Changes from V1:
> > * Removed the steal time allowed percentage from the guest
> > * Moved the separation of consigned (expected steal) and steal time to the
> > host.
> > * No longer include a sysctl interface.
> >
>
> You are showing this in the guest somewhere, but tools like top will
> still not show it. So for quite a while, it achieves nothing.
>
> Of course this is a barrier that any new statistic has to go through. So
> while annoying, this is per-se ultimately not a blocker.
>
> What I still fail to see, is how this is useful information to be shown
> in the guest. Honestly, if I'm in a guest VM or container, any time
> during which I am not running is time I lost. It doesn't matter if this
> was expected or not. This still seems to me as a host-side problem, to
> be solved entirely by tooling.
>

What tools like top and vmstat show is altered: when time is put in the
consign bucket it does not show up in steal. So as long as the system is
performing as expected, the user will see 100% utilization and 0% steal.
I added the consign field to /proc/stat so that all time accrued in the
period is accounted for, and also for debugging purposes. The user won't
care about consign and will not see it.

2012-10-22 15:34:40

by Rik van Riel

Subject: Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

On 10/16/2012 10:23 PM, Michael Wolf wrote:
> In the case of where you have a system that is running in a
> capped or overcommitted environment the user may see steal time
> being reported in accounting tools such as top or vmstat. This can
> cause confusion for the end user.

How do s390 and Power systems deal with reporting that kind
of information?

IMHO it would be good to see what those do, so we do not end
up re-inventing the wheel, and confusing admins with yet another
way of reporting the information...

--
All rights reversed

2012-11-26 20:01:16

by Michael Wolf

[permalink] [raw]
Subject: Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

On 10/22/2012 10:33 AM, Rik van Riel wrote:
> On 10/16/2012 10:23 PM, Michael Wolf wrote:
>> In the case of where you have a system that is running in a
>> capped or overcommitted environment the user may see steal time
>> being reported in accounting tools such as top or vmstat. This can
>> cause confusion for the end user.
>
> How do s390 and Power systems deal with reporting that kind
> of information?
>
> IMHO it would be good to see what those do, so we do not end
> up re-inventing the wheel, and confusing admins with yet another
> way of reporting the information...
>
Sorry for the delay in the response. I'm assuming you are asking about
s390 and Power LPARs. In the case of an LPAR on a POWER system, steal
time is simply reported and not altered in any way. They do, however,
report how much processor is assigned to the partition, and that
information is in /proc/ppc64/lparcfg.

Mike