2007-08-16 15:59:01

by Laurent Vivier

[permalink] [raw]
Subject: [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

Index: kvm/include/linux/sched.h
===================================================================
--- kvm.orig/include/linux/sched.h 2007-08-16 15:23:27.000000000 +0200
+++ kvm/include/linux/sched.h 2007-08-16 15:23:41.000000000 +0200
@@ -955,6 +955,10 @@
/* list of struct preempt_notifier: */
struct hlist_head preempt_notifiers;
#endif
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ /* list of struct account_modifiers: */
+ struct hlist_head account_modifiers;
+#endif

unsigned short ioprio;
#ifdef CONFIG_BLK_DEV_IO_TRACE
@@ -1873,6 +1877,31 @@
}
#endif

+#ifdef CONFIG_ACCOUNT_MODIFIERS
+struct account_modifier;
+struct account_ops {
+ cputime_t (*user_time)(struct account_modifier *,
+ struct task_struct *, cputime_t );
+ cputime_t (*system_time)(struct account_modifier *,
+ struct task_struct *, int, cputime_t);
+};
+
+struct account_modifier {
+ struct hlist_node link;
+ struct account_ops *ops;
+};
+
+void account_modifier_register(struct account_modifier *modifier);
+void account_modifier_unregister(struct account_modifier *modifier);
+
+static inline void account_modifier_init(struct account_modifier *modifier,
+ struct account_ops *ops)
+{
+ INIT_HLIST_NODE(&modifier->link);
+ modifier->ops = ops;
+}
+#endif
+
#endif /* __KERNEL__ */

#endif
Index: kvm/kernel/sched.c
===================================================================
--- kvm.orig/kernel/sched.c 2007-08-16 15:15:33.000000000 +0200
+++ kvm/kernel/sched.c 2007-08-16 15:23:28.000000000 +0200
@@ -1593,6 +1593,9 @@
#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&p->preempt_notifiers);
#endif
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ INIT_HLIST_HEAD(&p->account_modifiers);
+#endif

/*
* We mark the process as running here, but have not actually
@@ -1731,6 +1734,62 @@
}

#endif
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+
+void account_modifier_register(struct account_modifier *modifier)
+{
+ hlist_add_head(&modifier->link, &current->account_modifiers);
+}
+EXPORT_SYMBOL_GPL(account_modifier_register);
+
+void account_modifier_unregister(struct account_modifier *modifier)
+{
+ hlist_del(&modifier->link);
+}
+EXPORT_SYMBOL_GPL(account_modifier_unregister);
+
+static cputime_t fire_user_time_account_modifiers(struct task_struct *curr,
+ cputime_t cputime)
+{
+ struct account_modifier *modifier;
+ struct hlist_node *node;
+
+ hlist_for_each_entry(modifier, node, &curr->account_modifiers, link)
+ if (modifier->ops->user_time)
+ cputime = modifier->ops->user_time(modifier,
+ curr, cputime);
+ return cputime;
+}
+
+static cputime_t fire_system_time_account_modifiers(struct task_struct *curr,
+ int hardirq_offset,
+ cputime_t cputime)
+{
+ struct account_modifier *modifier;
+ struct hlist_node *node;
+
+ hlist_for_each_entry(modifier, node, &curr->account_modifiers, link)
+ if (modifier->ops->system_time)
+ cputime = modifier->ops->system_time(modifier,
+ curr, hardirq_offset, cputime);
+ return cputime;
+}
+
+#else
+
+static inline cputime_t fire_user_time_account_modifiers(struct task_struct *curr, cputime_t cputime)
+{
+ return cputime;
+}
+
+static inline cputime_t fire_system_time_account_modifiers(struct task_struct *curr,
+ int hardirq_offset,
+ cputime_t cputime)
+{
+ return cputime;
+}
+
+#endif

/**
* prepare_task_switch - prepare to switch tasks
@@ -3223,6 +3282,8 @@
struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
cputime64_t tmp;

+ cputime = fire_user_time_account_modifiers(p, cputime);
+
p->utime = cputime_add(p->utime, cputime);

/* Add user time to cpustat. */
@@ -3246,6 +3307,8 @@
struct rq *rq = this_rq();
cputime64_t tmp;

+ cputime = fire_system_time_account_modifiers(p,hardirq_offset, cputime);
+
p->stime = cputime_add(p->stime, cputime);

/* Add system time to cpustat. */
@@ -6522,6 +6585,9 @@
#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&init_task.preempt_notifiers);
#endif
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ INIT_HLIST_HEAD(&init_task.account_modifiers);
+#endif

#ifdef CONFIG_SMP
nr_cpu_ids = highest_cpu + 1;


Attachments:
accounting_modifiers (4.00 kB)

2007-08-16 22:40:29

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

On Thu, 2007-08-16 at 17:58 +0200, Laurent Vivier wrote:
> [PATCH 3/3] introduce "account modifiers" mechanism in the kernel allowing a
> module to modify the collected accounting for a given task. This implementation
> is based on the "preempt_notifier". "account_system_time()" and
> "account_user_time()" can call functions registered by a module to modify the
> cputime value.
>
> Signed-off-by: Laurent Vivier <[email protected]>


Hi Laurent,

This seems a little like overkill. Why not just add an
"account_guest_time" which subtracts the given amount of time from
system time (if available) and adds it to guest time? Then kvm (and
lguest) should just need to call this at the right times.

Am I missing something?
Rusty.


2007-08-17 07:35:29

by Laurent Vivier

[permalink] [raw]
Subject: Re: [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

Rusty Russell wrote:
> On Thu, 2007-08-16 at 17:58 +0200, Laurent Vivier wrote:
>> [PATCH 3/3] introduce "account modifiers" mechanism in the kernel allowing a
>> module to modify the collected accounting for a given task. This implementation
>> is based on the "preempt_notifier". "account_system_time()" and
>> "account_user_time()" can call functions registered by a module to modify the
>> cputime value.
>>
>> Signed-off-by: Laurent Vivier <[email protected]>
>
>
> Hi Laurent,

Hi Rusty,
how are your puppies ?
And thank you for your comment.

> This seems a little like overkill. Why not just add an
> "account_guest_time" which subtracts the given amount of time from
> system time (if available) and adds it to guest time? Then kvm (and
> lguest) should just need to call this at the right times.

We can. I did something like this before.
By doing like that, I think there is a major issue: system time can be
decreasing (as we substract a value from it), and thus we can have negative
value in a tool like top. It's why I play with the cputime to add to system time
and not directly with the system time.

BUT I'm very open, my only goal is be able to compute guest time, "how" is not
very important...

what we can do:

- keep PATCHES 1 and 2, because we need to store guest time for cpu and tasks.
It is very generic.

- remove PATCH 3, and add in task_struct a "ktime vtime" where we accumulate
guest time (by calling something like guest_enter() and guest_exit() from the
virtualization engine), and when in account_system_time() we have cputime >
vtime we substrate vtime from cputime and add vtime to user time and guest time.
But doing like this we freeze in kernel/sched.c the link between system time,
user time and guest time (i.e. system time = system time - vtime, user time =
user time + vtime and guest time = guest time + vtime).

- modify PATCH 4 to use new PATCH 3.

Do you agree ? Anybody doesn't agree ?

Laurent
--
------------- [email protected] --------------
"Software is hard" - Donald Knuth


Attachments:
signature.asc (189.00 B)
OpenPGP digital signature

2007-08-17 08:31:51

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

On Fri, 2007-08-17 at 09:35 +0200, Laurent Vivier wrote:
> Rusty Russell wrote:
> > Hi Laurent,
>
> Hi Rusty,
> how are your puppies ?

They're getting a little fat, actually. Too many features ...

> - remove PATCH 3, and add in task_struct a "ktime vtime" where we accumulate
> guest time (by calling something like guest_enter() and guest_exit() from the
> virtualization engine), and when in account_system_time() we have cputime >
> vtime we substrate vtime from cputime and add vtime to user time and guest time.
> But doing like this we freeze in kernel/sched.c the link between system time,
> user time and guest time (i.e. system time = system time - vtime, user time =
> user time + vtime and guest time = guest time + vtime).

Actually, I think we can set a per-cpu "in_guest" flag for the scheduler
code, which then knows to add the tick to the guest time. That seems
the simplest possible solution.

lguest or kvm would set the flag before running the guest (which is done
with preempt disabled or using preemption hooks), and reset it
afterwards.

Thoughts?
Rusty.

2007-08-17 09:16:48

by Laurent Vivier

[permalink] [raw]
Subject: Re: [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

Rusty Russell wrote:
> On Fri, 2007-08-17 at 09:35 +0200, Laurent Vivier wrote:
>> Rusty Russell wrote:
>>> Hi Laurent,
>> Hi Rusty,
>> how are your puppies ?
>
> They're getting a little fat, actually. Too many features ...
>
>> - remove PATCH 3, and add in task_struct a "ktime vtime" where we accumulate
>> guest time (by calling something like guest_enter() and guest_exit() from the
>> virtualization engine), and when in account_system_time() we have cputime >
>> vtime we substrate vtime from cputime and add vtime to user time and guest time.
>> But doing like this we freeze in kernel/sched.c the link between system time,
>> user time and guest time (i.e. system time = system time - vtime, user time =
>> user time + vtime and guest time = guest time + vtime).
>
> Actually, I think we can set a per-cpu "in_guest" flag for the scheduler
> code, which then knows to add the tick to the guest time. That seems
> the simplest possible solution.
>
> lguest or kvm would set the flag before running the guest (which is done
> with preempt disabled or using preemption hooks), and reset it
> afterwards.
>
> Thoughts?

It was my first attempt (except I didn't have a per-cpu flag, but a per-task
flag), it's not visible but I love simplicity... ;-)

A KVM VCPU is stopped by preemption, so when we enter in scheduler we have
exited from VCPU and thus this flags is off (so we account 0 to the guest). What
I did then is "set the flag on when we enter in the VCPU, and
"account_system_time()" sets the flag off when it adds this timeslice to cpustat
(and compute correctly guest, user, system time). But I didn't like this idea
because all code executed after we entered in the VCPU is accounted to the guest
until we have an account_system_time() and I suppose we can have real system
time in this part. And I guess a VCPU can be less than 1 ms (unit of cputime) in
a timeslice.

So ? What's best ?
Laurent
--
------------- [email protected] --------------
"Software is hard" - Donald Knuth


Attachments:
signature.asc (189.00 B)
OpenPGP digital signature

2007-08-17 11:51:57

by Laurent Vivier

[permalink] [raw]
Subject: [PATCH/RFC 3/4, second shot]Introduce "account_guest_time"

Index: kvm/include/linux/sched.h
===================================================================
--- kvm.orig/include/linux/sched.h 2007-08-17 10:18:53.000000000 +0200
+++ kvm/include/linux/sched.h 2007-08-17 12:33:22.000000000 +0200
@@ -1192,6 +1192,9 @@
#ifdef CONFIG_FAULT_INJECTION
int make_it_fail;
#endif
+#ifdef CONFIG_GUEST_ACCOUNTING
+ ktime_t vtime;
+#endif
};

/*
Index: kvm/kernel/sched.c
===================================================================
--- kvm.orig/kernel/sched.c 2007-08-17 10:18:53.000000000 +0200
+++ kvm/kernel/sched.c 2007-08-17 12:33:07.000000000 +0200
@@ -3233,6 +3233,37 @@
cpustat->user = cputime64_add(cpustat->user, tmp);
}

+#ifdef CONFIG_GUEST_ACCOUNTING
+/*
+ * Account guest time to a process
+ * @p: the process that the cpu time gets accounted to
+ * @cputime: the cpu time spent in kernel space since the last update
+ */
+
+static cputime_t account_guest_time(struct task_struct *p, cputime_t cputime)
+{
+ struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+ ktime_t kmsec = ktime_set(0, NSEC_PER_MSEC);
+ cputime_t cmsec = msecs_to_cputime(1);
+
+ while ((ktime_to_ns(p->vtime) >= NSEC_PER_MSEC) &&
+ (cputime_to_msecs(cputime) >= 1)) {
+ p->vtime = ktime_sub(p->vtime, kmsec);
+ p->utime = cputime_add(p->utime, cmsec);
+ p->gtime = cputime_add(p->gtime, cmsec);
+
+ cpustat->guest = cputime64_add(cpustat->guest,
+ cputime_to_cputime64(cmsec));
+ cpustat->user = cputime64_add(cpustat->user,
+ cputime_to_cputime64(cmsec));
+
+ cputime = cputime_sub(cputime, cmsec);
+ }
+
+ return cputime;
+}
+#endif
+
/*
* Account system cpu time to a process.
* @p: the process that the cpu time gets accounted to
@@ -3246,6 +3277,10 @@
struct rq *rq = this_rq();
cputime64_t tmp;

+#ifdef CONFIG_GUEST_ACCOUNTING
+ cputime = account_guest_time(p, cputime);
+#endif
+
p->stime = cputime_add(p->stime, cputime);

/* Add system time to cpustat. */


Attachments:
account_guest (1.89 kB)

2007-08-17 11:54:55

by Laurent Vivier

[permalink] [raw]
Subject: Re: [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()"

Index: kvm/drivers/kvm/Kconfig
===================================================================
--- kvm.orig/drivers/kvm/Kconfig 2007-08-17 10:24:46.000000000 +0200
+++ kvm/drivers/kvm/Kconfig 2007-08-17 10:25:25.000000000 +0200
@@ -41,4 +41,10 @@
Provides support for KVM on AMD processors equipped with the AMD-V
(SVM) extensions.

+config GUEST_ACCOUNTING
+ bool "Virtual Machine accounting support"
+ depends on KVM
+ ---help---
+ Allows to account CPU time used by the Virtual Machines.
+
endif # VIRTUALIZATION
Index: kvm/drivers/kvm/kvm.h
===================================================================
--- kvm.orig/drivers/kvm/kvm.h 2007-08-17 10:19:30.000000000 +0200
+++ kvm/drivers/kvm/kvm.h 2007-08-17 10:21:56.000000000 +0200
@@ -589,6 +589,19 @@

int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run);

+#ifdef CONFIG_GUEST_ACCOUNTING
+static inline ktime_t kvm_guest_enter(void)
+{
+ return ktime_get();
+}
+
+static inline void kvm_guest_exit(ktime_t enter)
+{
+ ktime_t delta = ktime_sub(ktime_get(), enter);
+ current->vtime = ktime_add(current->vtime, delta);
+}
+#endif
+
static inline int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code)
{
Index: kvm/drivers/kvm/svm.c
===================================================================
--- kvm.orig/drivers/kvm/svm.c 2007-08-17 10:22:07.000000000 +0200
+++ kvm/drivers/kvm/svm.c 2007-08-17 10:23:32.000000000 +0200
@@ -1392,6 +1392,9 @@
u16 gs_selector;
u16 ldt_selector;
int r;
+#ifdef CONFIG_GUEST_ACCOUNTING
+ ktime_t now;
+#endif

again:
r = kvm_mmu_reload(vcpu);
@@ -1404,6 +1407,9 @@
clgi();

vcpu->guest_mode = 1;
+#ifdef CONFIG_GUEST_ACCOUNTING
+ now = kvm_guest_enter();
+#endif
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
svm_flush_tlb(vcpu);
@@ -1536,6 +1542,9 @@
#endif
: "cc", "memory" );

+#ifdef CONFIG_GUEST_ACCOUNTING
+ kvm_guest_exit(now);
+#endif
vcpu->guest_mode = 0;

if (vcpu->fpu_active) {
Index: kvm/drivers/kvm/vmx.c
===================================================================
--- kvm.orig/drivers/kvm/vmx.c 2007-08-17 10:23:36.000000000 +0200
+++ kvm/drivers/kvm/vmx.c 2007-08-17 10:24:37.000000000 +0200
@@ -2052,6 +2052,9 @@
struct vcpu_vmx *vmx = to_vmx(vcpu);
u8 fail;
int r;
+#ifdef CONFIG_GUEST_ACCOUNTING
+ ktime_t now;
+#endif

preempted:
if (vcpu->guest_debug.enabled)
@@ -2078,6 +2081,9 @@
local_irq_disable();

vcpu->guest_mode = 1;
+#ifdef CONFIG_GUEST_ACCOUNTING
+ now = kvm_guest_enter();
+#endif
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
vmx_flush_tlb(vcpu);
@@ -2198,6 +2204,9 @@
[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
: "cc", "memory" );

+#ifdef CONFIG_GUEST_ACCOUNTING
+ kvm_guest_exit(now);
+#endif
vcpu->guest_mode = 0;
local_irq_enable();


Attachments:
kvm_account_guest (2.81 kB)

2007-08-17 12:55:27

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

Laurent Vivier wrote:
>>
>>> - remove PATCH 3, and add in task_struct a "ktime vtime" where we accumulate
>>> guest time (by calling something like guest_enter() and guest_exit() from the
>>> virtualization engine), and when in account_system_time() we have cputime >
>>> vtime we substrate vtime from cputime and add vtime to user time and guest time.
>>> But doing like this we freeze in kernel/sched.c the link between system time,
>>> user time and guest time (i.e. system time = system time - vtime, user time =
>>> user time + vtime and guest time = guest time + vtime).
>>>
>> Actually, I think we can set a per-cpu "in_guest" flag for the scheduler
>> code, which then knows to add the tick to the guest time. That seems
>> the simplest possible solution.
>>
>> lguest or kvm would set the flag before running the guest (which is done
>> with preempt disabled or using preemption hooks), and reset it
>> afterwards.
>>
>> Thoughts?
>>
>
> It was my first attempt (except I didn't have a per-cpu flag, but a per-task
> flag), it's not visible but I love simplicity... ;-)
>
> A KVM VCPU is stopped by preemption, so when we enter in scheduler we have
> exited from VCPU and thus this flags is off (so we account 0 to the guest). What
> I did then is "set the flag on when we enter in the VCPU, and
> "account_system_time()" sets the flag off when it adds this timeslice to cpustat
> (and compute correctly guest, user, system time). But I didn't like this idea
> because all code executed after we entered in the VCPU is accounted to the guest
> until we have an account_system_time() and I suppose we can have real system
> time in this part. And I guess a VCPU can be less than 1 ms (unit of cputime) in
> a timeslice.
>
> So ? What's best ?
>

The normal user/system accounting has the same issue, no? Whereever we
happen to land (kernel or user) gets the whole tick.

So I think it is okay to have the same limitation for guest time.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-08-17 12:59:51

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 3/4, second shot]Introduce "account_guest_time"

Laurent Vivier wrote:
> This is another way to compute guest time... I remove the "account modifiers"
> mechanism and call directly account_guest_time() from account_system_time().
> account_system_time() computes user, system and guest times according value
> accumulated in vtime (a ktime_t) in task_struct by the virtual machine.
>


> @@ -3246,6 +3277,10 @@
> struct rq *rq = this_rq();
> cputime64_t tmp;
>
> +#ifdef CONFIG_GUEST_ACCOUNTING
> + cputime = account_guest_time(p, cputime);
> +#endif
> +
> p->stime = cputime_add(p->stime, cputime);
>
> /* Add system time to cpustat. */


In order to reduce the impact on whatever function this is in (use diff
-p please), you can always have a definition of account_guest_time:

#else

static cputime_t account_guest_time(struct task_struct *p, cputime_t cputime)
{
return cputime;
}

#endif


This way the #ifdef/#endif is not necessary when calling it.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-08-17 13:03:23

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()"

Laurent Vivier wrote:
> KVM updates vtime in task_struct to allow account_guest_time() to modify user,
> system and guest time in cpustat accordingly.
>

> --- kvm.orig/drivers/kvm/Kconfig 2007-08-17 10:24:46.000000000 +0200
> +++ kvm/drivers/kvm/Kconfig 2007-08-17 10:25:25.000000000 +0200
> @@ -41,4 +41,10 @@
> Provides support for KVM on AMD processors equipped with the AMD-V
> (SVM) extensions.
>
> +config GUEST_ACCOUNTING
> + bool "Virtual Machine accounting support"
> + depends on KVM
> + ---help---
> + Allows to account CPU time used by the Virtual Machines.
> +


Other way round. In the patch that adds account_guest_time(), have a
CONFIG_GUEST_ACCOUNTING (defaulting to n, with no description, help, or
dependencies. Then, CONFIG_KVM can select GUEST_ACCOUNTING.

The advantages of this are:
- the puppyvisor can also select this if it so wishes
- we don't have core code reference some obscure module

CONFIG_PREEMPT_NOTIFIERS does the same thing.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-08-17 13:08:30

by Laurent Vivier

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

Avi Kivity wrote:
> Laurent Vivier wrote:
>>>
>>>> - remove PATCH 3, and add in task_struct a "ktime vtime" where we accumulate
>>>> guest time (by calling something like guest_enter() and guest_exit() from the
>>>> virtualization engine), and when in account_system_time() we have cputime >
>>>> vtime we substrate vtime from cputime and add vtime to user time and guest time.
>>>> But doing like this we freeze in kernel/sched.c the link between system time,
>>>> user time and guest time (i.e. system time = system time - vtime, user time =
>>>> user time + vtime and guest time = guest time + vtime).
>>>>
>>> Actually, I think we can set a per-cpu "in_guest" flag for the scheduler
>>> code, which then knows to add the tick to the guest time. That seems
>>> the simplest possible solution.
>>>
>>> lguest or kvm would set the flag before running the guest (which is done
>>> with preempt disabled or using preemption hooks), and reset it
>>> afterwards.
>>>
>>> Thoughts?
>>>
>> It was my first attempt (except I didn't have a per-cpu flag, but a per-task
>> flag), it's not visible but I love simplicity... ;-)
>>
>> A KVM VCPU is stopped by preemption, so when we enter in scheduler we have
>> exited from VCPU and thus this flags is off (so we account 0 to the guest). What
>> I did then is "set the flag on when we enter in the VCPU, and
>> "account_system_time()" sets the flag off when it adds this timeslice to cpustat
>> (and compute correctly guest, user, system time). But I didn't like this idea
>> because all code executed after we entered in the VCPU is accounted to the guest
>> until we have an account_system_time() and I suppose we can have real system
>> time in this part. And I guess a VCPU can be less than 1 ms (unit of cputime) in
>> a timeslice.
>>
>> So ? What's best ?
>>
>
> The normal user/system accounting has the same issue, no? Whereever we
> happen to land (kernel or user) gets the whole tick.

Yes... but perhaps I should rewrite this too ;-)

> So I think it is okay to have the same limitation for guest time.

OK, so we can go back to my first patch.
Who can decide to introduce this into the kernel ?

Laurent
--
------------- [email protected] --------------
"Software is hard" - Donald Knuth


Attachments:
signature.asc (189.00 B)
OpenPGP digital signature

2007-08-17 13:16:35

by Laurent Vivier

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()"

Avi Kivity wrote:
> Laurent Vivier wrote:
>> KVM updates vtime in task_struct to allow account_guest_time() to modify user,
>> system and guest time in cpustat accordingly.
>>
>
>> --- kvm.orig/drivers/kvm/Kconfig 2007-08-17 10:24:46.000000000 +0200
>> +++ kvm/drivers/kvm/Kconfig 2007-08-17 10:25:25.000000000 +0200
>> @@ -41,4 +41,10 @@
>> Provides support for KVM on AMD processors equipped with the AMD-V
>> (SVM) extensions.
>>
>> +config GUEST_ACCOUNTING
>> + bool "Virtual Machine accounting support"
>> + depends on KVM
>> + ---help---
>> + Allows to account CPU time used by the Virtual Machines.
>> +
>
>
> Other way round. In the patch that adds account_guest_time(), have a
> CONFIG_GUEST_ACCOUNTING (defaulting to n, with no description, help, or
> dependencies. Then, CONFIG_KVM can select GUEST_ACCOUNTING.

I was wondering in which Kconfig I can put it...

> The advantages of this are:
> - the puppyvisor can also select this if it so wishes
> - we don't have core code reference some obscure module

I agree.

> CONFIG_PREEMPT_NOTIFIERS does the same thing.

I saw

Laurent
--
------------- [email protected] --------------
"Software is hard" - Donald Knuth


Attachments:
signature.asc (189.00 B)
OpenPGP digital signature

2007-08-17 13:32:47

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

Am Freitag, 17. August 2007 schrieb Laurent Vivier:
> > The normal user/system accounting has the same issue, no? Whereever we
> > happen to land (kernel or user) gets the whole tick.
>
> Yes... but perhaps I should rewrite this too ;-)

If you look further, you will see, that this was actually rewritten in 2.6.12
and thats why we have cputime_t. The infrastructure is currently only used by
s390 and ppc64. On s390 we use our cpu timer to get the current time on each
syscall/irq/context switch etc to get precise accounting data for
system/user/irq/softirq/nice. We also get steal time (this is interesting for
the guest: how much of my cpu was actually not available because the
hypervisor scheduled me away). I dont know enough about other architectures
to say which one could exploit this infrastructure as well.

The current git tree is somewhat broken for CONFIG_VIRT_CPU_ACCOUTING due to
the accouting changes introduced by CFS - we work on this.

If you are interested in the cputime stuff, you can have a look at
arch/s390/kernel/vtime.c (account_system_vtime, account_vtime) as well as:

http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=6be7071fdd321c206b1ee7a3e534782f25572830
for the first introduction and
http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=db59f519d37f80683f3611927f6b5c3bdfc0f9e6
for the s390 exploitation.

Christian

2007-08-17 14:13:04

by Laurent Vivier

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

Index: kvm/drivers/kvm/kvm.h
===================================================================
--- kvm.orig/drivers/kvm/kvm.h 2007-08-17 15:26:16.000000000 +0200
+++ kvm/drivers/kvm/kvm.h 2007-08-17 15:29:46.000000000 +0200
@@ -589,6 +589,19 @@

int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run);

+#ifndef PF_VCPU
+#define PF_VCPU 0 /* no kernel support */
+#endif
+
+static inline void kvm_guest_enter(void)
+{
+ current->flags |= PF_VCPU;
+}
+
+static inline void kvm_guest_exit(void)
+{
+}
+
static inline int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code)
{
Index: kvm/drivers/kvm/svm.c
===================================================================
--- kvm.orig/drivers/kvm/svm.c 2007-08-17 15:26:16.000000000 +0200
+++ kvm/drivers/kvm/svm.c 2007-08-17 15:27:03.000000000 +0200
@@ -1404,6 +1404,7 @@
clgi();

vcpu->guest_mode = 1;
+ kvm_guest_enter();
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
svm_flush_tlb(vcpu);
@@ -1536,6 +1537,7 @@
#endif
: "cc", "memory" );

+ kvm_guest_exit();
vcpu->guest_mode = 0;

if (vcpu->fpu_active) {
Index: kvm/drivers/kvm/vmx.c
===================================================================
--- kvm.orig/drivers/kvm/vmx.c 2007-08-17 15:26:16.000000000 +0200
+++ kvm/drivers/kvm/vmx.c 2007-08-17 15:27:45.000000000 +0200
@@ -2078,6 +2078,7 @@
local_irq_disable();

vcpu->guest_mode = 1;
+ kvm_guest_enter();
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
vmx_flush_tlb(vcpu);
@@ -2198,6 +2199,7 @@
[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
: "cc", "memory" );

+ kvm_guest_exit();
vcpu->guest_mode = 0;
local_irq_enable();


Attachments:
account_guest (1.48 kB)
kvm_account_guest (1.71 kB)
Download all attachments

2007-08-19 07:38:10

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

Laurent Vivier wrote:
> Avi Kivity wrote:
> [...]
>
>> The normal user/system accounting has the same issue, no? Whereever we
>> happen to land (kernel or user) gets the whole tick.
>>
>> So I think it is okay to have the same limitation for guest time.
>>
>>
>
> So this is how it looks like.
> PATCH 1 and 2 are always a prerequisite.
>
>

> + tmp = cputime_to_cputime64(cputime);
> + if (p->flags & PF_VCPU) {
> + p->utime = cputime_add(p->utime, cputime);
> + p->gtime = cputime_add(p->gtime, cputime);
> +
> + cpustat->guest = cputime64_add(cpustat->guest, tmp);
> + cpustat->user = cputime64_add(cpustat->user, tmp);
> +
> + p->flags &= ~PF_VCPU;
> +
> + return;
> + }
> +

Where did CONFIG_GUEST_ACCOUNTING go?


--
error compiling committee.c: too many arguments to function

2007-08-19 07:39:45

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()"

Laurent Vivier wrote:
>> Other way round. In the patch that adds account_guest_time(), have a
>> CONFIG_GUEST_ACCOUNTING (defaulting to n, with no description, help, or
>> dependencies. Then, CONFIG_KVM can select GUEST_ACCOUNTING.
>>
>
> I was wondering in which Kconfig I can put it...
>
>

Looks like init/Kconfig has similar stuff.

--
error compiling committee.c: too many arguments to function

2007-08-19 07:41:31

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

Laurent Vivier wrote:
>> So I think it is okay to have the same limitation for guest time.
>>
>
> OK, so we can go back to my first patch.
> Who can decide to introduce this into the kernel ?
>

The sched.c parts are best merged by Ingo, and I can carry the kvm
parts. Alternatively, I can carry the entire patchset if Ingo acks it.

--
error compiling committee.c: too many arguments to function

2007-08-20 07:30:25

by Laurent Vivier

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

Avi Kivity wrote:
> Laurent Vivier wrote:
>> Avi Kivity wrote:
>> [...]
>>
>>> The normal user/system accounting has the same issue, no? Whereever we
>>> happen to land (kernel or user) gets the whole tick.
>>>
>>> So I think it is okay to have the same limitation for guest time.
>>>
>>>
>>
>> So this is how it looks like.
>> PATCH 1 and 2 are always a prerequisite.
>>
>>
>
>> + tmp = cputime_to_cputime64(cputime);
>> + if (p->flags & PF_VCPU) {
>> + p->utime = cputime_add(p->utime, cputime);
>> + p->gtime = cputime_add(p->gtime, cputime);
>> +
>> + cpustat->guest = cputime64_add(cpustat->guest, tmp);
>> + cpustat->user = cputime64_add(cpustat->user, tmp);
>> +
>> + p->flags &= ~PF_VCPU;
>> +
>> + return;
>> + }
>> +
>
> Where did CONFIG_GUEST_ACCOUNTING go?
>

Lost in the sea ...

Actually, I thought this modification is not enough expensive (in time and
space) to justify a CONFIG_*. But if you think so I can add this in init/Kconfig.

Laurent
--
------------- [email protected] --------------
"Software is hard" - Donald Knuth


Attachments:
signature.asc (189.00 B)
OpenPGP digital signature

2007-08-20 07:55:50

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism

Laurent Vivier wrote:
>>
>> Where did CONFIG_GUEST_ACCOUNTING go?
>>
>>
>
> Lost in the sea ...
>
> Actually, I thought this modification is not enough expensive (in time and
> space) to justify a CONFIG_*. But if you think so I can add this in init/Kconfig.
>
>

The difference between "convince everyone that it isn't expensive,
including the embedded guys" and preemptively adding a config option can
be quite large...

--
error compiling committee.c: too many arguments to function