2007-08-10 15:58:54

by Laurent Vivier

[permalink] [raw]
Subject: [PATCH 0/2][KVM] guest time accounting

The aim of these two patches is to measure the CPU time used by a virtual
machine. All comments are welcome... I'm not sure it's the good way to do that.

[PATCH 1/2] introduce a new field, "guest", in cpustat to store the time used by
the CPU to run virtual CPU. Modify /proc/stat to display this new field.

[PATCH 2/2] modify account_system_time() to add cputime to cpustat->guest if we
are running a VCPU. We add this cputime to cpustat->user instead of
cpustat->system because this part of KVM code is in fact user code although it
is executed in the kernel. We duplicate VCPU time between guest and user to
allow an unmodified "top(1)" to display correct value. A modified "top(1)" is
able to display good cpu user time and cpu guest time by subtracting cpu guest
time from cpu user time.

Signed-Off-by: Laurent Vivier <[email protected]>
--
------------- [email protected] --------------
"Software is hard" - Donald Knuth


2007-08-13 12:59:12

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Laurent Vivier wrote:
> The aim of these two patches is to measure the CPU time used by a virtual
> machine. All comments are welcome... I'm not sure it's the good way to do that.
>
> [PATCH 1/2] introduce a new field, "guest", in cpustat to store the time used by
> the CPU to run virtual CPU. Modify /proc/stat to display this new field.
>
> [PATCH 2/2] modify account_system_time() to add cputime to cpustat->guest if we
> are running a VCPU. We add this cputime to cpustat->user instead of
> cpustat->system because this part of KVM code is in fact user code although it
> is executed in the kernel. We duplicate VCPU time between guest and user to
> allow an unmodified "top(1)" to display correct value. A modified "top(1)" is
> able to display good cpu user time and cpu guest time by subtracting cpu guest
> time from cpu user time.
>

[copying Ingo and Rusty]

The patches look good. A couple of comments:

- perhaps the new fields should be guarded by a #ifdef CONFIG_HYPERVISOR
(selected by CONFIG_KVM)? that way the (minor) additional overhead is
only incurred if it can possibly be used. I imagine that our canine
cousin will want to use this as well.

- I think that there is per-task accounting of user time and system
time; that should be extended as well.



--
error compiling committee.c: too many arguments to function

2007-08-13 13:03:32

by Laurent Vivier

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Avi Kivity wrote:
> Laurent Vivier wrote:
>> The aim of these two patches is to measure the CPU time used by a virtual
>> machine. All comments are welcome... I'm not sure it's the good way to
>> do that.
>>
>> [PATCH 1/2] introduce a new field, "guest", in cpustat to store the
>> time used by
>> the CPU to run virtual CPU. Modify /proc/stat to display this new field.
>>
>> [PATCH 2/2] modify account_system_time() to add cputime to
>> cpustat->guest if we
>> are running a VCPU. We add this cputime to cpustat->user instead of
>> cpustat->system because this part of KVM code is in fact user code
>> although it
>> is executed in the kernel. We duplicate VCPU time between guest and
>> user to
>> allow an unmodified "top(1)" to display correct value. A modified
>> "top(1)" is
>> able to display good cpu user time and cpu guest time by subtracting
>> cpu guest
>> time from cpu user time.
>>
>
> [copying Ingo and Rusty]
>
> The patches look good. A couple of comments:
>
> - perhaps the new fields should be guarded by a #ifdef CONFIG_HYPERVISOR
> (selected by CONFIG_KVM)? that way the (minor) additional overhead is
> only incurred if it can possibly be used. I imagine that our canine
> cousin will want to use this as well.

There is also a CONFIG_VIRTUALIZATION and a CONFIG_VIRT_CPU_ACCOUNTING (from
s390 and powerpc) Which one to use ?

I'm wondering if we can have a more accurate accounting:

- For the moment we add all system time since the previous entering to the VCPU
to the guest time (and I guess there is some real system time in it ???)

- Perhaps we can sum nanoseconds spent in the VCPU and add it to cpustat when
these ns are greater than 1 ms ? (I'm trying to make something in this way)

> - I think that there is per-task accounting of user time and system
> time; that should be extended as well.

it should be easy to do too...

Laurent
--
------------- [email protected] --------------
"Software is hard" - Donald Knuth


Attachments:
signature.asc (189.00 B)
OpenPGP digital signature

2007-08-13 13:13:14

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Laurent Vivier wrote:
>> - perhaps the new fields should be guarded by a #ifdef CONFIG_HYPERVISOR
>> (selected by CONFIG_KVM)? that way the (minor) additional overhead is
>> only incurred if it can possibly be used. I imagine that our canine
>> cousin will want to use this as well.
>>
>
> There is also a CONFIG_VIRTUALIZATION and a CONFIG_VIRT_CPU_ACCOUNTING (from
> s390 and powerpc) Which one to use ?
>

Are these options for using the kernel as a guest or host? I'd guess
the former.

> I'm wondering if we can have a more accurate accounting:
>
> - For the moment we add all system time since the previous entering to the VCPU
> to the guest time (and I guess there is some real system time in it ???)
>
> - Perhaps we can sum nanoseconds spent in the VCPU and add it to cpustat when
> these ns are greater than 1 ms ? (I'm trying to make something in this way)
>

I think that it's okay to use the same method as user/system time
accounting. But Ingo the the right man to ask.


--
error compiling committee.c: too many arguments to function

2007-08-13 15:19:56

by Laurent Vivier

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Index: kvm/drivers/kvm/vmx.c
===================================================================
--- kvm.orig/drivers/kvm/vmx.c 2007-08-13 14:11:38.000000000 +0200
+++ kvm/drivers/kvm/vmx.c 2007-08-13 14:29:44.000000000 +0200
@@ -2052,6 +2052,7 @@
struct vcpu_vmx *vmx = to_vmx(vcpu);
u8 fail;
int r;
+ ktime_t now, delta;

preempted:
if (vcpu->guest_debug.enabled)
@@ -2078,6 +2079,7 @@
local_irq_disable();

vcpu->guest_mode = 1;
+ now = ktime_get();
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
vmx_flush_tlb(vcpu);
@@ -2198,6 +2200,8 @@
[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
: "cc", "memory" );

+ delta = ktime_sub(ktime_get(), now);
+ current->vtime = ktime_add(current->vtime, delta);
vcpu->guest_mode = 0;
local_irq_enable();

Index: kvm/include/linux/sched.h
===================================================================
--- kvm.orig/include/linux/sched.h 2007-08-13 14:25:58.000000000 +0200
+++ kvm/include/linux/sched.h 2007-08-13 14:29:44.000000000 +0200
@@ -1192,6 +1192,9 @@
#ifdef CONFIG_FAULT_INJECTION
int make_it_fail;
#endif
+#ifdef CONFIG_VIRTUALIZATION
+ ktime_t vtime;
+#endif
};

/*
Index: kvm/kernel/sched.c
===================================================================
--- kvm.orig/kernel/sched.c 2007-08-13 14:11:38.000000000 +0200
+++ kvm/kernel/sched.c 2007-08-13 14:34:47.000000000 +0200
@@ -3212,6 +3212,29 @@
return ns;
}

+#ifdef CONFIG_VIRTUALIZATION
+static cputime_t account_guest_time(struct task_struct *p, cputime_t cputime)
+{
+ struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+ ktime_t kmsec = ktime_set(0, NSEC_PER_MSEC);
+ cputime_t cmsec = msecs_to_cputime(1);
+
+ while ((ktime_to_ns(p->vtime) >= NSEC_PER_MSEC) &&
+ (cputime_to_msecs(cputime) >= 1)) {
+ p->vtime = ktime_sub(p->vtime, kmsec);
+ p->utime = cputime_add(p->utime, cmsec);
+
+ cputime = cputime_sub(cputime, cmsec);
+
+ cpustat->guest = cputime64_add(cpustat->guest,
+ cputime_to_cputime64(cmsec));
+ cpustat->user = cputime64_add(cpustat->user,
+ cputime_to_cputime64(cmsec));
+ }
+ return cputime;
+}
+#endif
+
/*
* Account user cpu time to a process.
* @p: the process that the cpu time gets accounted to
@@ -3246,6 +3269,10 @@
struct rq *rq = this_rq();
cputime64_t tmp;

+#ifdef CONFIG_VIRTUALIZATION
+ cputime = account_guest_time(p, cputime);
+#endif
+
p->stime = cputime_add(p->stime, cputime);

/* Add system time to cpustat. */
Index: kvm/drivers/kvm/svm.c
===================================================================
--- kvm.orig/drivers/kvm/svm.c 2007-08-13 14:31:16.000000000 +0200
+++ kvm/drivers/kvm/svm.c 2007-08-13 14:33:10.000000000 +0200
@@ -1392,6 +1392,7 @@
u16 gs_selector;
u16 ldt_selector;
int r;
+ ktime_t now, delta;

again:
r = kvm_mmu_reload(vcpu);
@@ -1404,6 +1405,7 @@
clgi();

vcpu->guest_mode = 1;
+ now = ktime_get();
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
svm_flush_tlb(vcpu);
@@ -1536,6 +1538,8 @@
#endif
: "cc", "memory" );

+ delta = ktime_sub(ktime_get(), now);
+ current->vtime = ktime_add(current->vtime, delta);
vcpu->guest_mode = 0;

if (vcpu->fpu_active) {


Attachments:
kvm_stat_guest (3.16 kB)
signature.asc (189.00 B)
OpenPGP digital signature
Download all attachments

2007-08-13 15:26:46

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Laurent Vivier wrote:



>> Are these options for using the kernel as a guest or host? I'd guess
>> the former.
>>
>
> I didn't find CONFIG_HYPERVISOR.
>

I meant, add a new option CONFIG_HYPERVISOR.

> The good one seems to be CONFIG_VIRTUALIZATION that is used to activate CONFIG_KVM.
>

It's just a menu. Right now there's just one entry, but it will
change. And if CONFIG_VIRTUALIZATION is selected, it doesn't mean
CONFIG_KVM is selected.

--
error compiling committee.c: too many arguments to function

2007-08-13 15:46:26

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Am Freitag, 10. August 2007 schrieb Laurent Vivier:
> The aim of these two patches is to measure the CPU time used by a virtual
> machine. All comments are welcome... I'm not sure it's the good way to do
that.

I did something similar for or s390guest prototype, that Carsten posted in
May. I decided to account guest time to the user process instead of adding a
new field to avoid hazzle with old top. As you can read in the patch comment,
I personally prefer a new field if we can get one.

My implementation uses a similar mechanism like hard and softirq. So I have an
sie_enter an sie_exit and a task_is_in_sie function - like irq_enter and
irq_exit. The main difference is based on the fact, that s390 has precise
accouting for irq, steal, user and system time, and therefore my patch is
based on architecture specifc code using CONFIG_VIRT_CPU_ACCOUNT.

In general my patch has the same idea as your patch, so I am going to review
your patch and see if it would fit for s390.

For reference this is the (never posted) old patch for our virtualisation
prototype. It wont work with kvm but it gives you the idea what we had in
mind on s390.

----------- snip old PATCH GPLv2 ----------

Subject: [PATCH/RFC] Fix system<->user misaccount of interpreted execution

From: Christian Borntraeger <[email protected]>

This patches fixes the accouting of guest cpu time. As sie is executed via a
system call, all guest operations were accounted as system time. To fix this
we define a per thread "sie context". Before issuing the sie instruction we
enter this context and leave the context afterwards. sie_enter and sie_exit
call account_system_vtime, which now checks for being in sie_context. We
define the sie_context to be accounted as user time.

Possible future enhancement: We could add an additional field: "interpretion
time" to cpu stat and process time. Thus we could differentiate between user
time in the host and host user time spent for guests. The main challenge is
the necessary user space change. Therefore, we could export the interpretion
time with a new interface. To be defined.

Signed-off-By: Christian Borntraeger <[email protected]>
Signed-off-By: Carsten Otte <[email protected]>

---
arch/s390/Kconfig | 1 +
arch/s390/host/s390host.c | 15 +++++++++++++++
arch/s390/kernel/process.c | 1 +
arch/s390/kernel/vtime.c | 12 +++++++++++-
include/asm-s390/thread_info.h | 2 ++
5 files changed, 30 insertions(+), 1 deletion(-)

Index: linux-2.6.22/arch/s390/kernel/vtime.c
===================================================================
--- linux-2.6.22.orig/arch/s390/kernel/vtime.c
+++ linux-2.6.22/arch/s390/kernel/vtime.c
@@ -97,6 +97,11 @@ void account_vtime(struct task_struct *t
account_system_time(tsk, 0, cputime);
}

+static inline int task_is_in_sie(struct thread_info *thread)
+{
+ return thread->in_sie;
+}
+
/*
* Update process times based on virtual cpu times stored by entry.S
* to the lowcore fields user_timer, system_timer & steal_clock.
@@ -114,7 +119,12 @@ void account_system_vtime(struct task_st
cputime = S390_lowcore.system_timer >> 12;
S390_lowcore.system_timer -= cputime << 12;
S390_lowcore.steal_clock -= cputime << 12;
- account_system_time(tsk, 0, cputime);
+
+ if (task_is_in_sie(task_thread_info(tsk)) && !hardirq_count() &&
+ !softirq_count())
+ account_user_time(tsk, cputime);
+ else
+ account_system_time(tsk, 0, cputime);
}

static inline void set_vtimer(__u64 expires)
Index: linux-2.6.22/arch/s390/host/s390host.c
===================================================================
--- linux-2.6.22.orig/arch/s390/host/s390host.c
+++ linux-2.6.22/arch/s390/host/s390host.c
@@ -27,6 +27,19 @@ static int s390host_do_action(unsigned l

static DEFINE_MUTEX(s390host_init_mutex);

+static void enter_sie(void)
+{
+ account_system_vtime(current);
+ current_thread_info()->in_sie = 1;
+}
+
+static void exit_sie(void)
+{
+ account_system_vtime(current);
+ current_thread_info()->in_sie = 0;
+}
+
+
static void s390host_get_data(struct s390host_data *data)
{
atomic_inc(&data->count);
@@ -297,7 +310,9 @@ again:
schedule();

sie_kernel->sie_block.icptcode = 0;
+ enter_sie();
ret = sie64a(sie_kernel);
+ exit_sie();
if (ret)
goto out;

Index: linux-2.6.22/include/asm-s390/thread_info.h
===================================================================
--- linux-2.6.22.orig/include/asm-s390/thread_info.h
+++ linux-2.6.22/include/asm-s390/thread_info.h
@@ -55,6 +55,7 @@ struct thread_info {
struct restart_block restart_block;
struct s390host_data *s390host_data; /* s390host data */
int sie_cpu; /* sie cpu number */
+ int in_sie; /* 1 => cpu is in sie*/
};

/*
@@ -72,6 +73,7 @@ struct thread_info {
}, \
.s390host_data = NULL, \
.sie_cpu = 0, \
+ .in_sie = 0, \
}

#define init_thread_info (init_thread_union.thread_info)
Index: linux-2.6.22/arch/s390/kernel/process.c
===================================================================
--- linux-2.6.22.orig/arch/s390/kernel/process.c
+++ linux-2.6.22/arch/s390/kernel/process.c
@@ -277,6 +277,7 @@ int copy_thread(int nr, unsigned long cl
memset(&p->thread.per_info,0,sizeof(p->thread.per_info));
task_thread_info(p)->s390host_data = NULL;
task_thread_info(p)->sie_cpu = -1;
+ task_thread_info(p)->in_sie = 0;
return 0;
}

Index: linux-2.6.22/arch/s390/Kconfig
===================================================================
--- linux-2.6.22.orig/arch/s390/Kconfig
+++ linux-2.6.22/arch/s390/Kconfig
@@ -524,6 +524,7 @@ config S390_HOST
bool "s390 host support (EXPERIMENTAL)"
depends on 64BIT && EXPERIMENTAL
select S390_SWITCH_AMODE
+ select VIRT_CPU_ACCOUNTING
help
Select this option if you want to host guest Linux images


2007-08-13 15:49:58

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Christian Borntraeger wrote:
> Am Freitag, 10. August 2007 schrieb Laurent Vivier:
>
>> The aim of these two patches is to measure the CPU time used by a virtual
>> machine. All comments are welcome... I'm not sure it's the good way to do
>>
> that.
>
> I did something similar for or s390guest prototype, that Carsten posted in
> May. I decided to account guest time to the user process instead of adding a
> new field to avoid hazzle with old top. As you can read in the patch comment,
> I personally prefer a new field if we can get one.
>

Laurent's patch gives the best of both worlds: on old 'top', you get
guest time accounted as user time, while on new 'top' it is accounted
separately. This is done by reporting user time as the sum of the real
user time and guest time. A newer 'top' can subtract guest time from
user time to get the correct statistic.


> My implementation uses a similar mechanism like hard and softirq. So I have an
> sie_enter an sie_exit and a task_is_in_sie function - like irq_enter and
> irq_exit. The main difference is based on the fact, that s390 has precise
> accouting for irq, steal, user and system time, and therefore my patch is
> based on architecture specifc code using CONFIG_VIRT_CPU_ACCOUNT.
>

Okay, so the code should be under that config option, and kvm should
select it.


--
error compiling committee.c: too many arguments to function

2007-08-13 15:56:37

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Am Montag, 13. August 2007 schrieb Laurent Vivier:
> > [copying Ingo and Rusty]

@Avi, seems that sourceforge is mangling the cc list?

> >
> > The patches look good. A couple of comments:
> >
> > - perhaps the new fields should be guarded by a #ifdef CONFIG_HYPERVISOR
> > (selected by CONFIG_KVM)? that way the (minor) additional overhead is
> > only incurred if it can possibly be used. I imagine that our canine
> > cousin will want to use this as well.
>
> There is also a CONFIG_VIRTUALIZATION and a CONFIG_VIRT_CPU_ACCOUNTING (from
> s390 and powerpc) Which one to use ?

CONFIG_VIRT_CPU_ACCOUNTING is used for the precise accouting of user,system,
steal and irq time on these platforms and is not what you want for the on/off
decision.
>
> I'm wondering if we can have a more accurate accounting:
>
> - For the moment we add all system time since the previous entering to the
> VCPU to the guest time (and I guess there is some real system time in
> it ???)
> - Perhaps we can sum nanoseconds spent in the VCPU and add it to cpustat
> when these ns are greater than 1 ms ? (I'm trying to make something in this
way)

If you look at the patch I have posted some minutes ago, I use a method
similar to irq_enter and irq_exit to separate real system time from guest
time.

> > - I think that there is per-task accounting of user time and system
> > time; that should be extended as well.
> it should be easy to do too...

We have to make sure that userspace doesnt break, but yes we should have a
guest time for processes as well.

Christian

2007-08-13 16:00:57

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Christian Borntraeger wrote:
> Am Montag, 13. August 2007 schrieb Laurent Vivier:
>
>>> [copying Ingo and Rusty]
>>>
>
> @Avi, seems that sourceforge is mangling the cc list?
>
>

It's not configured to do so. Can you be more specific?


>>> The patches look good. A couple of comments:
>>>
>>> - perhaps the new fields should be guarded by a #ifdef CONFIG_HYPERVISOR
>>> (selected by CONFIG_KVM)? that way the (minor) additional overhead is
>>> only incurred if it can possibly be used. I imagine that our canine
>>> cousin will want to use this as well.
>>>
>> There is also a CONFIG_VIRTUALIZATION and a CONFIG_VIRT_CPU_ACCOUNTING (from
>> s390 and powerpc) Which one to use ?
>>
>
> CONFIG_VIRT_CPU_ACCOUNTING is used for the precise accouting of user,system,
> steal and irq time on these platforms and is not what you want for the on/off
> decision.
>

Ah, ok.

>> I'm wondering if we can have a more accurate accounting:
>>
>> - For the moment we add all system time since the previous entering to the
>> VCPU to the guest time (and I guess there is some real system time in
>> it ???)
>> - Perhaps we can sum nanoseconds spent in the VCPU and add it to cpustat
>> when these ns are greater than 1 ms ? (I'm trying to make something in this
>>
> way)
>
> If you look at the patch I have posted some minutes ago, I use a method
> similar to irq_enter and irq_exit to separate real system time from guest
> time.

Yes. This is orthogonal to the current accounting patch and should make
a nice extension. It's probably useful with dynamic tick where timer
interrupts can be rare.


--
error compiling committee.c: too many arguments to function

2007-08-13 16:01:44

by Laurent Vivier

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Christian Borntraeger wrote:
> Am Freitag, 10. August 2007 schrieb Laurent Vivier:
>> The aim of these two patches is to measure the CPU time used by a virtual
>> machine. All comments are welcome... I'm not sure it's the good way to do
> that.
>
> I did something similar for or s390guest prototype, that Carsten posted in
> May. I decided to account guest time to the user process instead of adding a
> new field to avoid hazzle with old top. As you can read in the patch comment,
> I personally prefer a new field if we can get one.
>
> My implementation uses a similar mechanism like hard and softirq. So I have an
> sie_enter an sie_exit and a task_is_in_sie function - like irq_enter and
> irq_exit. The main difference is based on the fact, that s390 has precise
> accouting for irq, steal, user and system time, and therefore my patch is
> based on architecture specifc code using CONFIG_VIRT_CPU_ACCOUNT.
>
> In general my patch has the same idea as your patch, so I am going to review
> your patch and see if it would fit for s390.
>
> For reference this is the (never posted) old patch for our virtualisation
> prototype. It wont work with kvm but it gives you the idea what we had in
> mind on s390.
>

thank you for your comment.

As virtualization becomes very popular, perhaps we should implement something
which could be used by all linux supported architectures ?
(yes, I know it's non-sense for archs like m68k...)
But my [PATCH 1/2] can be a good start (adding "guest" in cpustat)
As guest accounting is hw dependent, I think we should add a hook in the
accounting functions.

Laurent
--
------------- [email protected] --------------
"Software is hard" - Donald Knuth


Attachments:
signature.asc (189.00 B)
OpenPGP digital signature

2007-08-13 16:02:25

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Am Montag, 13. August 2007 schrieb Avi Kivity:

> Laurent's patch gives the best of both worlds: on old 'top', you get
> guest time accounted as user time, while on new 'top' it is accounted
> separately. This is done by reporting user time as the sum of the real
> user time and guest time. A newer 'top' can subtract guest time from
> user time to get the correct statistic.

Yes that looks promising. If I recall correctly we had some strange top
behaviours when we introduced the steal time. Old top added the steal time to
idle. We should check that.

>
>
> > My implementation uses a similar mechanism like hard and softirq. So I
have an
> > sie_enter an sie_exit and a task_is_in_sie function - like irq_enter and
> > irq_exit. The main difference is based on the fact, that s390 has precise
> > accouting for irq, steal, user and system time, and therefore my patch is
> > based on architecture specifc code using CONFIG_VIRT_CPU_ACCOUNT.
> >
>
> Okay, so the code should be under that config option, and kvm should
> select it.

No hurry..that was specific to our implementation, not KVM :-)
Besided that, Ingo changed the accouting with his CFS scheduler, and I still
have to figure out how CONFIG_VIRT_CPU_ACCOUNTING can be properly integrated
in CFS.

Christian

2007-08-13 16:04:11

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Am Montag, 13. August 2007 schrieb Avi Kivity:
> Christian Borntraeger wrote:
> > Am Montag, 13. August 2007 schrieb Laurent Vivier:
> >
> >>> [copying Ingo and Rusty]
> >>>
> >
> > @Avi, seems that sourceforge is mangling the cc list?
> >
> >
>
> It's not configured to do so. Can you be more specific?

I added Carsten and he was removed from the CC list in my copy I got from SF.
And you cced Ingo and Rusty but I dont have them on cc, either.

Christian

2007-08-13 16:05:06

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Laurent Vivier wrote:
> Christian Borntraeger wrote:
>
>> Am Freitag, 10. August 2007 schrieb Laurent Vivier:
>>
>>> The aim of these two patches is to measure the CPU time used by a virtual
>>> machine. All comments are welcome... I'm not sure it's the good way to do
>>>
>> that.
>>
>> I did something similar for or s390guest prototype, that Carsten posted in
>> May. I decided to account guest time to the user process instead of adding a
>> new field to avoid hazzle with old top. As you can read in the patch comment,
>> I personally prefer a new field if we can get one.
>>
>> My implementation uses a similar mechanism like hard and softirq. So I have an
>> sie_enter an sie_exit and a task_is_in_sie function - like irq_enter and
>> irq_exit. The main difference is based on the fact, that s390 has precise
>> accouting for irq, steal, user and system time, and therefore my patch is
>> based on architecture specifc code using CONFIG_VIRT_CPU_ACCOUNT.
>>
>> In general my patch has the same idea as your patch, so I am going to review
>> your patch and see if it would fit for s390.
>>
>> For reference this is the (never posted) old patch for our virtualisation
>> prototype. It wont work with kvm but it gives you the idea what we had in
>> mind on s390.
>>
>>
>
> thank you for your comment.
>
> As virtualization becomes very popular, perhaps we should implement something
> which could be used by all linux supported architectures ?
> (yes, I know it's non-sense for archs like m68k...)
> But my [PATCH 1/2] can be a good start (adding "guest" in cpustat)
> As guest accounting is hw dependent, I think we should add a hook in the
> accounting functions.
>

Isn't PF_VM exactly such a hook? All the hypervisor needs to do is to
set/unset it correctly?

--
error compiling committee.c: too many arguments to function

2007-08-13 16:06:49

by Avi Kivity

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Christian Borntraeger wrote:
> Am Montag, 13. August 2007 schrieb Avi Kivity:
>
>> Christian Borntraeger wrote:
>>
>>> Am Montag, 13. August 2007 schrieb Laurent Vivier:
>>>
>>>
>>>>> [copying Ingo and Rusty]
>>>>>
>>>>>
>>> @Avi, seems that sourceforge is mangling the cc list?
>>>
>>>
>>>
>> It's not configured to do so. Can you be more specific?
>>
>
> I added Carsten and he was removed from the CC list in my copy I got from SF.
> And you cced Ingo and Rusty but I dont have them on cc, either.
>
>

The only thing remotely relevant in the list config is that 'Filter out
duplicate messages to list members (if possible)' is set as a default
for new members. Maybe this means that if a cc is also part of the
list, that cc is stripped (which seems a wierd implementation; I'd have
expected that cc be kept and just one copy sent out).

Anybody have a clue?


--
error compiling committee.c: too many arguments to function

2007-08-13 16:07:41

by Laurent Vivier

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Avi Kivity wrote:
> Laurent Vivier wrote:
>> Christian Borntraeger wrote:
>>
>>> Am Freitag, 10. August 2007 schrieb Laurent Vivier:
>>>
>>>> The aim of these two patches is to measure the CPU time used by a
>>>> virtual
>>>> machine. All comments are welcome... I'm not sure it's the good way
>>>> to do
>>> that.
>>>
>>> I did something similar for or s390guest prototype, that Carsten
>>> posted in May. I decided to account guest time to the user process
>>> instead of adding a new field to avoid hazzle with old top. As you
>>> can read in the patch comment, I personally prefer a new field if we
>>> can get one.
>>>
>>> My implementation uses a similar mechanism like hard and softirq. So
>>> I have an sie_enter an sie_exit and a task_is_in_sie function - like
>>> irq_enter and irq_exit. The main difference is based on the fact,
>>> that s390 has precise accouting for irq, steal, user and system time,
>>> and therefore my patch is based on architecture specifc code using
>>> CONFIG_VIRT_CPU_ACCOUNT.
>>> In general my patch has the same idea as your patch, so I am going to
>>> review your patch and see if it would fit for s390.
>>>
>>> For reference this is the (never posted) old patch for our
>>> virtualisation prototype. It wont work with kvm but it gives you the
>>> idea what we had in mind on s390.
>>>
>>>
>>
>> thank you for your comment.
>>
>> As virtualization becomes very popular, perhaps we should implement
>> something
>> which could be used by all linux supported architectures ?
>> (yes, I know it's non-sense for archs like m68k...)
>> But my [PATCH 1/2] can be a good start (adding "guest" in cpustat)
>> As guest accounting is hw dependent, I think we should add a hook in the
>> accounting functions.
>>
>
> Isn't PF_VM exactly such a hook? All the hypervisor needs to do is to
> set/unset it correctly?

In fact, no.

PF_VM is used to know we have entered a virtual CPU (the hypervisor set it, the
scheduler unset it on accounting)

I mean a hook in account_system_time() to call a function arch-dependent to
compute the guest time (and modify the system/user time accordingly) if needed.

If fact what I find annoying in my patch is it adds to guest time and user time
some system time. Perhaps you could have a look to the second patch I sent.

Regards,
Laurent
--
------------- [email protected] --------------
"Software is hard" - Donald Knuth


Attachments:
signature.asc (189.00 B)
OpenPGP digital signature

2007-08-13 16:28:06

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Am Montag, 13. August 2007 schrieb Laurent Vivier:
> >> As guest accounting is hw dependent, I think we should add a hook in the
> >> accounting functions.
> >>
> >
> > Isn't PF_VM exactly such a hook? All the hypervisor needs to do is to
> > set/unset it correctly?
>
> In fact, no.
>
> PF_VM is used to know we have entered a virtual CPU (the hypervisor set it,
> the scheduler unset it on accounting)

Why not do something like the following. (This patch does not work as it
relies on the no-existing var cputime_since_last_update, but it shows the
idea)

---
drivers/kvm/kvm.h | 13 +++++++++++++
drivers/kvm/svm.c | 3 ++-
drivers/kvm/vmx.c | 3 ++-
kernel/sched.c | 1 -
4 files changed, 17 insertions(+), 3 deletions(-)

Index: kvm/drivers/kvm/kvm.h
===================================================================
--- kvm.orig/drivers/kvm/kvm.h
+++ kvm/drivers/kvm/kvm.h
@@ -8,6 +8,7 @@

#include <linux/types.h>
#include <linux/list.h>
+#include <linux/kernel_stat.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/signal.h>
@@ -726,6 +727,18 @@ static inline u32 get_rdx_init_val(void)
return 0x600; /* P6 family */
}

+static inline guest_enter(void)
+{
+ account_system_time(current, 0, cputime_since_last_update);
+ current->flags |= PF_VCPU;
+}
+
+static inline guest_exit(void)
+{
+ current->flags &= ~PF_VCPU;
+ account_system_time(current, 0, cputime_since_last_update);
+}
+
#define ASM_VMX_VMCLEAR_RAX ".byte 0x66, 0x0f, 0xc7, 0x30"
#define ASM_VMX_VMLAUNCH ".byte 0x0f, 0x01, 0xc2"
#define ASM_VMX_VMRESUME ".byte 0x0f, 0x01, 0xc3"
Index: kvm/kernel/sched.c
===================================================================
--- kvm.orig/kernel/sched.c
+++ kvm/kernel/sched.c
@@ -3251,7 +3251,6 @@ void account_system_time(struct task_str
p->utime = cputime_add(p->utime, cputime);
cpustat->user = cputime64_add(cpustat->user, tmp);
cpustat->guest = cputime64_add(cpustat->guest, tmp);
- p->flags &= ~PF_VCPU;
return;
}

Index: kvm/drivers/kvm/svm.c
===================================================================
--- kvm.orig/drivers/kvm/svm.c
+++ kvm/drivers/kvm/svm.c
@@ -1404,7 +1404,7 @@ again:
clgi();

vcpu->guest_mode = 1;
- current->flags |= PF_VCPU;
+ guest_enter();
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
svm_flush_tlb(vcpu);
@@ -1537,6 +1537,7 @@ again:
#endif
: "cc", "memory" );

+ guest_exit();
vcpu->guest_mode = 0;

if (vcpu->fpu_active) {
Index: kvm/drivers/kvm/vmx.c
===================================================================
--- kvm.orig/drivers/kvm/vmx.c
+++ kvm/drivers/kvm/vmx.c
@@ -2078,7 +2078,7 @@ again:
local_irq_disable();

vcpu->guest_mode = 1;
- current->flags |= PF_VCPU;
+ guest_enter();
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
vmx_flush_tlb(vcpu);
@@ -2199,6 +2199,7 @@ again:
[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
: "cc", "memory" );

+ guest_exit();
vcpu->guest_mode = 0;
local_irq_enable();

2007-08-13 16:32:11

by Laurent Vivier

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

Christian Borntraeger wrote:
> Am Montag, 13. August 2007 schrieb Laurent Vivier:
>>>> As guest accounting is hw dependent, I think we should add a hook in the
>>>> accounting functions.
>>>>
>>> Isn't PF_VM exactly such a hook? All the hypervisor needs to do is to
>>> set/unset it correctly?
>> In fact, no.
>>
>> PF_VM is used to know we have entered a virtual CPU (the hypervisor set it,
>> the scheduler unset it on accounting)
>
> Why not do something like the following. (This patch does not work as it
> relies on the no-existing var cputime_since_last_update, but it shows the
> idea)

Yes, I think it is a really good idea, much more cleaner.

But doing like that you can have cpustat->system decreasing and thus negative
values in "top".

It is why I modify account_system_time() (see my last patch) to decrease the
value to add to system time accordingly the value we add in cpustat->guest, and
thus system time never decreases. We cannot do that when we call
account_system_time() from KVM part.

Laurent
--
------------- [email protected] --------------
"Software is hard" - Donald Knuth


Attachments:
signature.asc (189.00 B)
OpenPGP digital signature

2007-08-13 20:40:44

by Heiko Carstens

[permalink] [raw]
Subject: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting

> The only thing remotely relevant in the list config is that 'Filter out
> duplicate messages to list members (if possible)' is set as a default for
> new members. Maybe this means that if a cc is also part of the list, that
> cc is stripped (which seems a wierd implementation; I'd have expected that
> cc be kept and just one copy sent out).
>
> Anybody have a clue?

I got also removed when I was cc'ed. IIRC that started when I subscribed to
the list. So it looks like people who are subscribed to the list get
removed from the cc list.
That's very annoying btw. ;)

2007-08-19 09:32:46

by Avi Kivity

[permalink] [raw]
Subject: List stripping out cc's (was: Re: [kvm-devel] [PATCH 0/2][KVM] guest time accounting)

Heiko Carstens wrote:
>> The only thing remotely relevant in the list config is that 'Filter out
>> duplicate messages to list members (if possible)' is set as a default for
>> new members. Maybe this means that if a cc is also part of the list, that
>> cc is stripped (which seems a wierd implementation; I'd have expected that
>> cc be kept and just one copy sent out).
>>
>> Anybody have a clue?
>>
>
> I got also removed when I was cc'ed. IIRC that started when I subscribed to
> the list. So it looks like people who are subscribed to the list get
> removed from the cc list.
> That's very annoying btw. ;)
>

I think I see what is happening. The list gets a message that you are
copied on. It sees you have opted not to receive duplicates, it strips
you from the cc list, since it knows that you will receive another copy
by direct routing (outside the list).

What it should do, is keep you on the cc list but remove you from the
smtp recipient list... that would both remove duplicates and keep you
copied on messages.

So I recommend disabling 'Filter duplicate messages' in your list
preferences and living with the duplicates. I'll change the defaults
for new subscriptions.

--
error compiling committee.c: too many arguments to function

2007-08-19 19:54:13

by Jeff Garzik

[permalink] [raw]
Subject: Re: List stripping out cc's

Avi Kivity wrote:
> Heiko Carstens wrote:
>>> The only thing remotely relevant in the list config is that 'Filter
>>> out duplicate messages to list members (if possible)' is set as a
>>> default for new members. Maybe this means that if a cc is also part
>>> of the list, that cc is stripped (which seems a wierd
>>> implementation; I'd have expected that cc be kept and just one copy
>>> sent out).
>>>
>>> Anybody have a clue?
>>>
>>
>> I got also removed when I was cc'ed. IIRC that started when I
>> subscribed to
>> the list. So it looks like people who are subscribed to the list get
>> removed from the cc list.
>> That's very annoying btw. ;)
>>
>
> I think I see what is happening. The list gets a message that you are
> copied on. It sees you have opted not to receive duplicates, it strips
> you from the cc list, since it knows that you will receive another copy
> by direct routing (outside the list).

Well, there is also /quite/ the problem with the Mail-Followup-To
header, which causes most mailers to ignore the preferences of the
community in favor of the preferences of the message author -- which
totally screws up traditional To/CC handling on LKML.

Jeff



2007-08-19 20:11:23

by Avi Kivity

[permalink] [raw]
Subject: Re: List stripping out cc's

Jeff Garzik wrote:
> Avi Kivity wrote:
>> Heiko Carstens wrote:
>>>> The only thing remotely relevant in the list config is that
>>>> 'Filter out duplicate messages to list members (if possible)' is
>>>> set as a default for new members. Maybe this means that if a cc
>>>> is also part of the list, that cc is stripped (which seems a wierd
>>>> implementation; I'd have expected that cc be kept and just one
>>>> copy sent out).
>>>>
>>>> Anybody have a clue?
>>>>
>>>
>>> I got also removed when I was cc'ed. IIRC that started when I
>>> subscribed to
>>> the list. So it looks like people who are subscribed to the list get
>>> removed from the cc list.
>>> That's very annoying btw. ;)
>>>
>>
>> I think I see what is happening. The list gets a message that you
>> are copied on. It sees you have opted not to receive duplicates, it
>> strips you from the cc list, since it knows that you will receive
>> another copy by direct routing (outside the list).
>
> Well, there is also /quite/ the problem with the Mail-Followup-To
> header, which causes most mailers to ignore the preferences of the
> community in favor of the preferences of the message author -- which
> totally screws up traditional To/CC handling on LKML.
>

Sorry, this was about kvm-devel, not lkml -- I should have taken
linux-kernel off the cc list here to avoid increasing confusion.
Somehow that felt wrong in an email about the evils of stripping out the
cc list.

I don't think there's an issue with M-F-T: here.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.