2022-04-12 23:17:48

by Wei Zhang

Subject: [PATCH 1/2] KVM: x86: allow guest to send its _stext for kvm profiling

The profiling buffer is indexed by (pc - _stext) in do_profile_hits(),
which doesn't work for KVM profiling because the pc represents an address
in the guest kernel. readprofile is broken in this case, unless the guest
kernel happens to have the same _stext as the host kernel.

This patch adds a new hypercall so that a guest can send its _stext to the
host, which then uses it to adjust the calculation for KVM profiling.

Signed-off-by: Wei Zhang <[email protected]>
---
arch/x86/kvm/x86.c | 15 +++++++++++++++
include/linux/kvm_host.h | 4 ++++
include/uapi/linux/kvm_para.h | 1 +
virt/kvm/Kconfig | 5 +++++
4 files changed, 25 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 547ba00ef64f..abeacdd5d362 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9246,6 +9246,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
vcpu->arch.complete_userspace_io = complete_hypercall_exit;
return 0;
}
+#ifdef CONFIG_ACCURATE_KVM_PROFILING
+ case KVM_HC_GUEST_STEXT:
+ vcpu->kvm->guest_stext = a0;
+ ret = 0;
+ break;
+#endif
default:
ret = -KVM_ENOSYS;
break;
@@ -10261,6 +10267,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
*/
if (unlikely(prof_on == KVM_PROFILING)) {
unsigned long rip = kvm_rip_read(vcpu);
+#ifdef CONFIG_ACCURATE_KVM_PROFILING
+ /*
+ * Profiling buffer is indexed by (rip - _stext), but it's
+ * supposed to be indexed by (rip - guest_stext) instead.
+ * Therefore apply an offset in advance to get correct results.
+ */
+ if (vcpu->kvm->guest_stext)
+ rip += (unsigned long)_stext - vcpu->kvm->guest_stext;
+#endif
profile_hit(KVM_PROFILING, (void *)rip);
}

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3f9b22c4983a..65caaa4d87c4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -781,6 +781,10 @@ struct kvm {
struct notifier_block pm_notifier;
#endif
char stats_id[KVM_STATS_NAME_SIZE];
+
+#ifdef CONFIG_ACCURATE_KVM_PROFILING
+ unsigned long guest_stext;
+#endif
};

#define kvm_err(fmt, ...) \
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 960c7e93d1a9..dcb4ba1f033c 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -30,6 +30,7 @@
#define KVM_HC_SEND_IPI 10
#define KVM_HC_SCHED_YIELD 11
#define KVM_HC_MAP_GPA_RANGE 12
+#define KVM_HC_GUEST_STEXT 13

/*
* hypercalls use architecture specific
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index a8c5c9f06b3c..8798f75ddade 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -72,3 +72,8 @@ config KVM_XFER_TO_GUEST_WORK

config HAVE_KVM_PM_NOTIFIER
bool
+
+# Offer an additional hypercall to a guest so it can pass the value of _stext
+# to the host, which uses it to adjust the calculation for KVM profiling.
+config ACCURATE_KVM_PROFILING
+ bool
--
2.35.1.1178.g4f1659d476-goog


2022-05-10 03:48:03

by Sean Christopherson

Subject: Re: [PATCH 1/2] KVM: x86: allow guest to send its _stext for kvm profiling

On Tue, Apr 12, 2022, Wei Zhang wrote:
> The profiling buffer is indexed by (pc - _stext) in do_profile_hits(),
> which doesn't work for KVM profiling because the pc represents an address
> in the guest kernel. readprofile is broken in this case, unless the guest
> kernel happens to have the same _stext as the host kernel.
>
> This patch adds a new hypercall so guests could send its _stext to the
> host, which will then be used to adjust the calculation for KVM profiling.

Disclaimer, I know nothing about using profiling.

Why not just omit the _stext adjustment and profile the raw guest RIP? It seems
like userspace needs to know about the guest layout in order to make use of profiling
info, so why not report raw info and let host userspace do all adjustments?

> Signed-off-by: Wei Zhang <[email protected]>
> ---
> arch/x86/kvm/x86.c | 15 +++++++++++++++
> include/linux/kvm_host.h | 4 ++++
> include/uapi/linux/kvm_para.h | 1 +
> virt/kvm/Kconfig | 5 +++++
> 4 files changed, 25 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 547ba00ef64f..abeacdd5d362 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9246,6 +9246,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> vcpu->arch.complete_userspace_io = complete_hypercall_exit;
> return 0;
> }
> +#ifdef CONFIG_ACCURATE_KVM_PROFILING
> + case KVM_HC_GUEST_STEXT:
> + vcpu->kvm->guest_stext = a0;

Rather than snapshotting the guest's _stext, snapshot the delta. E.g.

vcpu->kvm->arch.guest_stext_offset = (unsigned long)_stext - a0;

Then the profiling flow can just be

unsigned long rip;

rip = kvm_rip_read(vcpu) + vcpu->kvm->arch.guest_stext_offset;
profile_hit(KVM_PROFILING, (void *)rip);


> + ret = 0;
> + break;
> +#endif
> default:
> ret = -KVM_ENOSYS;
> break;
> @@ -10261,6 +10267,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> */
> if (unlikely(prof_on == KVM_PROFILING)) {
> unsigned long rip = kvm_rip_read(vcpu);
> +#ifdef CONFIG_ACCURATE_KVM_PROFILING

A Kconfig, and really any #define, is completely unnecessary. This is all x86
code, just throw the offset into struct kvm_arch.
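
To illustrate the delta snapshot suggested above, here is a minimal
userspace sketch, not kernel code. struct vm_model, set_guest_stext(),
adjust_rip() and offset_example() are hypothetical names made up for this
example; only the guest_stext_offset field name comes from the suggestion.
It assumes 64-bit addresses and relies on unsigned wrap-around, so the
delta works whether the guest's _stext is above or below the host's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VM field suggested above. */
struct vm_model {
	uint64_t guest_stext_offset;	/* host _stext - guest _stext; 0 until set */
};

/* Models the hypercall handler: compute the delta once, when the guest reports _stext. */
static void set_guest_stext(struct vm_model *vm, uint64_t host_stext,
			    uint64_t guest_stext)
{
	vm->guest_stext_offset = host_stext - guest_stext;
}

/* Models the profiling path: one unconditional add per sample. */
static uint64_t adjust_rip(const struct vm_model *vm, uint64_t guest_rip)
{
	return guest_rip + vm->guest_stext_offset;
}

/* Usage: the adjusted RIP keeps its distance from _stext. */
void offset_example(void)
{
	struct vm_model vm = { 0 };
	uint64_t host_stext = 0xffffffff81000000ULL;	/* made-up addresses */
	uint64_t guest_stext = 0xffffffffa2000000ULL;

	set_guest_stext(&vm, host_stext, guest_stext);
	assert(adjust_rip(&vm, guest_stext + 0x40) == host_stext + 0x40);
}
```

Because the offset is zero until the guest makes the hypercall, the
adjustment leaves the RIP unchanged in that case, so the sketch needs no
"if (guest_stext)" check on the profiling path.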

2022-05-11 19:48:05

by Wei Zhang

Subject: Re: [PATCH 1/2] KVM: x86: allow guest to send its _stext for kvm profiling

On Tue, May 10, 2022 at 1:55 AM Sean Christopherson <[email protected]> wrote:
>
> On Tue, Apr 12, 2022, Wei Zhang wrote:
> > The profiling buffer is indexed by (pc - _stext) in do_profile_hits(),
> > which doesn't work for KVM profiling because the pc represents an address
> > in the guest kernel. readprofile is broken in this case, unless the guest
> > kernel happens to have the same _stext as the host kernel.
> >
> > This patch adds a new hypercall so guests could send its _stext to the
> > host, which will then be used to adjust the calculation for KVM profiling.
>
> Disclaimer, I know nothing about using profiling.
>
> Why not just omit the _stext adjustment and profile the raw guest RIP? It seems
> like userspace needs to know about the guest layout in order to make use of profiling
> info, so why not report raw info and let host userspace do all adjustments?

It's hard to store raw IPs if we want to reuse the existing profiling
facility. The profiling function was originally used to record the
current IP of the host kernel at each clock tick.

The original design avoided the trouble of storing raw IPs by creating
a buffer array with a length of (_etext - _stext) and doing buffer[IP -
_stext]++ at each clock tick. In userspace, the readprofile
command can read it from /proc/profile and, with a map file, tell us
roughly how many ticks occurred in each kernel function. (IP - _stext)
has a clear meaning here since it gives us an offset with respect to
the start of the text segment. This got tricky after the profile=kvm
boot option was introduced
(https://github.com/torvalds/linux/commit/07031e14) because (IP -
_stext) is no longer meaningful when IP is a guest address.
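
The indexing problem described above can be modelled in userspace with
the sketch below. It is a simplified model, not the kernel implementation:
struct profile_model, profile_hit_model(), profile_example() and PROF_LEN
are hypothetical names, each bin covers a single byte of text (no shift is
applied), and clamping out-of-range offsets to the last bin is an
assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROF_LEN 16	/* hypothetical number of bins, one per text byte here */

struct profile_model {
	uint64_t stext;			/* host _stext in this model */
	unsigned int hits[PROF_LEN];	/* models the buffer read via /proc/profile */
};

/* Record one sample: index by (pc - stext), clamped to the last bin. */
static size_t profile_hit_model(struct profile_model *p, uint64_t pc)
{
	uint64_t idx = pc - p->stext;	/* wraps to a huge value if pc < stext */

	if (idx > PROF_LEN - 1)
		idx = PROF_LEN - 1;	/* assumption: out-of-range hits share one bin */
	p->hits[idx]++;
	return (size_t)idx;
}

/* Usage: a raw guest RIP lands in the wrong bin, a rebased one does not. */
void profile_example(void)
{
	struct profile_model p = { .stext = 0x1000 };
	uint64_t guest_stext = 0x5000;		/* made-up guest layout */
	uint64_t guest_rip = guest_stext + 3;	/* 3 bytes into guest text */

	/* Raw guest RIP: the offset is out of range, so it hits the last bin. */
	assert(profile_hit_model(&p, guest_rip) == PROF_LEN - 1);

	/* Rebased by (host stext - guest stext): lands in bin 3 as intended. */
	assert(profile_hit_model(&p, guest_rip + (p.stext - guest_stext)) == 3);
}
```

In this model a guest whose _stext differs from the host's contributes
samples that carry no per-function information, which is why readprofile
only gives useful output when the two values happen to match.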

I think raw guest IPs would be easy for userspace tools to consume, but
we would probably need a different approach if we want to store raw
guest IPs.