2022-04-12 23:18:26

by Wei Zhang

[permalink] [raw]
Subject: [PATCH 0/2] KVM: x86: Fix incorrect VM-exit profiling

The profile=kvm boot option has been useful because it provides a
convenient approach to profile VM exits. However, it's problematic because
the profiling buffer is indexed by (pc - _stext), and a guest's pc minus a
host's _stext doesn't make sense in most cases.
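For context, the host picks a profile-buffer slot roughly as follows (a simplified sketch modeled on kernel/profile.c; the names and bounds handling here are illustrative, not verbatim kernel code):

```c
#include <stddef.h>

/* Illustrative globals standing in for the host kernel's state. */
static unsigned long stext_addr;   /* host _stext */
static size_t prof_len;            /* number of buffer slots */
static unsigned int prof_shift;    /* log2 of bytes covered per slot */

/* Simplified slot selection: offset from _stext, scaled by prof_shift.
 * An out-of-range pc -- such as a guest pc measured against the host's
 * _stext -- collapses into the last slot. */
static size_t profile_slot(unsigned long pc)
{
	size_t idx = (size_t)(pc - stext_addr) >> prof_shift;

	return idx < prof_len ? idx : prof_len - 1;
}
```

With a guest pc, (pc - host _stext) is essentially a meaningless offset, so hits either land in unrelated slots or pile up in that final overflow slot.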

When running another Linux kernel in the guest, we could work around the
problem by disabling KASLR in both the host and the guest so they have the
same _stext. However, this is inconvenient and not always possible.

We're looking for a solution to this problem. A straightforward idea is to
pass the guest's _stext to the host so the profiling buffer can be indexed
correctly. This approach is rather brute-force, as you can see in the prototype
patches.

We had some initial discussions and here is a short summary:
1. The VM-exit profiling is already hacky. It's collecting stats about all
KVM guests bunched together into a single global buffer without any
separation.
2. Even if we pass _stext from the guest, there are still a lot of
limitations: There can be only one running guest, and the size of its
text region shouldn't exceed the size of the profiling buffer,
which is (_etext - _stext) in the host.
3. There are other methods for profiling VM exits, but it would be really
convenient if readprofile just worked out of the box for KVM profiling.
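Limitation 2 reduces to a simple capacity check the host would need before honoring a guest-supplied _stext; a hypothetical helper (not part of the patches, all identifiers illustrative) makes the constraint concrete:

```c
#include <stdbool.h>

/* Hypothetical check: the guest's text region must fit within the
 * host's profile buffer, which only covers (_etext - _stext) bytes
 * of host text. */
static bool guest_text_fits(unsigned long guest_stext,
			    unsigned long guest_etext,
			    unsigned long host_stext,
			    unsigned long host_etext)
{
	return (guest_etext - guest_stext) <= (host_etext - host_stext);
}
```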

It would be awesome to hear more thoughts on this. Should we try to fix the
existing VM-exit profiling functionality? Or should we avoid adding more
hacks there? If it should be fixed, what's the preferred way? Thanks in
advance for any suggestions.

Wei Zhang (2):
KVM: x86: allow guest to send its _stext for kvm profiling
KVM: x86: illustrative example for sending guest _stext with a
hypercall

arch/x86/kernel/setup.c | 6 ++++++
arch/x86/kvm/x86.c | 15 +++++++++++++++
include/linux/kvm_host.h | 4 ++++
include/uapi/linux/kvm_para.h | 1 +
virt/kvm/Kconfig | 5 +++++
5 files changed, 31 insertions(+)

base-commit: 42dcbe7d8bac997eef4c379e61d9121a15ed4e36
--
2.35.1.1178.g4f1659d476-goog


2022-05-10 03:48:03

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 0/2] KVM: x86: Fix incorrect VM-exit profiling

On Tue, Apr 12, 2022, Wei Zhang wrote:
> The profile=kvm boot option has been useful because it provides a
> convenient approach to profile VM exits.

What exactly are you profiling? Where the guest was executing at any given exit? Mostly
out of curiosity, but also in the hope that we might be able to replace profiling with
a dedicated KVM stat(s).

2022-05-12 12:02:23

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 0/2] KVM: x86: Fix incorrect VM-exit profiling

+Jing and David

On Wed, May 11, 2022, Wei Zhang wrote:

Please don't top-post. From https://people.kernel.org/tglx/notes-about-netiquette:

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

A: Top-posting.
Q: What is the most annoying thing in e-mail?

A: No.
Q: Should I include quotations after my reply?


> On Tue, May 10, 2022 at 1:57 AM Sean Christopherson <[email protected]> wrote:
> >
> > On Tue, Apr 12, 2022, Wei Zhang wrote:
> > > The profile=kvm boot option has been useful because it provides a
> > > convenient approach to profile VM exits.
> >
> > What exactly are you profiling? Where the guest was executing at any given exit? Mostly
> > out of curiosity, but also in the hope that we might be able to replace profiling with
> > a dedicated KVM stat(s).
>
> Yes, the profiling is about finding out which instructions in the
> guest trigger VM exits and the corresponding frequencies.

Do you actually want to profile which instructions _trigger_ exits? Because that's
not what this does. This profiles every exit, regardless of whether or not the exit
was due to a guest action. E.g. host IRQs/NMIs, page faults, etc... will all get
included and pollute the profile. Over time, the signal-to-noise ratio will likely
improve, but there's definitely still going to be noise.

We actually tried to upstream histograms for exit reasons[*] (link is for arm64,
but we want it for x86 too, just can't find a link), but it was deemed too expensive
in terms of memory cost for general use.

An idea that's on our (GCP folks) todo list is to explore adding an eBPF hook into
the exit path that would allow userspace to inspect e.g. struct kvm_run on VM-Exit.
That would allow userspace to collect all kinds of info about VM-Exits without
committing to ABI beyond kvm_run, and without bloating the size of a vCPU for
environments that don't want detailed histograms/profiling.
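Such a hook doesn't exist yet, but to make the idea concrete, the userspace side could be little more than a histogram keyed by exit reason; the sketch below is a toy stand-in (the bound and the data source are made up):

```c
#include <stddef.h>

#define MAX_EXIT_REASON 64	/* illustrative bound, not from any ABI */

/* Toy per-reason counters, standing in for what a consumer of a
 * future eBPF exit hook might maintain. */
static unsigned long exit_hist[MAX_EXIT_REASON];

static void record_exit(unsigned int reason)
{
	if (reason < MAX_EXIT_REASON)
		exit_hist[reason]++;
}

/* Return the exit reason with the most hits. */
static unsigned int hottest_reason(void)
{
	unsigned int best = 0;

	for (unsigned int i = 1; i < MAX_EXIT_REASON; i++)
		if (exit_hist[i] > exit_hist[best])
			best = i;
	return best;
}
```

For exit reasons alone, something similar can already be approximated with the existing kvm:kvm_exit tracepoint (e.g. via perf or bpftrace); the attraction of the proposed hook is access to kvm_run without committing to new ABI.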

My preference would be to find a more complete, KVM-specific solution. The
profiling stuff seems like it's a dead end, i.e. will always be flawed in some
way. If this cleanup didn't require a new hypercall then I wouldn't care, but
I don't love having to extend KVM's guest/host ABI for something that ideally
will become obsolete sooner rather than later.

[*] https://lore.kernel.org/all/[email protected]

2022-05-12 16:32:03

by Wei Zhang

[permalink] [raw]
Subject: Re: [PATCH 0/2] KVM: x86: Fix incorrect VM-exit profiling

Yes, the profiling is about finding out which instructions in the
guest trigger VM exits and the corresponding frequencies.

Basically this will give a histogram array in /proc/profile. So if
'array[A] == T', we know that the instruction at (_stext + A) triggered
VM exits T times. The readprofile command can read this information and
show a summary.
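Mapping the hottest slot back to an address is just the inverse of the indexing. A hypothetical helper over the raw counter array (leaving the exact /proc/profile layout to readprofile(8), and assuming a 2^prof_shift-byte slot granularity) might look like:

```c
#include <stddef.h>

/* Given histogram counters, return the address whose slot has the
 * most hits: _stext + (slot << prof_shift). Illustrative helper,
 * not part of the patches or of readprofile itself. */
static unsigned long hottest_addr(const unsigned int *counts, size_t n,
				  unsigned long stext,
				  unsigned int prof_shift)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (counts[i] > counts[best])
			best = i;
	return stext + ((unsigned long)best << prof_shift);
}
```

In practice `readprofile -m System.map -p /proc/profile` performs this resolution to symbol names, which is the summary step mentioned above (the System.map would have to be the guest's for the addresses to resolve).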




On Tue, May 10, 2022 at 1:57 AM Sean Christopherson <[email protected]> wrote:
>
> On Tue, Apr 12, 2022, Wei Zhang wrote:
> > The profile=kvm boot option has been useful because it provides a
> > convenient approach to profile VM exits.
>
> What exactly are you profiling? Where the guest was executing at any given exit? Mostly
> out of curiosity, but also in the hope that we might be able to replace profiling with
> a dedicated KVM stat(s).

2022-05-17 01:55:33

by Wei Zhang

[permalink] [raw]
Subject: Re: [PATCH 0/2] KVM: x86: Fix incorrect VM-exit profiling

> Please don't top-post. From https://people.kernel.org/tglx/notes-about-netiquette:

Ah, I didn't know this should be avoided. Thanks for the info!

> My preference would be to find a more complete, KVM-specific solution. The
> profiling stuff seems like it's a dead end, i.e. will always be flawed in some
> way. If this cleanup didn't require a new hypercall then I wouldn't care, but
> I don't love having to extend KVM's guest/host ABI for something that ideally
> will become obsolete sooner than later.

I also feel that adding a new hypercall is too much here. A
KVM-specific solution is definitely better, and the eBPF based
approach you mentioned sounds like the ultimate solution (at least for
inspecting exit reasons).

+Suleiman What do you think? The ongoing work Sean described sounds
promising; perhaps we should put this patch aside for the time being.

2022-05-18 05:12:49

by Suleiman Souhlal

[permalink] [raw]
Subject: Re: [PATCH 0/2] KVM: x86: Fix incorrect VM-exit profiling

On Tue, May 17, 2022 at 4:30 AM Wei Zhang <[email protected]> wrote:
>
> > Please don't top-post. From https://people.kernel.org/tglx/notes-about-netiquette:
>
> Ah, I didn't know this should be avoided. Thanks for the info!
>
> > My preference would be to find a more complete, KVM-specific solution. The
> > profiling stuff seems like it's a dead end, i.e. will always be flawed in some
> > way. If this cleanup didn't require a new hypercall then I wouldn't care, but
> > I don't love having to extend KVM's guest/host ABI for something that ideally
> > will become obsolete sooner than later.
>
> I also feel that adding a new hypercall is too much here. A
> KVM-specific solution is definitely better, and the eBPF based
> approach you mentioned sounds like the ultimate solution (at least for
> inspecting exit reasons).
>
> +Suleiman What do you think? The on-going work Sean described sounds
> promising, perhaps we should put this patch aside for the time being.

I'm ok with that.
That said, the advantage of the current solution is that it already
exists and is very easy to use, by anyone, without having to write any
code. The proposed solution doesn't sound like it will be as easy.

Regarding the earlier question about wanting to know which
instructions trigger exits, most times I've needed to get exit
profiles, I actually wanted to know where the guest was at the time of
the exit, regardless of who triggered the exit.

-- Suleiman

2022-05-18 15:36:41

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 0/2] KVM: x86: Fix incorrect VM-exit profiling

On Wed, May 18, 2022, Suleiman Souhlal wrote:
> On Tue, May 17, 2022 at 4:30 AM Wei Zhang <[email protected]> wrote:
> >
> > > Please don't top-post. From https://people.kernel.org/tglx/notes-about-netiquette:
> >
> > Ah, I didn't know this should be avoided. Thanks for the info!
> >
> > > My preference would be to find a more complete, KVM-specific solution. The
> > > profiling stuff seems like it's a dead end, i.e. will always be flawed in some
> > > way. If this cleanup didn't require a new hypercall then I wouldn't care, but
> > > I don't love having to extend KVM's guest/host ABI for something that ideally
> > > will become obsolete sooner than later.
> >
> > I also feel that adding a new hypercall is too much here. A
> > KVM-specific solution is definitely better, and the eBPF based
> > approach you mentioned sounds like the ultimate solution (at least for
> > inspecting exit reasons).
> >
> > +Suleiman What do you think? The on-going work Sean described sounds
> > promising, perhaps we should put this patch aside for the time being.
>
> I'm ok with that.
> That said, the advantage of the current solution is that it already
> exists and is very easy to use, by anyone, without having to write any
> code. The proposed solution doesn't sound like it will be as easy.

My goal/hope is to make the eBPF approach just as easy by providing/building a
library of KVM eBPF programs in tools/ so that doing common things like profiling
VM-Exits doesn't require reinventing the wheel. And those programs could be used
(and thus implicitly tested) by KVM selftests to verify the kernel functionality.