On 05/30/2014 04:12 AM, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> PEBS (Precise Event Based Sampling) profiling is very powerful,
> allowing improved sampling precision and much additional information,
> like address or TSX abort profiling. cycles:p and :pp use PEBS.
>
> This patch enables PEBS profiling in KVM guests.
>
> PEBS writes profiling records to a virtual address in memory. Since
> the guest controls the virtual address space, the PEBS record
> is delivered directly to the guest buffer. We set up the PEBS state
> so that it works correctly. The CPU cannot handle any kind of fault during
> these guest writes.
>
> To avoid any problems with guest pages being swapped out by the host, we
> pin the pages when the PEBS buffer is set up, by intercepting
> the DS area MSR (IA32_DS_AREA).
>
> Typically profilers only set up a single page, so pinning that is not
> a big problem. The pinning is currently limited to 17 pages (64K+1).
>
> In theory the guest can change its own page tables after the PEBS
> setup. The host has no way to track that with EPT. But if a guest
> did that, it could only crash itself. Normal profilers are not
> expected to do that.
>
>
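The setup described in the patch can be made concrete with a little page arithmetic. The sketch below is hypothetical (`struct ds_area`, `pages_spanned`, `pebs_pages_to_pin` and `pebs_within_pin_limit` are illustrative names, not the patch's code) and assumes 4 KiB guest pages and the 64-bit DS save area layout from the Intel SDM, whose fields Linux mirrors in `struct debug_store`. It counts the guest pages the CPU may write during PEBS: the DS header itself (the CPU advances `pebs_index` there) and the buffer up to `pebs_absolute_maximum`. A 64 KiB buffer is 16 pages, so one reading of the "17 pages (64K+1)" limit is 16 buffer pages plus one extra page, either for the DS header or for a buffer that is not page-aligned; the patch description does not say which.

```c
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096ull  /* assumption: 4 KiB guest pages */
#define PEBS_PIN_LIMIT  17ull    /* limit quoted in the patch description */

/* 64-bit DS save area header (Intel SDM layout; Linux: struct debug_store).
 * Every pointer here is a guest *linear* address, which is why the CPU
 * walks the guest page tables on each PEBS write. */
struct ds_area {
    uint64_t bts_buffer_base;
    uint64_t bts_index;
    uint64_t bts_absolute_maximum;
    uint64_t bts_interrupt_threshold;
    uint64_t pebs_buffer_base;
    uint64_t pebs_index;            /* next record goes here; CPU advances it */
    uint64_t pebs_absolute_maximum; /* first byte past the end of the buffer */
    uint64_t pebs_interrupt_threshold;
    uint64_t pebs_event_reset[4];   /* slot count varies by CPU; 4 assumed */
};

/* Number of guest pages touched by the byte range [addr, addr + len). */
static uint64_t pages_spanned(uint64_t addr, uint64_t len)
{
    if (len == 0)
        return 0;
    return (addr + len - 1) / GUEST_PAGE_SIZE - addr / GUEST_PAGE_SIZE + 1;
}

/* Upper bound on pages the CPU may write during PEBS: the DS header plus
 * the PEBS buffer. A page shared by both is counted twice. */
static uint64_t pebs_pages_to_pin(uint64_t ds_va, const struct ds_area *ds)
{
    uint64_t buf_len = ds->pebs_absolute_maximum - ds->pebs_buffer_base;
    return pages_spanned(ds_va, sizeof *ds) +
           pages_spanned(ds->pebs_buffer_base, buf_len);
}

/* Hypothetical policy check applied when the guest writes the DS area MSR. */
static int pebs_within_pin_limit(uint64_t ds_va, const struct ds_area *ds)
{
    return pebs_pages_to_pin(ds_va, ds) <= PEBS_PIN_LIMIT;
}
```

With a page-aligned 64 KiB buffer and a DS header that fits in one page, the count is exactly 17; if the 96-byte header straddles a page boundary, it becomes 18 and the check fails.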
Talking a bit with Gleb about this, I think this is impossible.
First, it's not sufficient to pin the debug store area, you also have to
pin the guest page tables that are used to map the debug store. But
even if you do that, as soon as the guest fork()s, it will create a new
pgd which the host will be free to swap out. The processor can then
attempt a PEBS store to an unmapped address which will fail, even though
the guest is configured correctly.
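The point about page tables follows from how the CPU translates the PEBS buffer address. With 4-level x86-64 paging and 4 KiB pages, every PEBS write walks one PML4, one PDPT, one PD and one PT page of the guest before reaching the buffer, and each of those guest pages must also stay resident on the host. The hypothetical helpers below (illustrative names, not KVM code; 2 MiB/1 GiB mappings and 5-level paging are ignored) compute the index used at each level and how many last-level table pages a buffer range needs. Because the walk starts from the current CR3, a process created by fork() starts its walk from a different top-level page, which is the case described above.

```c
#include <stdint.h>

/* x86-64 4-level paging with 4 KiB pages: 9 index bits per level. */
enum { L_PML4, L_PDPT, L_PD, L_PT, NLEVELS };

/* Index used at each paging level when translating linear address va.
 * The table page selected at each level must be present for a PEBS
 * write to va to complete without a fault. */
static void pt_indices(uint64_t va, unsigned idx[NLEVELS])
{
    idx[L_PML4] = (unsigned)((va >> 39) & 0x1ff);
    idx[L_PDPT] = (unsigned)((va >> 30) & 0x1ff);
    idx[L_PD]   = (unsigned)((va >> 21) & 0x1ff);
    idx[L_PT]   = (unsigned)((va >> 12) & 0x1ff);
}

/* Distinct last-level (PT) pages needed to map [va, va + len):
 * one PT page maps 512 * 4 KiB = 2 MiB of linear address space. */
static uint64_t pt_pages_for_range(uint64_t va, uint64_t len)
{
    if (len == 0)
        return 0;
    return ((va + len - 1) >> 21) - (va >> 21) + 1;
}
```

A 64 KiB buffer that stays inside one 2 MiB region needs one table page per level, four in total; a buffer crossing a 2 MiB boundary needs a second PT page.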
> First, it's not sufficient to pin the debug store area, you also
> have to pin the guest page tables that are used to map the debug
> store. But even if you do that, as soon as the guest fork()s, it
> will create a new pgd which the host will be free to swap out. The
> processor can then attempt a PEBS store to an unmapped address which
> will fail, even though the guest is configured correctly.
That's a good point. You're right of course.
The only way around it I can think of would be to intercept CR3 writes
while PEBS is active and always pin all the table pages leading
to the PEBS buffer. That's slow, but it should only be needed
while PEBS is running.
-Andi
--
[email protected] -- Speaking for myself only.
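The CR3-interception workaround can be sketched as follows. Everything here is hypothetical: `struct vcpu_pebs`, `pin_walk()` and `unpin_walk()` are illustrative stand-ins whose stubs only adjust a counter (assuming one 4-level walk plus a one-page buffer, i.e. five pages), whereas a real version would translate each level through guest memory and take page references. The handler pins the walk for the new CR3 before unpinning the old one, so table pages shared by both address spaces never become unpinned in between.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU PEBS state; not KVM's data structures. */
struct vcpu_pebs {
    bool     pebs_active;  /* guest currently has PEBS enabled */
    uint64_t ds_va;        /* guest linear address of the DS area */
    uint64_t pinned_cr3;   /* CR3 whose walk is pinned, 0 if none */
    unsigned pinned;       /* stand-in for real pinned-page bookkeeping */
};

/* Stub: would walk guest tables from cr3 down to va and pin each table
 * page plus the target page (4 tables + 1 page assumed). */
static void pin_walk(struct vcpu_pebs *v, uint64_t cr3, uint64_t va)
{
    (void)cr3; (void)va;
    v->pinned += 5;
}

/* Stub: drops the references taken by pin_walk(). */
static void unpin_walk(struct vcpu_pebs *v, uint64_t cr3, uint64_t va)
{
    (void)cr3; (void)va;
    v->pinned -= 5;
}

/* Intercepted guest MOV to CR3 while PEBS may be running. */
static void on_cr3_write(struct vcpu_pebs *v, uint64_t new_cr3)
{
    if (!v->pebs_active || new_cr3 == v->pinned_cr3)
        return;
    pin_walk(v, new_cr3, v->ds_va);          /* pin the new path first */
    if (v->pinned_cr3)
        unpin_walk(v, v->pinned_cr3, v->ds_va);
    v->pinned_cr3 = new_cr3;
}
```

The cost Andi mentions shows up as one exit plus one walk per context switch; the early return stands in for turning CR3 interception off when PEBS is not in use.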
On Sun, Jun 22, 2014 at 09:02:25PM +0200, Andi Kleen wrote:
> > First, it's not sufficient to pin the debug store area, you also
> > have to pin the guest page tables that are used to map the debug
> > store. But even if you do that, as soon as the guest fork()s, it
> > will create a new pgd which the host will be free to swap out. The
> > processor can then attempt a PEBS store to an unmapped address which
> > will fail, even though the guest is configured correctly.
>
> That's a good point. You're right of course.
>
> The only way around it I can think of would be to intercept CR3 writes
> while PEBS is active and always pin all the table pages leading
> to the PEBS buffer. That's slow, but it should only be needed
> while PEBS is running.
>
> -Andi
I suppose that can be done separately from the pinned-SPTE patchset.
It also requires accounting against the mlock limits, as noted.

One set of page tables per pinned virtual address, leading down to the
final translation, is sufficient per vCPU.
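The accounting point can be illustrated with a hypothetical budget check. It assumes one set of table pages per vCPU (four, for 4-level paging with the buffer inside a single 2 MiB region) plus the pinned buffer pages, and compares the total against a memlock limit in bytes. In user space that limit is what `getrlimit(RLIMIT_MEMLOCK, ...)` reports; inside the kernel the check would go through the task's own rlimit and locked-page counters, which this sketch does not model. The function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

#define PAGE_BYTES 4096ull  /* assumption: 4 KiB pages */

/* Worst-case PEBS pins for a VM: per vCPU, one page-table page per
 * paging level plus the buffer pages (hypothetical model). */
static uint64_t pebs_pinned_pages(unsigned nr_vcpus, unsigned levels,
                                  unsigned buf_pages)
{
    return (uint64_t)nr_vcpus * (levels + buf_pages);
}

/* Would pinning `extra` more pages on top of `already` pages stay within
 * a memlock limit given in bytes? */
static bool within_memlock(uint64_t already, uint64_t extra, rlim_t limit)
{
    if (limit == RLIM_INFINITY)
        return true;
    return (already + extra) * PAGE_BYTES <= (uint64_t)limit;
}

/* Soft RLIMIT_MEMLOCK of the calling process, or 0 if it can't be read. */
static rlim_t memlock_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return 0;
    return rl.rlim_cur;
}
```

For example, 4 vCPUs with 4 table pages and a 17-page buffer each pin 4 × (4 + 17) = 84 pages, i.e. 336 KiB.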
On 06/24/2014 07:45 PM, Marcelo Tosatti wrote:
> On Sun, Jun 22, 2014 at 09:02:25PM +0200, Andi Kleen wrote:
>>> First, it's not sufficient to pin the debug store area, you also
>>> have to pin the guest page tables that are used to map the debug
>>> store. But even if you do that, as soon as the guest fork()s, it
>>> will create a new pgd which the host will be free to swap out. The
>>> processor can then attempt a PEBS store to an unmapped address which
>>> will fail, even though the guest is configured correctly.
>> That's a good point. You're right of course.
>>
>> The only way around it I can think of would be to intercept CR3 writes
>> while PEBS is active and always pin all the table pages leading
>> to the PEBS buffer. That's slow, but it should only be needed
>> while PEBS is running.
>>
>> -Andi
> I suppose that can be done separately from the pinned-SPTE patchset.
> It also requires accounting against the mlock limits, as noted.
>
> One set of page tables per pinned virtual address, leading down to the
> final translation, is sufficient per vCPU.
Or four sets, and use the CR3-target list (the VMX CR3 exit filter) to
avoid vmexits when the guest switches among the four most recently used
CR3 values.
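The CR3-target idea can be modeled as follows. VMX lets the hypervisor program up to four CR3-target values; with CR3-load exiting enabled, a guest MOV to CR3 whose value matches one of them completes without a vmexit. This is a hypothetical model (`struct cr3_cache` and its helpers are illustrative, and the pinning, unpinning and VMCS writes appear only as comments). One consequence it makes visible: because hits never exit, the host cannot observe them, so recency can only be updated on misses and the replacement policy is effectively least-recently-inserted rather than true LRU.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CR3_TARGETS 4   /* VMX supports up to 4 CR3-target values */

/* Hypothetical cache of CR3 values whose PEBS page-table walk is pinned.
 * slot[0] is the most recently installed, slot[n-1] the eviction victim. */
struct cr3_cache {
    uint64_t slot[CR3_TARGETS];
    unsigned n;
    unsigned exits;     /* count of simulated CR3-load vmexits */
};

/* True if the value is in the CR3-target list (the write would not exit). */
static bool cr3_target_hit(const struct cr3_cache *c, uint64_t cr3)
{
    for (unsigned i = 0; i < c->n; i++)
        if (c->slot[i] == cr3)
            return true;
    return false;
}

/* Models a guest MOV to CR3 while PEBS is active. */
static void guest_cr3_write(struct cr3_cache *c, uint64_t cr3)
{
    if (cr3_target_hit(c, cr3))
        return;            /* filtered by hardware: no vmexit, host unaware */
    c->exits++;            /* miss: CR3-load exit to the host */
    /* host would pin the page-table walk for cr3 here */
    if (c->n == CR3_TARGETS)
        c->n--;            /* evict slot[n-1]; host would unpin its walk */
    memmove(&c->slot[1], &c->slot[0], c->n * sizeof c->slot[0]);
    c->slot[0] = cr3;
    c->n++;
    /* host would now reload the VMCS CR3-target values from slot[] */
}
```

For instance, after the guest cycles through four address spaces A, B, C and D, returning to A costs no exit; a fifth address space E then evicts A, the oldest insertion, even though A was just used, because the host never saw that use.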