2020-03-05 09:59:31

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

From: Kan Liang <[email protected]>

The PEBS event created by host needs to be assigned specific counters
requested by the guest, which means the guest and host counter indexes
have to be the same or fail to create. This is needed because PEBS leaks
counter indexes into the guest. Otherwise, the guest driver will be
confused by the counter indexes in the status field of the PEBS record.

A guest_dedicated_idx field is added to indicate the counter index
specifically requested by KVM. The dedicated event constraints would
constrain the counter in the host to the same numbered counter in guest.

A intel_ctrl_guest_dedicated_mask field is added to indicate the enabled
counters for guest PEBS events. The IA32_PEBS_ENABLE MSR will be switched
during the VMX transitions if intel_ctrl_guest_owned is set.

Originally-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/core.c | 60 +++++++++++++++++++++++++++++++++++++++++++-
arch/x86/events/perf_event.h | 1 +
include/linux/perf_event.h | 2 ++
kernel/events/core.c | 1 +
4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index dff6623..ef95076 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -368,6 +368,29 @@
EVENT_CONSTRAINT_END
};

+#define GUEST_DEDICATED_CONSTRAINT(idx) { \
+ { .idxmsk64 = (1ULL << (idx)) }, \
+ .weight = 1, \
+}
+
+static struct event_constraint dedicated_gp_c[MAX_PEBS_EVENTS] = {
+ GUEST_DEDICATED_CONSTRAINT(0),
+ GUEST_DEDICATED_CONSTRAINT(1),
+ GUEST_DEDICATED_CONSTRAINT(2),
+ GUEST_DEDICATED_CONSTRAINT(3),
+ GUEST_DEDICATED_CONSTRAINT(4),
+ GUEST_DEDICATED_CONSTRAINT(5),
+ GUEST_DEDICATED_CONSTRAINT(6),
+ GUEST_DEDICATED_CONSTRAINT(7),
+};
+
+static struct event_constraint dedicated_fixed_c[MAX_FIXED_PEBS_EVENTS] = {
+ GUEST_DEDICATED_CONSTRAINT(INTEL_PMC_IDX_FIXED),
+ GUEST_DEDICATED_CONSTRAINT(INTEL_PMC_IDX_FIXED + 1),
+ GUEST_DEDICATED_CONSTRAINT(INTEL_PMC_IDX_FIXED + 2),
+ GUEST_DEDICATED_CONSTRAINT(INTEL_PMC_IDX_FIXED + 3),
+};
+
static u64 intel_pmu_event_map(int hw_event)
{
return intel_perfmon_event_map[hw_event];
@@ -2158,6 +2181,7 @@ static void intel_pmu_disable_event(struct perf_event *event)
}

cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->idx);
+ cpuc->intel_ctrl_guest_dedicated_mask &= ~(1ull << hwc->idx);
cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx);
cpuc->intel_cp_status &= ~(1ull << hwc->idx);

@@ -2246,6 +2270,10 @@ static void intel_pmu_enable_event(struct perf_event *event)
if (event->attr.exclude_guest)
cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx);

+ if (unlikely(event->guest_dedicated_idx >= 0)) {
+ WARN_ON(hwc->idx != event->guest_dedicated_idx);
+ cpuc->intel_ctrl_guest_dedicated_mask |= (1ull << hwc->idx);
+ }
if (unlikely(event_is_checkpointed(event)))
cpuc->intel_cp_status |= (1ull << hwc->idx);

@@ -3036,7 +3064,21 @@ static void intel_commit_scheduling(struct cpu_hw_events *cpuc, int idx, int cnt
if (cpuc->excl_cntrs)
return intel_get_excl_constraints(cpuc, event, idx, c2);

- return c2;
+ if (event->guest_dedicated_idx < 0)
+ return c2;
+
+ BUILD_BUG_ON(ARRAY_SIZE(dedicated_fixed_c) != MAX_FIXED_PEBS_EVENTS);
+ if (c2->idxmsk64 & (1ULL << event->guest_dedicated_idx)) {
+ if (event->guest_dedicated_idx < MAX_PEBS_EVENTS)
+ return &dedicated_gp_c[event->guest_dedicated_idx];
+ else if ((event->guest_dedicated_idx >= INTEL_PMC_IDX_FIXED) &&
+ (event->guest_dedicated_idx < INTEL_PMC_IDX_FIXED +
+ MAX_FIXED_PEBS_EVENTS))
+ return &dedicated_fixed_c[event->guest_dedicated_idx -
+ INTEL_PMC_IDX_FIXED];
+ }
+
+ return &emptyconstraint;
}

static void intel_put_excl_constraints(struct cpu_hw_events *cpuc,
@@ -3373,6 +3415,22 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
*nr = 2;
}

+ if (cpuc->intel_ctrl_guest_dedicated_mask) {
+ arr[0].guest |= cpuc->intel_ctrl_guest_dedicated_mask;
+ arr[1].msr = MSR_IA32_PEBS_ENABLE;
+ arr[1].host = cpuc->pebs_enabled &
+ ~cpuc->intel_ctrl_guest_dedicated_mask;
+ arr[1].guest = cpuc->intel_ctrl_guest_dedicated_mask;
+ *nr = 2;
+ } else {
+ /* Remove MSR_IA32_PEBS_ENABLE from MSR switch list in KVM */
+ if (*nr == 1) {
+ arr[1].msr = MSR_IA32_PEBS_ENABLE;
+ arr[1].host = arr[1].guest = 0;
+ *nr = 2;
+ }
+ }
+
return arr;
}

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index f1cd1ca..621529c 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -242,6 +242,7 @@ struct cpu_hw_events {
* Intel host/guest exclude bits
*/
u64 intel_ctrl_guest_mask;
+ u64 intel_ctrl_guest_dedicated_mask;
u64 intel_ctrl_host_mask;
struct perf_guest_switch_msr guest_switch_msrs[X86_PMC_IDX_MAX];

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 547773f..3bccb88 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -750,6 +750,8 @@ struct perf_event {
void *security;
#endif
struct list_head sb_list;
+ /* the guest specified counter index of KVM owned event, e.g PEBS */
+ int guest_dedicated_idx;
#endif /* CONFIG_PERF_EVENTS */
};

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e453589..7a7b56c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10731,6 +10731,7 @@ static void account_event(struct perf_event *event)
event->id = atomic64_inc_return(&perf_event_id);

event->state = PERF_EVENT_STATE_INACTIVE;
+ event->guest_dedicated_idx = -1;

if (task) {
event->attach_state = PERF_ATTACH_TASK;
--
1.8.3.1


2020-03-06 13:54:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

On Fri, Mar 06, 2020 at 01:56:55AM +0800, Luwei Kang wrote:
> From: Kan Liang <[email protected]>
>
> The PEBS event created by host needs to be assigned specific counters
> requested by the guest, which means the guest and host counter indexes
> have to be the same or fail to create. This is needed because PEBS leaks
> counter indexes into the guest. Otherwise, the guest driver will be
> confused by the counter indexes in the status field of the PEBS record.
>
> A guest_dedicated_idx field is added to indicate the counter index
> specifically requested by KVM. The dedicated event constraints would
> constrain the counter in the host to the same numbered counter in guest.
>
> A intel_ctrl_guest_dedicated_mask field is added to indicate the enabled
> counters for guest PEBS events. The IA32_PEBS_ENABLE MSR will be switched
> during the VMX transitions if intel_ctrl_guest_owned is set.
>

> + /* the guest specified counter index of KVM owned event, e.g PEBS */
> + int guest_dedicated_idx;

We've always objected to guest 'owned' counters, they destroy scheduling
freedom. Why are you expecting that to be any different this time?

2020-03-06 14:43:58

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS



On 3/6/2020 8:53 AM, Peter Zijlstra wrote:
> On Fri, Mar 06, 2020 at 01:56:55AM +0800, Luwei Kang wrote:
>> From: Kan Liang <[email protected]>
>>
>> The PEBS event created by host needs to be assigned specific counters
>> requested by the guest, which means the guest and host counter indexes
>> have to be the same or fail to create. This is needed because PEBS leaks
>> counter indexes into the guest. Otherwise, the guest driver will be
>> confused by the counter indexes in the status field of the PEBS record.
>>
>> A guest_dedicated_idx field is added to indicate the counter index
>> specifically requested by KVM. The dedicated event constraints would
>> constrain the counter in the host to the same numbered counter in guest.
>>
>> A intel_ctrl_guest_dedicated_mask field is added to indicate the enabled
>> counters for guest PEBS events. The IA32_PEBS_ENABLE MSR will be switched
>> during the VMX transitions if intel_ctrl_guest_owned is set.
>>
>
>> + /* the guest specified counter index of KVM owned event, e.g PEBS */
>> + int guest_dedicated_idx;
>
> We've always objected to guest 'owned' counters, they destroy scheduling
> freedom. Why are you expecting that to be any different this time?
>

The new proposal tries to 'own' a counter by setting the event
constraint. It doesn't stop other events using the counter.
If there is high priority event which requires the same counter,
scheduler can still reject the request from KVM.
I don't think it destroys the scheduling freedom this time.

Thanks,
Kan

2020-03-09 10:06:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

On Fri, Mar 06, 2020 at 09:42:47AM -0500, Liang, Kan wrote:
>
>
> On 3/6/2020 8:53 AM, Peter Zijlstra wrote:
> > On Fri, Mar 06, 2020 at 01:56:55AM +0800, Luwei Kang wrote:
> > > From: Kan Liang <[email protected]>
> > >
> > > The PEBS event created by host needs to be assigned specific counters
> > > requested by the guest, which means the guest and host counter indexes
> > > have to be the same or fail to create. This is needed because PEBS leaks
> > > counter indexes into the guest. Otherwise, the guest driver will be
> > > confused by the counter indexes in the status field of the PEBS record.
> > >
> > > A guest_dedicated_idx field is added to indicate the counter index
> > > specifically requested by KVM. The dedicated event constraints would
> > > constrain the counter in the host to the same numbered counter in guest.
> > >
> > > A intel_ctrl_guest_dedicated_mask field is added to indicate the enabled
> > > counters for guest PEBS events. The IA32_PEBS_ENABLE MSR will be switched
> > > during the VMX transitions if intel_ctrl_guest_owned is set.
> > >
> >
> > > + /* the guest specified counter index of KVM owned event, e.g PEBS */
> > > + int guest_dedicated_idx;
> >
> > We've always objected to guest 'owned' counters, they destroy scheduling
> > freedom. Why are you expecting that to be any different this time?
> >
>
> The new proposal tries to 'own' a counter by setting the event constraint.
> It doesn't stop other events using the counter.
> If there is high priority event which requires the same counter, scheduler
> can still reject the request from KVM.
> I don't think it destroys the scheduling freedom this time.

Suppose your KVM thing claims counter 0/2 (ICL/SKL) for some random PEBS
event, and then the host wants to use PREC_DIST.. Then one of them will
be screwed for no reason what so ever.

How is that not destroying scheduling freedom? Any other situation we'd
have moved the !PREC_DIST PEBS event to another counter.

2020-03-09 13:13:37

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS



On 3/9/2020 6:04 AM, Peter Zijlstra wrote:
> On Fri, Mar 06, 2020 at 09:42:47AM -0500, Liang, Kan wrote:
>>
>>
>> On 3/6/2020 8:53 AM, Peter Zijlstra wrote:
>>> On Fri, Mar 06, 2020 at 01:56:55AM +0800, Luwei Kang wrote:
>>>> From: Kan Liang <[email protected]>
>>>>
>>>> The PEBS event created by host needs to be assigned specific counters
>>>> requested by the guest, which means the guest and host counter indexes
>>>> have to be the same or fail to create. This is needed because PEBS leaks
>>>> counter indexes into the guest. Otherwise, the guest driver will be
>>>> confused by the counter indexes in the status field of the PEBS record.
>>>>
>>>> A guest_dedicated_idx field is added to indicate the counter index
>>>> specifically requested by KVM. The dedicated event constraints would
>>>> constrain the counter in the host to the same numbered counter in guest.
>>>>
>>>> A intel_ctrl_guest_dedicated_mask field is added to indicate the enabled
>>>> counters for guest PEBS events. The IA32_PEBS_ENABLE MSR will be switched
>>>> during the VMX transitions if intel_ctrl_guest_owned is set.
>>>>
>>>
>>>> + /* the guest specified counter index of KVM owned event, e.g PEBS */
>>>> + int guest_dedicated_idx;
>>>
>>> We've always objected to guest 'owned' counters, they destroy scheduling
>>> freedom. Why are you expecting that to be any different this time?
>>>
>>
>> The new proposal tries to 'own' a counter by setting the event constraint.
>> It doesn't stop other events using the counter.
>> If there is high priority event which requires the same counter, scheduler
>> can still reject the request from KVM.
>> I don't think it destroys the scheduling freedom this time.
>
> Suppose your KVM thing claims counter 0/2 (ICL/SKL) for some random PEBS
> event, and then the host wants to use PREC_DIST.. Then one of them will
> be screwed for no reason what so ever.
>

The multiplexing should be triggered.

For host, if both user A and user B requires PREC_DIST, the multiplexing
should be triggered for them.
Now, the user B is KVM. I don't think there is difference. The
multiplexing should still be triggered. Why it is screwed?


> How is that not destroying scheduling freedom? Any other situation we'd
> have moved the !PREC_DIST PEBS event to another counter.
>

All counters are equivalent for them. It doesn't matter if we move it to
another counter. There is no impact for the user.

In the new proposal, KVM user is treated the same as other host events
with event constraint. The scheduler is free to choose whether or not to
assign a counter for it.

Thanks,
Kan

2020-03-09 15:07:31

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

On Mon, Mar 09, 2020 at 09:12:42AM -0400, Liang, Kan wrote:

> > Suppose your KVM thing claims counter 0/2 (ICL/SKL) for some random PEBS
> > event, and then the host wants to use PREC_DIST.. Then one of them will
> > be screwed for no reason what so ever.
> >
>
> The multiplexing should be triggered.
>
> For host, if both user A and user B requires PREC_DIST, the multiplexing
> should be triggered for them.
> Now, the user B is KVM. I don't think there is difference. The multiplexing
> should still be triggered. Why it is screwed?

Becuase if KVM isn't PREC_DIST we should be able to reschedule it to a
different counter.

> > How is that not destroying scheduling freedom? Any other situation we'd
> > have moved the !PREC_DIST PEBS event to another counter.
> >
>
> All counters are equivalent for them. It doesn't matter if we move it to
> another counter. There is no impact for the user.

But we cannot move it to another counter, because you're pinning it.

> In the new proposal, KVM user is treated the same as other host events with
> event constraint. The scheduler is free to choose whether or not to assign a
> counter for it.

That's what it does, I understand that. I'm saying that that is creating
artificial contention.


Why is this needed anyway? Can't we force the guest to flush and then
move it over to a new counter?

2020-03-09 15:45:18

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

> Suppose your KVM thing claims counter 0/2 (ICL/SKL) for some random PEBS
> event, and then the host wants to use PREC_DIST.. Then one of them will
> be screwed for no reason what so ever.

It's no different from some user using an event that requires some
specific counter.

>
> How is that not destroying scheduling freedom? Any other situation we'd
> have moved the !PREC_DIST PEBS event to another counter.

Anyways what are you suggesting to do instead? Do you have a better proposal?

The only alternative I know to doing this would be to go through the PEBS
buffer in the guest and patch the applicable counter field up on each PMI.

I tried that at some point (still have code somewhere), but it was
quite complicated and tricky and somewhat slow, so I gave up eventually.

It's also inherently racy because if the guest starts looking at
the PEBS buffer before an PMI it could see the unpatched values
Given I don't know any real software which would break from this,
but such "polled PEBS" usages are certainly concievable.

The artificial constraint is a lot simpler and straight forward,
and also doesn't have any holes like this.

-Andi

2020-03-09 19:29:36

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS



On 3/9/2020 11:05 AM, Peter Zijlstra wrote:
>> In the new proposal, KVM user is treated the same as other host events with
>> event constraint. The scheduler is free to choose whether or not to assign a
>> counter for it.
> That's what it does, I understand that. I'm saying that that is creating
> artificial contention.
>
>
> Why is this needed anyway? Can't we force the guest to flush and then
> move it over to a new counter?

KVM only traps the MSR access. There is no MSR access during the
scheduling in guest.
KVM/host only knows the request counter, when guest tries to enable the
counter. It's too late for guest to start over.

Regarding to the artificial contention, as my understanding, it should
rarely happen in practical.
Cloud vendors have to explicitly set pebs option in qemu to enable PEBS
support for guest. They knows the environment well. They can avoid the
contention. (We may implement some patches for qemu/KVM later to
temporarily disable PEBS in runtime if they require.)

For now, I think we may print a warning when both host and guest require
the same counter. Host can get a clue from the warning.

Thanks,
Kan

2020-03-12 10:29:29

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

> >> In the new proposal, KVM user is treated the same as other host
> >> events with event constraint. The scheduler is free to choose whether
> >> or not to assign a counter for it.
> > That's what it does, I understand that. I'm saying that that is
> > creating artificial contention.
> >
> >
> > Why is this needed anyway? Can't we force the guest to flush and then
> > move it over to a new counter?
>
> KVM only traps the MSR access. There is no MSR access during the scheduling
> in guest.
> KVM/host only knows the request counter, when guest tries to enable the
> counter. It's too late for guest to start over.
>
> Regarding to the artificial contention, as my understanding, it should rarely
> happen in practical.
> Cloud vendors have to explicitly set pebs option in qemu to enable PEBS
> support for guest. They knows the environment well. They can avoid the
> contention. (We may implement some patches for qemu/KVM later to
> temporarily disable PEBS in runtime if they require.)
>
> For now, I think we may print a warning when both host and guest require the
> same counter. Host can get a clue from the warning.

Hi Peter,
What is your opinion? We can treat the guest PEBS event as an event from the user. A little different from other events is the need of pin to the specific counter because the counter index will be included in the Guest PEBS record. The guest counter can't be forced disabled/released as well, otherwise, the end-user will complain. Can we add a warning when the contention is detected or add some description in the document or...?

Thanks,
Luwei Kang

2020-03-26 14:05:15

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

Hi Peter,

On 3/9/2020 3:28 PM, Liang, Kan wrote:
>
>
> On 3/9/2020 11:05 AM, Peter Zijlstra wrote:
>>> In the new proposal, KVM user is treated the same as other host
>>> events with
>>> event constraint. The scheduler is free to choose whether or not to
>>> assign a
>>> counter for it.
>> That's what it does, I understand that. I'm saying that that is creating
>> artificial contention.
>>
>>
>> Why is this needed anyway? Can't we force the guest to flush and then
>> move it over to a new counter?
>

Current perf scheduling is pure software behavior. KVM only traps the
MSR access. It’s impossible for KVM to impact the guest’s scheduling
with current implementation.

To address the concern regarding to 'artificial contention', we have two
proposals.
Could you please take a look, and share your thoughts?

Proposal 1:
Reject the guest request, if host has to use the counter which occupied
by guest. At the meantime, host prints a warning.
I still think the contention should rarely happen in practical.
Personally, I prefer this proposal.


Proposal 2:
Add HW advisor for the scheduler in guest.
Starts from Architectural Perfmon Version 4, IA32_PERF_GLOBAL_INUSE MSR
is introduced. It provides an “InUse” bit for each programmable
performance counter and fixed counter in the processor.

In perf, the scheduler will read the MSR and mask the “in used”
counters. I think we can use X86_FEATURE_HYPERVISOR to limit the check
in guest. For non-virtualization usage and host, nothing changed for
scheduler.

But there is still a problem for this proposal. Host may request a
counter later, which has been used by guest.
We can only do multiplexing or grab the counter just like proposal 1.


What do you think?

Thanks,
Kan

> KVM only traps the MSR access. There is no MSR access during the
> scheduling in guest.
> KVM/host only knows the request counter, when guest tries to enable the
> counter. It's too late for guest to start over.
>
> Regarding to the artificial contention, as my understanding, it should
> rarely happen in practical.
> Cloud vendors have to explicitly set pebs option in qemu to enable PEBS
> support for guest. They knows the environment well. They can avoid the
> contention. (We may implement some patches for qemu/KVM later to
> temporarily disable PEBS in runtime if they require.)
>
> For now, I think we may print a warning when both host and guest require
> the same counter. Host can get a clue from the warning.
>
> Thanks,
> Kan

2020-04-07 12:36:26

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

> > On 3/9/2020 11:05 AM, Peter Zijlstra wrote:
> >>> In the new proposal, KVM user is treated the same as other host
> >>> events with event constraint. The scheduler is free to choose
> >>> whether or not to assign a counter for it.
> >> That's what it does, I understand that. I'm saying that that is
> >> creating artificial contention.
> >>
> >>
> >> Why is this needed anyway? Can't we force the guest to flush and then
> >> move it over to a new counter?
> >
>
> Current perf scheduling is pure software behavior. KVM only traps the MSR
> access. It’s impossible for KVM to impact the guest’s scheduling with current
> implementation.
>
> To address the concern regarding to 'artificial contention', we have two
> proposals.
> Could you please take a look, and share your thoughts?
>
> Proposal 1:
> Reject the guest request, if host has to use the counter which occupied by
> guest. At the meantime, host prints a warning.
> I still think the contention should rarely happen in practical.
> Personally, I prefer this proposal.
>
>
> Proposal 2:
> Add HW advisor for the scheduler in guest.
> Starts from Architectural Perfmon Version 4, IA32_PERF_GLOBAL_INUSE MSR
> is introduced. It provides an “InUse” bit for each programmable
> performance counter and fixed counter in the processor.
>
> In perf, the scheduler will read the MSR and mask the “in used”
> counters. I think we can use X86_FEATURE_HYPERVISOR to limit the check
> in guest. For non-virtualization usage and host, nothing changed for
> scheduler.
>
> But there is still a problem for this proposal. Host may request a
> counter later, which has been used by guest.
> We can only do multiplexing or grab the counter just like proposal 1.

Hi Peter,
What is your opinion?

Thanks,
Luwei Kang

2020-06-12 05:31:44

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

> > > Suppose your KVM thing claims counter 0/2 (ICL/SKL) for some random
> > > PEBS event, and then the host wants to use PREC_DIST.. Then one of
> > > them will be screwed for no reason what so ever.
> > >
> >
> > The multiplexing should be triggered.
> >
> > For host, if both user A and user B requires PREC_DIST, the
> > multiplexing should be triggered for them.
> > Now, the user B is KVM. I don't think there is difference. The
> > multiplexing should still be triggered. Why it is screwed?
>
> Becuase if KVM isn't PREC_DIST we should be able to reschedule it to a
> different counter.
>
> > > How is that not destroying scheduling freedom? Any other situation
> > > we'd have moved the !PREC_DIST PEBS event to another counter.
> > >
> >
> > All counters are equivalent for them. It doesn't matter if we move it
> > to another counter. There is no impact for the user.
>
> But we cannot move it to another counter, because you're pinning it.

Hi Peter,

To avoid the pinning counters, I have tried to do some evaluation about
patching the PEBS record for guest in KVM. In this approach, about ~30%
time increased on guest PEBS PMI handler latency (
e.g.perf record -e branch-loads:p -c 1000 ~/Tools/br_instr a).

Some implementation details as below:
1. Patching the guest PEBS records "Applicable Counters" filed when the guest
required counter is not the same with the host. Because the guest PEBS
driver will drop these PEBS records if the "Applicable Counters" not the
same with the required counter index.
2. Traping the guest driver's behavior(VM-exit) of disabling PEBS.
It happens before reading PEBS records (e.g. PEBS PMI handler, before
application exit and so on)
3. To patch the Guest PEBS records in KVM, we need to get the HPA of the
guest PEBS buffer.
<1> Trapping the guest write of IA32_DS_AREA register and get the GVA
of guest DS_AREA.
<2> Translate the DS AREA GVA to GPA(kvm_mmu_gva_to_gpa_read)
and get the GVA of guest PEBS buffer from DS AREA
(kvm_vcpu_read_guest_atomic).
<3> Although we have got the GVA of PEBS buffer, we need to do the
address translation(GVA->GPA->HPA) for each page. Because we can't
assume the GPAs of Guest PEBS buffer are always continuous.

But we met another issue about the PEBS counter reset field in DS AREA.
pebs_event_reset in DS area has to be set for auto reload, which is per
counter. Guest and Host may use different counters. Let's say guest wants to
use counter 0, but host assign counter 1 to guest. Guest sets the reset value to
pebs_event_reset[0]. However, since counter 1 is the one which is eventually
scheduled, HW will use pebs_event_reset[1] as reset value.

We can't copy the value of the guest pebs_event_reset[0] to
pebs_event_reset[1] directly(Patching DS AREA) because the guest driver may
confused, and we can't assume the guest counter 0 and 1 are not used for this
PEBS task at the same time. And what's more, KVM can't aware the guest
read/write to the DS AREA because it just a general memory for guest.

What is your opinion or do you have a better proposal?

Thanks,
Luwei Kang

>
> > In the new proposal, KVM user is treated the same as other host events
> > with event constraint. The scheduler is free to choose whether or not
> > to assign a counter for it.
>
> That's what it does, I understand that. I'm saying that that is creating artificial
> contention.
>
>
> Why is this needed anyway? Can't we force the guest to flush and then move it
> over to a new counter?

2020-06-19 11:33:34

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

> > > > Suppose your KVM thing claims counter 0/2 (ICL/SKL) for some
> > > > random PEBS event, and then the host wants to use PREC_DIST.. Then
> > > > one of them will be screwed for no reason what so ever.
> > > >
> > >
> > > The multiplexing should be triggered.
> > >
> > > For host, if both user A and user B requires PREC_DIST, the
> > > multiplexing should be triggered for them.
> > > Now, the user B is KVM. I don't think there is difference. The
> > > multiplexing should still be triggered. Why it is screwed?
> >
> > Becuase if KVM isn't PREC_DIST we should be able to reschedule it to a
> > different counter.
> >
> > > > How is that not destroying scheduling freedom? Any other situation
> > > > we'd have moved the !PREC_DIST PEBS event to another counter.
> > > >
> > >
> > > All counters are equivalent for them. It doesn't matter if we move
> > > it to another counter. There is no impact for the user.
> >
> > But we cannot move it to another counter, because you're pinning it.
>
> Hi Peter,
>
> To avoid the pinning counters, I have tried to do some evaluation about
> patching the PEBS record for guest in KVM. In this approach, about ~30% time
> increased on guest PEBS PMI handler latency ( e.g.perf record -e branch-
> loads:p -c 1000 ~/Tools/br_instr a).
>
> Some implementation details as below:
> 1. Patching the guest PEBS records "Applicable Counters" filed when the guest
> required counter is not the same with the host. Because the guest PEBS
> driver will drop these PEBS records if the "Applicable Counters" not the
> same with the required counter index.
> 2. Traping the guest driver's behavior(VM-exit) of disabling PEBS.
> It happens before reading PEBS records (e.g. PEBS PMI handler, before
> application exit and so on)
> 3. To patch the Guest PEBS records in KVM, we need to get the HPA of the
> guest PEBS buffer.
> <1> Trapping the guest write of IA32_DS_AREA register and get the GVA
> of guest DS_AREA.
> <2> Translate the DS AREA GVA to GPA(kvm_mmu_gva_to_gpa_read)
> and get the GVA of guest PEBS buffer from DS AREA
> (kvm_vcpu_read_guest_atomic).
> <3> Although we have got the GVA of PEBS buffer, we need to do the
> address translation(GVA->GPA->HPA) for each page. Because we can't
> assume the GPAs of Guest PEBS buffer are always continuous.
>
> But we met another issue about the PEBS counter reset field in DS AREA.
> pebs_event_reset in DS area has to be set for auto reload, which is per counter.
> Guest and Host may use different counters. Let's say guest wants to use
> counter 0, but host assign counter 1 to guest. Guest sets the reset value to
> pebs_event_reset[0]. However, since counter 1 is the one which is eventually
> scheduled, HW will use pebs_event_reset[1] as reset value.
>
> We can't copy the value of the guest pebs_event_reset[0] to
> pebs_event_reset[1] directly(Patching DS AREA) because the guest driver may
> confused, and we can't assume the guest counter 0 and 1 are not used for this
> PEBS task at the same time. And what's more, KVM can't aware the guest
> read/write to the DS AREA because it just a general memory for guest.
>
> What is your opinion or do you have a better proposal?

Kindly ping~

Thanks,
Luwei Kang

>
> Thanks,
> Luwei Kang
>
> >
> > > In the new proposal, KVM user is treated the same as other host
> > > events with event constraint. The scheduler is free to choose
> > > whether or not to assign a counter for it.
> >
> > That's what it does, I understand that. I'm saying that that is
> > creating artificial contention.
> >
> >
> > Why is this needed anyway? Can't we force the guest to flush and then
> > move it over to a new counter?

2020-08-20 03:33:47

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated counter for guest PEBS

Hi Peter,

On 2020/6/12 13:28, Kang, Luwei wrote:
>>>> Suppose your KVM thing claims counter 0/2 (ICL/SKL) for some random
>>>> PEBS event, and then the host wants to use PREC_DIST.. Then one of
>>>> them will be screwed for no reason what so ever.
>>>>
>>>
>>> The multiplexing should be triggered.
>>>
>>> For host, if both user A and user B requires PREC_DIST, the
>>> multiplexing should be triggered for them.
>>> Now, the user B is KVM. I don't think there is difference. The
>>> multiplexing should still be triggered. Why it is screwed?
>>
>> Becuase if KVM isn't PREC_DIST we should be able to reschedule it to a
>> different counter.
>>
>>>> How is that not destroying scheduling freedom? Any other situation
>>>> we'd have moved the !PREC_DIST PEBS event to another counter.
>>>>
>>>
>>> All counters are equivalent for them. It doesn't matter if we move it
>>> to another counter. There is no impact for the user.
>>
>> But we cannot move it to another counter, because you're pinning it.
>
> Hi Peter,
>
> To avoid the pinning counters, I have tried to do some evaluation about
> patching the PEBS record for guest in KVM. In this approach, about ~30%
> time increased on guest PEBS PMI handler latency (
> e.g.perf record -e branch-loads:p -c 1000 ~/Tools/br_instr a).
>
> Some implementation details as below:
> 1. Patching the guest PEBS records "Applicable Counters" filed when the guest
> required counter is not the same with the host. Because the guest PEBS
> driver will drop these PEBS records if the "Applicable Counters" not the
> same with the required counter index.
> 2. Traping the guest driver's behavior(VM-exit) of disabling PEBS.
> It happens before reading PEBS records (e.g. PEBS PMI handler, before
> application exit and so on)
> 3. To patch the Guest PEBS records in KVM, we need to get the HPA of the
> guest PEBS buffer.
> <1> Trapping the guest write of IA32_DS_AREA register and get the GVA
> of guest DS_AREA.
> <2> Translate the DS AREA GVA to GPA(kvm_mmu_gva_to_gpa_read)
> and get the GVA of guest PEBS buffer from DS AREA
> (kvm_vcpu_read_guest_atomic).
> <3> Although we have got the GVA of PEBS buffer, we need to do the
> address translation(GVA->GPA->HPA) for each page. Because we can't
> assume the GPAs of Guest PEBS buffer are always continuous.
>
> But we met another issue about the PEBS counter reset field in DS AREA.
> pebs_event_reset in DS area has to be set for auto reload, which is per
> counter. Guest and Host may use different counters. Let's say guest wants to
> use counter 0, but host assign counter 1 to guest. Guest sets the reset value to
> pebs_event_reset[0]. However, since counter 1 is the one which is eventually
> scheduled, HW will use pebs_event_reset[1] as reset value.
>
> We can't copy the value of the guest pebs_event_reset[0] to
> pebs_event_reset[1] directly(Patching DS AREA) because the guest driver may
> confused, and we can't assume the guest counter 0 and 1 are not used for this
> PEBS task at the same time. And what's more, KVM can't aware the guest
> read/write to the DS AREA because it just a general memory for guest.
>
> What is your opinion or do you have a better proposal?

Do we have any update or clear attitude
on this "patching the PEBS record for guest in KVM" proposal ?

Thanks,
Like Xu

>
> Thanks,
> Luwei Kang
>
>>
>>> In the new proposal, KVM user is treated the same as other host events
>>> with event constraint. The scheduler is free to choose whether or not
>>> to assign a counter for it.
>>
>> That's what it does, I understand that. I'm saying that that is creating artificial
>> contention.
>>
>>
>> Why is this needed anyway? Can't we force the guest to flush and then move it
>> over to a new counter?