2019-07-11 01:26:31

by Eric Hankland

Subject: [PATCH v2] KVM: x86: PMU Event Filter

- Add a VM ioctl that can control which events the guest can monitor.

Signed-off-by: ehankland <[email protected]>
---
Changes since v1:
-Moved to a vm ioctl rather than a vcpu one
-Changed from a whitelist to a configurable filter which can either be
white or black
-Only restrict GP counters since fixed counters require extra handling
and they can be disabled by setting the guest cpuid (though only by
setting the number - they can't be disabled individually)
---
Documentation/virtual/kvm/api.txt | 25 +++++++++++++
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/uapi/asm/kvm.h | 10 +++++
arch/x86/kvm/pmu.c | 61 +++++++++++++++++++++++++++++++
arch/x86/kvm/pmu.h | 1 +
arch/x86/kvm/x86.c | 6 +++
include/uapi/linux/kvm.h | 3 ++
7 files changed, 108 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 91fd86fcc49f..a9ee8da36595 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4065,6 +4065,31 @@ KVM_ARM_VCPU_FINALIZE call.
See KVM_ARM_VCPU_INIT for details of vcpu features that require finalization
using this ioctl.

+4.120 KVM_SET_PMU_EVENT_FILTER
+
+Capability: KVM_CAP_PMU_EVENT_FILTER
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_pmu_event_filter (in)
+Returns: 0 on success, -1 on error
+
+struct kvm_pmu_event_filter {
+	__u32 type;
+	__u32 nevents;
+	__u64 events[0];
+};
+
+This ioctl restricts the set of PMU events that the guest can program, using
+either a whitelist or a blacklist of events. The eventsel+umask of each event
+the guest attempts to program is compared against the events field to
+determine whether the guest should have access. This only affects general
+purpose counters; fixed purpose counters can be disabled by changing the
+perfmon CPUID leaf.
+
+Valid values for 'type':
+#define KVM_PMU_EVENT_WHITELIST 0
+#define KVM_PMU_EVENT_BLACKLIST 1
+

5. The kvm_run structure
------------------------

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f46a12a5cf2e..34d017bd1d1b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -933,6 +933,8 @@ struct kvm_arch {

 	bool guest_can_read_msr_platform_info;
 	bool exception_payload_enabled;
+
+	struct kvm_pmu_event_filter *pmu_event_filter;
};

struct kvm_vm_stat {
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index f9b021e16ebc..4d2e905b7d79 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -422,4 +422,14 @@ struct kvm_nested_state {
 	__u8 data[0];
 };
 
+/* for KVM_CAP_PMU_EVENT_FILTER */
+struct kvm_pmu_event_filter {
+	__u32 type;
+	__u32 nevents;
+	__u64 events[0];
+};
+
+#define KVM_PMU_EVENT_WHITELIST 0
+#define KVM_PMU_EVENT_BLACKLIST 1
+
#endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index dd745b58ffd8..d674b79ff8da 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -22,6 +22,9 @@
#include "lapic.h"
#include "pmu.h"

+/* This keeps the total size of the filter under 4k. */
+#define KVM_PMU_EVENT_FILTER_MAX_EVENTS 63
+
/* NOTE:
* - Each perf counter is defined as "struct kvm_pmc";
* - There are two types of perf counters: general purpose (gp) and fixed.
@@ -144,6 +147,10 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 {
 	unsigned config, type = PERF_TYPE_RAW;
 	u8 event_select, unit_mask;
+	struct kvm_arch *arch = &pmc->vcpu->kvm->arch;
+	struct kvm_pmu_event_filter *filter;
+	int i;
+	bool allow_event = true;
 
 	if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
 		printk_once("kvm pmu: pin control bit is ignored\n");
@@ -155,6 +162,24 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 	if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
 		return;
 
+	rcu_read_lock();
+	filter = rcu_dereference(arch->pmu_event_filter);
+	if (filter) {
+		for (i = 0; i < filter->nevents; i++)
+			if (filter->events[i] ==
+			    (eventsel & AMD64_RAW_EVENT_MASK_NB))
+				break;
+		if (filter->type == KVM_PMU_EVENT_WHITELIST &&
+		    i == filter->nevents)
+			allow_event = false;
+		if (filter->type == KVM_PMU_EVENT_BLACKLIST &&
+		    i < filter->nevents)
+			allow_event = false;
+	}
+	rcu_read_unlock();
+	if (!allow_event)
+		return;
+
 	event_select = eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
 	unit_mask = (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
 
@@ -351,3 +376,39 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
 {
 	kvm_pmu_reset(vcpu);
 }
+
+int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_pmu_event_filter tmp, *filter;
+	size_t size;
+	int r;
+
+	if (copy_from_user(&tmp, argp, sizeof(tmp)))
+		return -EFAULT;
+
+	if (tmp.nevents > KVM_PMU_EVENT_FILTER_MAX_EVENTS)
+		return -E2BIG;
+
+	size = sizeof(tmp) + sizeof(tmp.events[0]) * tmp.nevents;
+	filter = vmalloc(size);
+	if (!filter)
+		return -ENOMEM;
+
+	r = -EFAULT;
+	if (copy_from_user(filter, argp, size))
+		goto cleanup;
+
+	/* Ensure nevents can't be changed between the user copies. */
+	*filter = tmp;
+
+	mutex_lock(&kvm->lock);
+	rcu_swap_protected(kvm->arch.pmu_event_filter, filter,
+			   mutex_is_locked(&kvm->lock));
+	mutex_unlock(&kvm->lock);
+
+	synchronize_rcu();
+	r = 0;
+cleanup:
+	kvfree(filter);
+	return r;
+}
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 22dff661145a..58265f761c3b 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -118,6 +118,7 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
void kvm_pmu_reset(struct kvm_vcpu *vcpu);
void kvm_pmu_init(struct kvm_vcpu *vcpu);
void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
+int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);

bool is_vmware_backdoor_pmc(u32 pmc_idx);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2e302e977dac..9ddfc7193bc6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3135,6 +3135,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_GET_MSR_FEATURES:
 	case KVM_CAP_MSR_PLATFORM_INFO:
 	case KVM_CAP_EXCEPTION_PAYLOAD:
+	case KVM_CAP_PMU_EVENT_FILTER:
 		r = 1;
 		break;
 	case KVM_CAP_SYNC_REGS:
@@ -4978,6 +4979,11 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_vm_ioctl_hv_eventfd(kvm, &hvevfd);
 		break;
 	}
+	case KVM_SET_PMU_EVENT_FILTER: {
+		r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
+		break;
+
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c2152f3dd02d..b18ff80e356a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -995,6 +995,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_ARM_SVE 170
#define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
#define KVM_CAP_ARM_PTRAUTH_GENERIC 172
+#define KVM_CAP_PMU_EVENT_FILTER 173

#ifdef KVM_CAP_IRQ_ROUTING

@@ -1329,6 +1330,8 @@ struct kvm_s390_ucas_mapping {
#define KVM_PPC_GET_RMMU_INFO _IOW(KVMIO, 0xb0, struct kvm_ppc_rmmu_info)
/* Available with KVM_CAP_PPC_GET_CPU_CHAR */
#define KVM_PPC_GET_CPU_CHAR _IOR(KVMIO, 0xb1, struct kvm_ppc_cpu_char)
+/* Available with KVM_CAP_PMU_EVENT_FILTER */
+#define KVM_SET_PMU_EVENT_FILTER _IOW(KVMIO, 0xb2, struct kvm_pmu_event_filter)

/* ioctl for vm fd */
#define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device)
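
For illustration only (not part of the patch): a minimal userspace sketch of
driving the new ioctl, assuming a vm_fd obtained from KVM_CREATE_VM and two
architectural events chosen arbitrarily.

#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Whitelist unhalted core cycles and instructions retired only. */
static int set_pmu_whitelist(int vm_fd)
{
	struct kvm_pmu_event_filter *f;
	size_t sz = sizeof(*f) + 2 * sizeof(__u64);
	int r;

	f = calloc(1, sz);
	if (!f)
		return -1;
	f->type = KVM_PMU_EVENT_WHITELIST;
	f->nevents = 2;
	f->events[0] = 0x003c;	/* eventsel 0x3c, umask 0x00 */
	f->events[1] = 0x00c0;	/* eventsel 0xc0, umask 0x00 */

	r = ioctl(vm_fd, KVM_SET_PMU_EVENT_FILTER, f);
	free(f);
	return r;
}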


2019-07-11 11:24:35

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86: PMU Event Filter

On 11/07/19 03:25, Eric Hankland wrote:
>
> +/* for KVM_CAP_PMU_EVENT_FILTER */
> +struct kvm_pmu_event_filter {
> + __u32 type;
> + __u32 nevents;
> + __u64 events[0];
> +};
> +
> +#define KVM_PMU_EVENT_WHITELIST 0
> +#define KVM_PMU_EVENT_BLACKLIST 1
> +

"type" is a bit vague, so I am thinking of replacing it with "action"
and rename the constants to KVM_PMU_EVENT_ACCEPT/REJECT. What do you think?

Paolo

2019-07-11 12:00:37

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86: PMU Event Filter

On 11/07/19 03:25, Eric Hankland wrote:
> - Add a VM ioctl that can control which events the guest can monitor.

... and finally:

- the patch whitespace is damaged

- the filter is leaked when the VM is destroyed

- kmalloc(GFP_KERNEL_ACCOUNT) is preferable to vmalloc because it
accounts memory to the VM correctly.

Since this is your first submission, I have fixed up everything.
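
For reference, the last two points roughly amount to something like the
following sketch (not the exact fixed-up commit):

/* kvm_vm_ioctl_set_pmu_event_filter(): account the allocation to the VM. */
filter = kmalloc(size, GFP_KERNEL_ACCOUNT);
if (!filter)
	return -ENOMEM;

/* kvm_arch_destroy_vm(): don't leak an installed filter. */
kfree(kvm->arch.pmu_event_filter);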

Paolo

2019-07-11 17:06:48

by Eric Hankland

Subject: Re: [PATCH v2] KVM: x86: PMU Event Filter

Thanks for your help. The "type"->"action" change and constant
renaming sound good to me.

On Thu, Jul 11, 2019 at 4:58 AM Paolo Bonzini <[email protected]> wrote:
>
> On 11/07/19 03:25, Eric Hankland wrote:
> > - Add a VM ioctl that can control which events the guest can monitor.
>
> ... and finally:
>
> - the patch whitespace is damaged
>
> - the filter is leaked when the VM is destroyed
>
> - kmalloc(GFP_KERNEL_ACCOUNT) is preferable to vmalloc because it
> accounts memory to the VM correctly.
>
> Since this is your first submission, I have fixed up everything.
>
> Paolo

2019-07-11 17:13:07

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86: PMU Event Filter

On 11/07/19 19:04, Eric Hankland wrote:
> Thanks for your help. The "type"->"action" change and constant
> renaming sound good to me.

Good! Another thing, synchronize_rcu is a bit slow for something that
runs whenever a VM starts. KVM generally uses srcu instead (kvm->srcu
for things that change really rarely, kvm->irq_srcu for things that
change a bit more often).
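
A rough sketch of what the read side could then look like (placement and
exact form are up to the next revision):

int idx;

idx = srcu_read_lock(&kvm->srcu);
filter = srcu_dereference(kvm->arch.pmu_event_filter, &kvm->srcu);
/* ... match eventsel+umask against filter->events ... */
srcu_read_unlock(&kvm->srcu, idx);

with synchronize_srcu(&kvm->srcu) replacing synchronize_rcu() after the
pointer is swapped on the update side.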

Paolo

> On Thu, Jul 11, 2019 at 4:58 AM Paolo Bonzini <[email protected]> wrote:
>>
>> On 11/07/19 03:25, Eric Hankland wrote:
>>> - Add a VM ioctl that can control which events the guest can monitor.
>>
>> ... and finally:
>>
>> - the patch whitespace is damaged
>>
>> - the filter is leaked when the VM is destroyed
>>
>> - kmalloc(GFP_KERNEL_ACCOUNT) is preferable to vmalloc because it
>> accounts memory to the VM correctly.
>>
>> Since this is your first submission, I have fixed up everything.
>>
>> Paolo

2019-07-12 03:23:28

by Wang, Wei W

Subject: Re: [PATCH v2] KVM: x86: PMU Event Filter

On 07/11/2019 09:25 AM, Eric Hankland wrote:
> - Add a VM ioctl that can control which events the guest can monitor.
>
> Signed-off-by: ehankland<[email protected]>
> ---
> Changes since v1:
> -Moved to a vm ioctl rather than a vcpu one
> -Changed from a whitelist to a configurable filter which can either be
> white or black
> -Only restrict GP counters since fixed counters require extra handling
> and they can be disabled by setting the guest cpuid (though only by
> setting the number - they can't be disabled individually)

I think just disabling it via the guest cpuid might not be enough, since
the guest could write to the msr without checking the cpuid.

Why not just add a bitmap for the fixed counters?
e.g. fixed_counter_reject_bitmap

At the beginning of reprogram_fixed_counter, we could add the check:

if (test_bit(idx, &kvm->arch.fixed_counter_reject_bitmap))
	return -EACCES;

(Please test with your old guests and see if they have issues if we
inject #GP when they try to set the fixed_ctrl msr. If there are, we
could drop the -EACCES above.)

The bitmap could be set at kvm_vm_ioctl_set_pmu_event_filter.
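
For example, assuming a fixed_counter_bitmap field were added to struct
kvm_pmu_event_filter (it does not exist in v2, so this is only a sketch),
kvm_vm_ioctl_set_pmu_event_filter() could derive the reject bitmap like:

/* Reject every fixed counter that the filter does not accept. */
if (filter->type == KVM_PMU_EVENT_WHITELIST)
	kvm->arch.fixed_counter_reject_bitmap =
		~filter->fixed_counter_bitmap &
		GENMASK(INTEL_PMC_MAX_FIXED - 1, 0);
else
	kvm->arch.fixed_counter_reject_bitmap =
		filter->fixed_counter_bitmap;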

> +/* for KVM_CAP_PMU_EVENT_FILTER */
> +struct kvm_pmu_event_filter {
> + __u32 type;
> + __u32 nevents;
> + __u64 events[0];
> +};
> +
> +#define KVM_PMU_EVENT_WHITELIST 0
> +#define KVM_PMU_EVENT_BLACKLIST 1

I think it would be better to add more actions; please see below:

enum kvm_pmu_action_type {
	KVM_PMU_EVENT_ACTION_NONE = 0,
	KVM_PMU_EVENT_ACTION_ACCEPT = 1,
	KVM_PMU_EVENT_ACTION_REJECT = 2,
	KVM_PMU_EVENT_ACTION_MAX
};

and do a check in kvm_vm_ioctl_set_pmu_event_filter():

if (filter->action >= KVM_PMU_EVENT_ACTION_MAX)
	return -EINVAL;

This is for detecting the case where a new action has been added in
userspace while kvm hasn't been updated to support it.

KVM_PMU_EVENT_ACTION_NONE is for userspace to remove the filter after
it has been set.


> +
> #endif /* _ASM_X86_KVM_H */
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index dd745b58ffd8..d674b79ff8da 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -22,6 +22,9 @@
> #include "lapic.h"
> #include "pmu.h"
>
> +/* This keeps the total size of the filter under 4k. */
> +#define KVM_PMU_EVENT_FILTER_MAX_EVENTS 63
> +

Why is this limit needed?

> /* NOTE:
> * - Each perf counter is defined as "struct kvm_pmc";
> * - There are two types of perf counters: general purpose (gp) and fixed.
> @@ -144,6 +147,10 @@ void reprogram_gp_counter(struct kvm_pmc *pmc,
> u64 eventsel)
> {
> unsigned config, type = PERF_TYPE_RAW;
> u8 event_select, unit_mask;
> + struct kvm_arch *arch = &pmc->vcpu->kvm->arch;
> + struct kvm_pmu_event_filter *filter;
> + int i;
> + bool allow_event = true;
>
> if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
> printk_once("kvm pmu: pin control bit is ignored\n");
> @@ -155,6 +162,24 @@ void reprogram_gp_counter(struct kvm_pmc *pmc,
> u64 eventsel)
> if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
> return;
>
> + rcu_read_lock();
> + filter = rcu_dereference(arch->pmu_event_filter);
> + if (filter) {
> + for (i = 0; i < filter->nevents; i++)
> + if (filter->events[i] ==
> + (eventsel & AMD64_RAW_EVENT_MASK_NB))
> + break;
> + if (filter->type == KVM_PMU_EVENT_WHITELIST &&
> + i == filter->nevents)
> + allow_event = false;
> + if (filter->type == KVM_PMU_EVENT_BLACKLIST &&
> + i < filter->nevents)
> + allow_event = false;
> + }
> + rcu_read_unlock();
> + if (!allow_event)
> + return;
> +

I think it looks tidier to wrap the changes above into a function:

if (kvm_pmu_filter_event(kvm, eventsel & AMD64_RAW_EVENT_MASK_NB))
	return;
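
i.e. something along these lines (only a sketch; the helper name and the
"true means filtered out" return convention follow the call above):

static bool kvm_pmu_filter_event(struct kvm *kvm, u64 event_code)
{
	struct kvm_pmu_event_filter *filter;
	bool filtered = false;
	int i;

	rcu_read_lock();
	filter = rcu_dereference(kvm->arch.pmu_event_filter);
	if (filter) {
		for (i = 0; i < filter->nevents; i++)
			if (filter->events[i] == event_code)
				break;
		if (filter->type == KVM_PMU_EVENT_WHITELIST)
			filtered = (i == filter->nevents);
		else
			filtered = (i < filter->nevents);
	}
	rcu_read_unlock();

	return filtered;
}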

> event_select = eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
> unit_mask = (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
>
> @@ -351,3 +376,39 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
> {
> kvm_pmu_reset(vcpu);
> }
> +
> +int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
> +{
> + struct kvm_pmu_event_filter tmp, *filter;
> + size_t size;
> + int r;
> +
> + if (copy_from_user(&tmp, argp, sizeof(tmp)))
> + return -EFAULT;
> +
> + if (tmp.nevents > KVM_PMU_EVENT_FILTER_MAX_EVENTS)
> + return -E2BIG;
> +
> + size = sizeof(tmp) + sizeof(tmp.events[0]) * tmp.nevents;
> + filter = vmalloc(size);
> + if (!filter)
> + return -ENOMEM;
> +
> + r = -EFAULT;
> + if (copy_from_user(filter, argp, size))

Though the above works correctly, I would just move "r = -EFAULT" here
so that it is only executed on the failure path.


> + goto cleanup;
> +
> + /* Ensure nevents can't be changed between the user copies. */
> + *filter = tmp;
> +
> + mutex_lock(&kvm->lock);
> + rcu_swap_protected(kvm->arch.pmu_event_filter, filter,
> + mutex_is_locked(&kvm->lock));
> + mutex_unlock(&kvm->lock);
> +
> + synchronize_rcu();
> + r = 0;
> +cleanup:
> + kvfree(filter);

Probably better to do it conditionally?

if (filter) {
	synchronize_srcu();
	kfree(filter);
}


You may want to factor it out, so that kvm_pmu_destroy could reuse it.
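
For instance (name illustrative, mirroring the snippet above but with the
primitives the v2 patch already uses):

static void kvm_pmu_free_event_filter(struct kvm_pmu_event_filter *filter)
{
	if (!filter)
		return;
	synchronize_rcu();
	kvfree(filter);
}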

Best,
Wei

2019-07-16 00:11:23

by Eric Hankland

Subject: Re: [PATCH v2] KVM: x86: PMU Event Filter

> I think just disabling guest cpuid might not be enough, since guest
> could write to the msr without checking the cpuid.
>
> Why not just add a bitmap for fixed counter?
> e.g. fixed_counter_reject_bitmap
>
> At the beginning of reprogram_fixed_counter, we could add the check:
>
> if (test_bit(idx, &kvm->arch.fixed_counter_reject_bitmap))
> return -EACCES;
>
> (Please test with your old guest and see if they have issues if we
> inject #GP when
> they try to set the fixed_ctrl msr. If there is, we could drop -EACCESS
> above)
>
> The bitmap could be set at kvm_vm_ioctl_set_pmu_event_filter.

intel_pmu_refresh() checks the guest cpuid and sets the number of
fixed counters according to that:
pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed,
				    INTEL_PMC_MAX_FIXED);

and reprogram_fixed_counters()/get_fixed_pmc() respect this so the
guest can't just ignore the cpuid.

Adding a bitmap does let you do things like disable the first counter
but keep the second and third, but given that there are only three and
the events are likely to be on a whitelist anyway, it seemed like
adding the bitmap wasn't worth it. If you still feel the same way even
though we can disable them via the cpuid, I can add this in.

> I think it would be better to add more, please see below:
>
> enum kvm_pmu_action_type {
> KVM_PMU_EVENT_ACTION_NONE = 0,
> KVM_PMU_EVENT_ACTION_ACCEPT = 1,
> KVM_PMU_EVENT_ACTION_REJECT = 2,
> KVM_PMU_EVENT_ACTION_MAX
> };
>
> and do a check in kvm_vm_ioctl_set_pmu_event_filter()
> if (filter->action >= KVM_PMU_EVENT_ACTION_MAX)
> return -EINVAL;
>
> This is for detecting the case that we add a new action in
> userspace, while the kvm hasn't been updated to support that.
>
> KVM_PMU_EVENT_ACTION_NONE is for userspace to remove
> the filter after they set it.

We can achieve the same result by using a reject action with an empty
set of events - is there some advantage to "none" over that? I can add
that check for valid actions.

> > +#define KVM_PMU_EVENT_FILTER_MAX_EVENTS 63
>
> Why is this limit needed?

It serves to keep the filters on the smaller side and ensures the size
calculation can't overflow regardless of the nevents userspace passes.
Keeping the filter under 4k is nicer for allocation - also, if we want
really large filters we might want to do something smarter than a linear
traversal of the filter when guests program counters.

> I think it looks tidier to wrap the changes above into a function:
>
> if (kvm_pmu_filter_event(kvm, eventsel & AMD64_RAW_EVENT_MASK_NB))
> return;

Okay - I can do that.

> > + kvfree(filter);
>
> Probably better to have it conditionally?
>
> if (filter) {
> synchronize_srcu();
> kfree(filter)
> }
>
> You may want to factor it out, so that kvm_pmu_destroy could reuse.

Do you mean kvm_arch_destroy_vm? It looks like that's where kvm_arch
members are freed. I can do that.

Eric

2019-07-16 08:44:59

by Wang, Wei W

Subject: Re: [PATCH v2] KVM: x86: PMU Event Filter

On 07/16/2019 08:10 AM, Eric Hankland wrote:
>> I think just disabling guest cpuid might not be enough, since guest
>> could write to the msr without checking the cpuid.
>>
>> Why not just add a bitmap for fixed counter?
>> e.g. fixed_counter_reject_bitmap
>>
>> At the beginning of reprogram_fixed_counter, we could add the check:
>>
>> if (test_bit(idx, &kvm->arch.fixed_counter_reject_bitmap))
>> return -EACCES;
>>
>> (Please test with your old guest and see if they have issues if we
>> inject #GP when
>> they try to set the fixed_ctrl msr. If there is, we could drop -EACCESS
>> above)
>>
>> The bitmap could be set at kvm_vm_ioctl_set_pmu_event_filter.
> intel_pmu_refresh() checks the guest cpuid and sets the number of
> fixed counters according to that:
> pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed,
> INTEL_PMC_MAX_FIXED);
>
> and reprogram_fixed_counters()/get_fixed_pmc() respect this so the
> guest can't just ignore the cpuid.

Yes, but as you noticed, we couldn't disable fixed counter 2 while
keeping counter 3 running that way.


> Adding a bitmap does let you do things like disable the first counter
> but keep the second and third, but given that there are only three and
> the events are likely to be on a whitelist anyway, it seemed like
> adding the bitmap wasn't worth it. If you still feel the same way even
> though we can disable them via the cpuid, I can add this in.

We need the design to be architecturally clean. For example, if the
hardware later comes up with fixed counters 5 and 6, where 5 is something
we really want to expose to the guest while 6 isn't, can our design here
(further) support that?

We don't want to re-design this at that time. However, extending what we
have would be acceptable. So, if you hesitate to add the bitmap method
that I described, please add GP tags to the ACTIONs defined, e.g.

enum kvm_pmu_action_type {
	KVM_PMU_EVENT_ACTION_GP_NONE = 0,
	KVM_PMU_EVENT_ACTION_GP_ACCEPT = 1,
	KVM_PMU_EVENT_ACTION_GP_REJECT = 2,
	KVM_PMU_EVENT_ACTION_MAX
};

and add comments to explain something like below:

Those GP actions are for filtering guest events running on the virtual
general purpose counters. Actions to filter guest events running on the
virtual fixed function counters are not added currently, as they all
seem fine for the guest to use so far, but that could be supported on
demand in the future by adding new actions.


>> I think it would be better to add more, please see below:
>>
>> enum kvm_pmu_action_type {
>> KVM_PMU_EVENT_ACTION_NONE = 0,
>> KVM_PMU_EVENT_ACTION_ACCEPT = 1,
>> KVM_PMU_EVENT_ACTION_REJECT = 2,
>> KVM_PMU_EVENT_ACTION_MAX
>> };
>>
>> and do a check in kvm_vm_ioctl_set_pmu_event_filter()
>> if (filter->action >= KVM_PMU_EVENT_ACTION_MAX)
>> return -EINVAL;
>>
>> This is for detecting the case that we add a new action in
>> userspace, while the kvm hasn't been updated to support that.
>>
>> KVM_PMU_EVENT_ACTION_NONE is for userspace to remove
>> the filter after they set it.
> We can achieve the same result by using a reject action with an empty
> set of events - is there some advantage to "none" over that? I can add
> that check for valid actions.

Yes, we could also make it work by passing nevents=0.

I slightly prefer the NONE action here. The advantage is a simpler
userspace command. For QEMU, people could disable the filter via a qmp
command, e.g. pmu-filter=none, instead of pmu-filter=accept,nevents=0.
It also seems more straightforward when people read the usage manual -
a NONE command to cancel the filtering, instead of "first set this,
then set that..". I don't see an advantage of nevents=0 over the NONE
action.

Anyway, this one isn't a critical issue, and it's up to you here.


>
>>> +#define KVM_PMU_EVENT_FILTER_MAX_EVENTS 63
>> Why is this limit needed?
> Serves to keep the filters on the smaller side and ensures the size
> calculation can't overflow if users attempt to. Keeping the filter
> under 4k is nicer for allocation - also, if we want really large
> filters we might want to do something smarter than a linear traversal
> of the filter when guests program counters.

I think 63 is too small, and it looks like an arbitrary number to put
here. From SDM table 19-3, it seems there are roughly 300 events, and
functionally the design should support filtering most of them.
(Optimization with a smarter traversal is another story that could be
done later.)

Maybe
#define KVM_PMU_EVENT_FILTER_MAX_EVENTS \
	((PAGE_SIZE - sizeof(struct kvm_pmu_event_filter)) / sizeof(__u64))
?

and please add a comment above explaining how this number was chosen.
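
(For reference, with the 8-byte header defined above and a 4 KiB page,
that macro works out to (4096 - 8) / 8 = 511 events.)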


>>> + kvfree(filter);
>> Probably better to have it conditionally?
>>
>> if (filter) {
>> synchronize_srcu();
>> kfree(filter)
>> }
>>
>> You may want to factor it out, so that kvm_pmu_destroy could reuse.
> Do you mean kvm_arch_destroy_vm? It looks like that's where kvm_arch
> members are freed. I can do that.

Sounds good.

Best,
Wei


2019-07-17 10:23:13

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86: PMU Event Filter

On 16/07/19 10:49, Wei Wang wrote:
> {
>   KVM_PMU_EVENT_ACTION_GP_NONE = 0,
>   KVM_PMU_EVENT_ACTION_GP_ACCEPT = 1,
>   KVM_PMU_EVENT_ACTION_GP_REJECT = 2,
>   KVM_PMU_EVENT_ACTION_MAX
> };
>
> and add comments to explain something like below:
>
> Those GP actions are for the filtering of guest events running on the
> virtual general
> purpose counters. The actions to filter guest events running on the
> virtual fixed
> function counters are not added currently as they all seem fine to be
> used by the
> guest so far, but that could be supported on demand in the future via
> adding new
> actions.
>

Let's just implement the bitmap of fixed counters (it's okay to follow
the same action as gp counters), and add it to struct
kvm_pmu_event_filter. While at it, we can add a bunch of padding u32s
and a flags field that can come in handy later (it would fail the ioctl
if nonzero).
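
One possible shape for the extended structure (just a sketch; the exact
field names and padding are up to the v3 patch):

struct kvm_pmu_event_filter {
	__u32 action;
	__u32 nevents;
	__u32 fixed_counter_bitmap;
	__u32 flags;	/* must be zero for now; nonzero fails the ioctl */
	__u32 pad[4];
	__u64 events[0];
};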

Wei, Eric, who's going to do it? :)

Paolo

2019-07-17 17:08:43

by Eric Hankland

Subject: Re: [PATCH v2] KVM: x86: PMU Event Filter

> Let's just implement the bitmap of fixed counters (it's okay to follow
> the same action as gp counters), and add it to struct
> kvm_pmu_event_filter. While at it, we can add a bunch of padding u32s
> and a flags field that can come in handy later (it would fail the ioctl
> if nonzero).
>
> Wei, Eric, who's going to do it? :)

I'm happy to do it - I'll send out a v3.

Eric

2019-07-17 17:31:38

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86: PMU Event Filter

On 17/07/19 19:05, Eric Hankland wrote:
>> Let's just implement the bitmap of fixed counters (it's okay to follow
>> the same action as gp counters), and add it to struct
>> kvm_pmu_event_filter. While at it, we can add a bunch of padding u32s
>> and a flags field that can come in handy later (it would fail the ioctl
>> if nonzero).
>>
>> Wei, Eric, who's going to do it? :)
>
> I'm happy to do it - I'll send out a v3.

Please send a patch on top of what is currently in Linus's tree.

Thanks!

Paolo