2019-09-30 23:28:39

by Like Xu

Subject: [PATCH 0/3] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event

Hi Paolo & Community:

The Performance Monitoring Unit is designed to monitor microarchitectural
events, which helps in analyzing how applications and operating systems
perform on the processor. In KVM/x86, the version 2 Architectural PMU has
been enabled on both Intel and AMD hosts.

This patch series improves vPMU efficiency for guest perf users, which is
mainly measured by guest NMI handler latency for basic perf usages
[1][2][3][4] with a hardware PMU. It is not a passthrough solution but is
based on the legacy vPMU implementation (since 2011) and is backport-friendly.

The general idea (defined in patch 2/3) is to reuse the last created
perf_event for the same vPMC when the newly requested config is exactly the
same as the last programmed config (as used by pmc_reprogram_counter()) AND
the new event period is appropriate and accepted (via perf_event_period()
in patch 1/3). Before the perf_event is reused, it stays disabled until it
can be reused and is reassigned a hardware counter again to serve the vPMC.
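
In rough outline, the reprogramming path per vPMC then becomes (a condensed
sketch of the gp-counter case from patch 2/3, not the literal diff):

        pmc_pause_counter(pmc);         /* disable the event, save the counter */

        /* eventsel decoding and pmu event filter checks elided */

        if (pmc->programed_config == eventsel && pmc_resume_counter(pmc))
                return;                 /* reuse the last created perf_event */

        pmc_release_perf_event(pmc);

        pmc->programed_config = eventsel;
        pmc_reprogram_counter(pmc, type, config,
                              !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
                              !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
                              eventsel & ARCH_PERFMON_EVENTSEL_INT,
                              (eventsel & HSW_IN_TX),
                              (eventsel & HSW_IN_TX_CHECKPOINTED));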

If the disabled perf_event is no longer reused, a lazy release mechanism
(defined in patch 3/3) kicks in: in short, it releases the disabled
perf_events on the first call of vcpu_enter_guest() after the vcpu is next
scheduled in, if the vPMC's MSRs were not accessed during the last sched
time slice. The bitmap pmu->lazy_release_ctrl is added to track this.
kvm_pmu_cleanup() is called on the first run of vcpu_enter_guest() after
the vcpu's sched_in, and its overhead is very limited.
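
In sketch form (condensed from patch 3/3, not the literal diff), the two
halves of the mechanism are:

        /* kvm_arch_sched_in(): arm the cleanup only if any perf_event exists */
        if (pmu->version && unlikely(pmu->state.event_count))
                pmu->state.enable_cleanup = true;

        /* kvm_pmu_cleanup(), on the first vcpu_enter_guest() after sched_in */
        u64 bitmask = ~pmu->lazy_release_ctrl;  /* vPMCs not marked as in use */
        int i;

        for_each_set_bit(i, (unsigned long *)&bitmask, X86_PMC_IDX_MAX) {
                pmc = kvm_x86_ops->pmu_ops->pmc_idx_to_pmc(pmu, i);
                if (pmc && pmc->perf_event && !pmc_speculative_in_use(pmc))
                        pmc_stop_counter(pmc);  /* release the unused event */
        }
        pmu->lazy_release_ctrl = 0;     /* start tracking the new time slice */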

With this optimization, the average latency of the guest NMI handler is
reduced from 99450 ns to 56195 ns (a 1.76x speedup on CLX-AP with v5.3).
If the host disables the watchdog (echo 0 > /proc/sys/kernel/watchdog), the
minimum latency of the guest NMI handler is improved by 2994x and the
average by 685x. The run time of a workload with perf attached inside the
guest is reduced significantly with this optimization.

Please check each commit for more details and share your comments with us.

Thanks,
Like Xu

---
[1] multiplexing sampling usage: time perf record -e \
`perf list | grep Hardware | grep event |\
awk '{print $1}' | head -n 10 |tr '\n' ',' | sed 's/,$//' ` ./ftest
[2] one gp counter sampling usage: perf record -e branch-misses ./ftest
[3] one fixed counter sampling usage: perf record -e instructions ./ftest
[4] event count usage: perf stat -e branch-misses ./ftest

Like Xu (3):
perf/core: Provide a kernel-internal interface to recalibrate event
period
KVM: x86/vPMU: Reuse perf_event to avoid unnecessary
pmc_reprogram_counter
KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

arch/x86/include/asm/kvm_host.h | 10 ++++
arch/x86/kvm/pmu.c | 88 ++++++++++++++++++++++++++++++++-
arch/x86/kvm/pmu.h | 15 +++++-
arch/x86/kvm/pmu_amd.c | 14 ++++++
arch/x86/kvm/vmx/pmu_intel.c | 27 ++++++++++
arch/x86/kvm/x86.c | 6 +++
include/linux/perf_event.h | 5 ++
kernel/events/core.c | 28 ++++++++---
8 files changed, 182 insertions(+), 11 deletions(-)

--
2.21.0


2019-09-30 23:28:48

by Like Xu

Subject: [PATCH 1/3] perf/core: Provide a kernel-internal interface to recalibrate event period

Currently, perf_event_period() is only used by user tools via ioctl. Export
perf_event_period() for kernel users (such as KVM) who may need to
recalibrate the event period for their assigned counters according to their
requirements.

perf_event_period() is an external accessor, just like
perf_event_{en,dis}able(), and should thus use perf_event_ctx_lock().
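
For example, patch 2/3 in this series uses it from KVM when checking whether
a disabled perf_event can be reused for a vPMC:

        /* recalibrate sample period and check if it's accepted by perf core */
        if (perf_event_period(pmc->perf_event,
                              (-pmc->counter) & pmc_bitmask(pmc)))
                return false;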

Suggested-by: Kan Liang <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
include/linux/perf_event.h | 5 +++++
kernel/events/core.c | 28 +++++++++++++++++++++-------
2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 61448c19a132..83db24173e4d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1336,6 +1336,7 @@ extern void perf_event_disable_local(struct perf_event *event);
extern void perf_event_disable_inatomic(struct perf_event *event);
extern void perf_event_task_tick(void);
extern int perf_event_account_interrupt(struct perf_event *event);
+extern int perf_event_period(struct perf_event *event, u64 value);
#else /* !CONFIG_PERF_EVENTS: */
static inline void *
perf_aux_output_begin(struct perf_output_handle *handle,
@@ -1415,6 +1416,10 @@ static inline void perf_event_disable(struct perf_event *event) { }
static inline int __perf_event_disable(void *info) { return -1; }
static inline void perf_event_task_tick(void) { }
static inline int perf_event_release_kernel(struct perf_event *event) { return 0; }
+extern int perf_event_period(struct perf_event *event, u64 value)
+{
+ return -EINVAL;
+}
#endif

#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4655adbbae10..de740d20b028 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5094,16 +5094,11 @@ static int perf_event_check_period(struct perf_event *event, u64 value)
return event->pmu->check_period(event, value);
}

-static int perf_event_period(struct perf_event *event, u64 __user *arg)
+static int _perf_event_period(struct perf_event *event, u64 value)
{
- u64 value;
-
if (!is_sampling_event(event))
return -EINVAL;

- if (copy_from_user(&value, arg, sizeof(value)))
- return -EFAULT;
-
if (!value)
return -EINVAL;

@@ -5121,6 +5116,19 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
return 0;
}

+int perf_event_period(struct perf_event *event, u64 value)
+{
+ struct perf_event_context *ctx;
+ int ret;
+
+ ctx = perf_event_ctx_lock(event);
+ ret = _perf_event_period(event, value);
+ perf_event_ctx_unlock(event, ctx);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(perf_event_period);
+
static const struct file_operations perf_fops;

static inline int perf_fget_light(int fd, struct fd *p)
@@ -5164,8 +5172,14 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
return _perf_event_refresh(event, arg);

case PERF_EVENT_IOC_PERIOD:
- return perf_event_period(event, (u64 __user *)arg);
+ {
+ u64 value;
+
+ if (get_user(value, (u64 __user *)&arg))
+ return -EFAULT;

+ return _perf_event_period(event, value);
+ }
case PERF_EVENT_IOC_ID:
{
u64 id = primary_event_id(event);
--
2.21.0

2019-09-30 23:29:49

by Like Xu

Subject: [PATCH 3/3] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

Currently, a host perf_event is created to emulate vPMC functionality.
It's unpredictable whether a disabled perf_event will be reused. If
disabled perf_events are not reused for a considerable period of time,
those obsolete perf_events increase host context switch overhead that
could have been avoided.

If the guest doesn't access (set_msr/get_msr/rdpmc) any of the vPMC's MSRs
during an entire vcpu sched time slice, and the vPMC's independent enable
bit isn't set, we can predict that the guest has finished using this vPMC,
and it's time to release the non-reused perf_event on the first call of
vcpu_enter_guest() after the vcpu is next scheduled in.

This lazy mechanism delays the event release to the beginning of the next
scheduled time slice if the vPMC's MSRs aren't accessed during this time
slice. If the guest comes back to use this vPMC in the next time slice, a
new perf event is re-created via perf_event_create_kernel_counter() as usual.

Suggested-by: Wei W Wang <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 8 ++++++
arch/x86/kvm/pmu.c | 43 +++++++++++++++++++++++++++++++++
arch/x86/kvm/pmu.h | 3 +++
arch/x86/kvm/pmu_amd.c | 13 ++++++++++
arch/x86/kvm/vmx/pmu_intel.c | 25 +++++++++++++++++++
arch/x86/kvm/x86.c | 6 +++++
6 files changed, 98 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 15f2ebad94f9..6723c04c8dc6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -479,6 +479,14 @@ struct kvm_pmu {
struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
struct irq_work irq_work;
u64 reprogram_pmi;
+
+ /* for PMC being set, do not release its perf_event (if any) */
+ u64 lazy_release_ctrl;
+
+ union {
+ u8 event_count :7; /* the total number of created perf_events */
+ bool enable_cleanup :1;
+ } state;
};

struct kvm_pmu_ops;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 74bc5c42b8b5..1b3cec38b1a1 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -137,6 +137,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
}

pmc->perf_event = event;
+ pmc_to_pmu(pmc)->state.event_count++;
clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
}

@@ -368,6 +369,7 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
if (!pmc)
return 1;

+ __set_bit(pmc->idx, (unsigned long *)&pmu->lazy_release_ctrl);
*data = pmc_read_counter(pmc) & mask;
return 0;
}
@@ -385,11 +387,13 @@ bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)

int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
{
+ kvm_x86_ops->pmu_ops->update_lazy_release_ctrl(vcpu, msr);
return kvm_x86_ops->pmu_ops->get_msr(vcpu, msr, data);
}

int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
+ kvm_x86_ops->pmu_ops->update_lazy_release_ctrl(vcpu, msr_info->index);
return kvm_x86_ops->pmu_ops->set_msr(vcpu, msr_info);
}

@@ -417,9 +421,48 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
memset(pmu, 0, sizeof(*pmu));
kvm_x86_ops->pmu_ops->init(vcpu);
init_irq_work(&pmu->irq_work, kvm_pmi_trigger_fn);
+ pmu->lazy_release_ctrl = 0;
+ pmu->state.event_count = 0;
+ pmu->state.enable_cleanup = false;
kvm_pmu_refresh(vcpu);
}

+static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
+{
+ struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+
+ if (pmc_is_fixed(pmc))
+ return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
+ pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
+
+ return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
+}
+
+void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
+{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct kvm_pmc *pmc = NULL;
+ u64 bitmask = ~pmu->lazy_release_ctrl;
+ int i;
+
+ if (!unlikely(pmu->state.enable_cleanup))
+ return;
+
+ /* do cleanup before the first time of running vcpu after sched_in */
+ pmu->state.enable_cleanup = false;
+
+ /* cleanup unmarked vPMC in the last sched time slice */
+ for_each_set_bit(i, (unsigned long *)&bitmask, X86_PMC_IDX_MAX) {
+ pmc = kvm_x86_ops->pmu_ops->pmc_idx_to_pmc(pmu, i);
+
+ if (pmc && pmc->perf_event && !pmc_speculative_in_use(pmc))
+ pmc_stop_counter(pmc);
+ }
+
+ /* reset vPMC lazy-release states for this sched time slice */
+ pmu->lazy_release_ctrl = 0;
+}
+
void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
{
kvm_pmu_reset(vcpu);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 3a95952702d2..c681738ba59c 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -34,6 +34,7 @@ struct kvm_pmu_ops {
void (*refresh)(struct kvm_vcpu *vcpu);
void (*init)(struct kvm_vcpu *vcpu);
void (*reset)(struct kvm_vcpu *vcpu);
+ void (*update_lazy_release_ctrl)(struct kvm_vcpu *vcpu, u32 msr);
};

static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
@@ -61,6 +62,7 @@ static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
perf_event_release_kernel(pmc->perf_event);
pmc->perf_event = NULL;
pmc->programed_config = 0;
+ pmc_to_pmu(pmc)->state.event_count--;
}
}

@@ -125,6 +127,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
void kvm_pmu_reset(struct kvm_vcpu *vcpu);
void kvm_pmu_init(struct kvm_vcpu *vcpu);
+void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);

diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 3d656b2d439f..c74087dad5e8 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -208,6 +208,18 @@ static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
return ret;
}

+static void amd_update_lazy_release_ctrl(struct kvm_vcpu *vcpu, u32 msr)
+{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct kvm_pmc *pmc = NULL;
+
+ pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
+ pmc = pmc ? pmc : get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
+
+ if (pmc)
+ __set_bit(pmc->idx, (unsigned long *)&pmu->lazy_release_ctrl);
+}
+
static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -315,4 +327,5 @@ struct kvm_pmu_ops amd_pmu_ops = {
.refresh = amd_pmu_refresh,
.init = amd_pmu_init,
.reset = amd_pmu_reset,
+ .update_lazy_release_ctrl = amd_update_lazy_release_ctrl,
};
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 73bbefa1d54e..4aa7d2eea5c8 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -140,6 +140,30 @@ static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu,
return &counters[idx];
}

+static void intel_update_lazy_release_ctrl(struct kvm_vcpu *vcpu, u32 msr)
+{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+ struct kvm_pmc *pmc = NULL;
+ int i;
+
+ if (msr == MSR_CORE_PERF_FIXED_CTR_CTRL) {
+ for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
+ if (!fixed_ctrl_field(pmu->fixed_ctr_ctrl, i))
+ continue;
+ __set_bit(INTEL_PMC_IDX_FIXED + i,
+ (unsigned long *)&pmu->lazy_release_ctrl);
+ }
+ return;
+ }
+
+ pmc = get_fixed_pmc(pmu, msr);
+ pmc = pmc ? pmc : get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0);
+ pmc = pmc ? pmc : get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0);
+
+ if (pmc)
+ __set_bit(pmc->idx, (unsigned long *)&pmu->lazy_release_ctrl);
+}
+
static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -373,4 +397,5 @@ struct kvm_pmu_ops intel_pmu_ops = {
.refresh = intel_pmu_refresh,
.init = intel_pmu_init,
.reset = intel_pmu_reset,
+ .update_lazy_release_ctrl = intel_update_lazy_release_ctrl,
};
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0ed07d8d2caa..945b8be53a90 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8076,6 +8076,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
goto cancel_injection;
}

+ kvm_pmu_cleanup(vcpu);
+
preempt_disable();

kvm_x86_ops->prepare_guest_switch(vcpu);
@@ -9415,7 +9417,11 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)

void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
{
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
vcpu->arch.l1tf_flush_l1d = true;
+ if (pmu->version && unlikely(pmu->state.event_count))
+ pmu->state.enable_cleanup = true;
kvm_x86_ops->sched_in(vcpu, cpu);
}

--
2.21.0

2019-09-30 23:31:30

by Like Xu

Subject: [PATCH 2/3] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter

The perf_event_create_kernel_counter() call in pmc_reprogram_counter() is
a high-frequency and heavyweight operation, especially when the host
disables the watchdog (up to 21000000 ns), which leads to an unacceptable
latency for the guest NMI handler and limits the vPMU usage scenarios.

When a vPMC is fully enabled, the legacy reprogram_*_counter() would stop
and release its existing perf_event (if any) every time, even though in
most cases an almost identical perf_event would be created and configured
again.

For each vPMC, if the requested config ('u64 eventsel' for gp and 'u8 ctrl'
for fixed) is the same as its last programmed config AND a new sample period
based on pmc->counter is accepted by the host perf interface, the current
event can be reused safely and behaves just like a newly created one.
Otherwise, release the undesirable perf_event and reprogram a new one as
usual.

It's lightweight to call pmc_pause_counter() (disable the event & reset the
count) and pmc_resume_counter() (recalibrate the period & re-enable the
event) as the guest expects, instead of releasing and re-creating the event
on every reprogram. Instead of comparing against the filterable event->attr
or hw.config, a new 'u64 programed_config' field is added to save the last
originally programmed config for each vPMC.

Based on this implementation, the number of calls to pmc_reprogram_counter()
is reduced by ~94% for a gp sampling event and ~99.9% for a fixed event. In
the multiplexed perf sampling usage, the average latency of the guest NMI
handler is reduced from 99450 ns to 56195 ns (a 1.76x speedup). If the host
disables the watchdog, the minimum latency of the guest NMI handler is
improved by 2994x (from 18134692 ns to 6057 ns) and the average by 685x.

Suggested-by: Kan Liang <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/pmu.c | 45 +++++++++++++++++++++++++++++++--
arch/x86/kvm/pmu.h | 12 +++++++--
arch/x86/kvm/pmu_amd.c | 1 +
arch/x86/kvm/vmx/pmu_intel.c | 2 ++
5 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 23edf56cf577..15f2ebad94f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -458,6 +458,8 @@ struct kvm_pmc {
u64 eventsel;
struct perf_event *perf_event;
struct kvm_vcpu *vcpu;
+ /* the exact requested config for perf_event reusability check */
+ u64 programed_config;
};

struct kvm_pmu {
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 46875bbd0419..74bc5c42b8b5 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -140,6 +140,35 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
}

+static void pmc_pause_counter(struct kvm_pmc *pmc)
+{
+ if (!pmc->perf_event)
+ return;
+
+ pmc->counter = pmc_read_counter(pmc);
+
+ perf_event_disable(pmc->perf_event);
+
+ /* reset count to avoid redundant accumulation */
+ local64_set(&pmc->perf_event->count, 0);
+}
+
+static bool pmc_resume_counter(struct kvm_pmc *pmc)
+{
+ if (!pmc->perf_event)
+ return false;
+
+ /* recalibrate sample period and check if it's accepted by perf core */
+ if (perf_event_period(pmc->perf_event,
+ (-pmc->counter) & pmc_bitmask(pmc)))
+ return false;
+
+ /* reuse perf_event to serve as pmc_reprogram_counter() does*/
+ perf_event_enable(pmc->perf_event);
+ clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
+ return true;
+}
+
void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
{
unsigned config, type = PERF_TYPE_RAW;
@@ -154,7 +183,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)

pmc->eventsel = eventsel;

- pmc_stop_counter(pmc);
+ pmc_pause_counter(pmc);

if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
return;
@@ -193,6 +222,12 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
if (type == PERF_TYPE_RAW)
config = eventsel & X86_RAW_EVENT_MASK;

+ if (pmc->programed_config == eventsel && pmc_resume_counter(pmc))
+ return;
+
+ pmc_release_perf_event(pmc);
+
+ pmc->programed_config = eventsel;
pmc_reprogram_counter(pmc, type, config,
!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
@@ -209,7 +244,7 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
struct kvm_pmu_event_filter *filter;
struct kvm *kvm = pmc->vcpu->kvm;

- pmc_stop_counter(pmc);
+ pmc_pause_counter(pmc);

if (!en_field || !pmc_is_enabled(pmc))
return;
@@ -224,6 +259,12 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
return;
}

+ if (pmc->programed_config == (u64)ctrl && pmc_resume_counter(pmc))
+ return;
+
+ pmc_release_perf_event(pmc);
+
+ pmc->programed_config = (u64)ctrl;
pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
kvm_x86_ops->pmu_ops->find_fixed_event(idx),
!(en_field & 0x2), /* exclude user */
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 58265f761c3b..3a95952702d2 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -55,12 +55,20 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
return counter & pmc_bitmask(pmc);
}

-static inline void pmc_stop_counter(struct kvm_pmc *pmc)
+static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
{
if (pmc->perf_event) {
- pmc->counter = pmc_read_counter(pmc);
perf_event_release_kernel(pmc->perf_event);
pmc->perf_event = NULL;
+ pmc->programed_config = 0;
+ }
+}
+
+static inline void pmc_stop_counter(struct kvm_pmc *pmc)
+{
+ if (pmc->perf_event) {
+ pmc->counter = pmc_read_counter(pmc);
+ pmc_release_perf_event(pmc);
}
}

diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index c8388389a3b0..3d656b2d439f 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -285,6 +285,7 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
pmu->gp_counters[i].type = KVM_PMC_GP;
pmu->gp_counters[i].vcpu = vcpu;
pmu->gp_counters[i].idx = i;
+ pmu->gp_counters[i].programed_config = 0;
}
}

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 4dea0e0e7e39..73bbefa1d54e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -325,12 +325,14 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
pmu->gp_counters[i].type = KVM_PMC_GP;
pmu->gp_counters[i].vcpu = vcpu;
pmu->gp_counters[i].idx = i;
+ pmu->gp_counters[i].programed_config = 0;
}

for (i = 0; i < INTEL_PMC_MAX_FIXED; i++) {
pmu->fixed_counters[i].type = KVM_PMC_FIXED;
pmu->fixed_counters[i].vcpu = vcpu;
pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
+ pmu->fixed_counters[i].programed_config = 0;
}
}

--
2.21.0

2019-10-01 02:28:33

by kernel test robot

Subject: Re: [PATCH 1/3] perf/core: Provide a kernel-internal interface to recalibrate event period

Hi Like,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kvm/linux-next]
[cannot apply to v5.4-rc1 next-20190930]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Like-Xu/perf-core-Provide-a-kernel-internal-interface-to-recalibrate-event-period/20191001-081543
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: nds32-allnoconfig (attached as .config)
compiler: nds32le-linux-gcc (GCC) 8.1.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=nds32

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All errors (new ones prefixed by >>):

nds32le-linux-ld: init/do_mounts.o: in function `perf_event_period':
>> do_mounts.c:(.text+0xc): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: init/noinitramfs.o: in function `perf_event_period':
noinitramfs.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: arch/nds32/kernel/sys_nds32.o: in function `perf_event_period':
sys_nds32.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: arch/nds32/kernel/syscall_table.o: in function `perf_event_period':
syscall_table.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: arch/nds32/mm/fault.o: in function `perf_event_period':
fault.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/fork.o: in function `perf_event_period':
fork.c:(.text+0x374): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/exec_domain.o: in function `perf_event_period':
exec_domain.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/cpu.o: in function `perf_event_period':
cpu.c:(.text+0xd4): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/exit.o: in function `perf_event_period':
exit.c:(.text+0xf30): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sysctl.o: in function `perf_event_period':
sysctl.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sysctl_binary.o: in function `perf_event_period':
sysctl_binary.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/capability.o: in function `perf_event_period':
capability.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/ptrace.o: in function `perf_event_period':
ptrace.c:(.text+0x3f4): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/signal.o: in function `perf_event_period':
signal.c:(.text+0x580): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sys.o: in function `perf_event_period':
sys.c:(.text+0x70c): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/umh.o: in function `perf_event_period':
umh.c:(.text+0x4c8): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/pid.o: in function `perf_event_period':
pid.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/nsproxy.o: in function `perf_event_period':
nsproxy.c:(.text+0x14c): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/reboot.o: in function `perf_event_period':
reboot.c:(.text+0x6c): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/core.o: in function `perf_event_period':
core.c:(.text+0x338): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/loadavg.o: in function `perf_event_period':
loadavg.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/clock.o: in function `perf_event_period':
clock.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/cputime.o: in function `perf_event_period':
cputime.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/idle.o: in function `perf_event_period':
idle.c:(.text+0x94): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/fair.o: in function `perf_event_period':
fair.c:(.text+0xe68): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/rt.o: in function `perf_event_period':
rt.c:(.text+0x9bc): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/deadline.o: in function `perf_event_period':
deadline.c:(.text+0x138c): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/wait.o: in function `perf_event_period':
wait.c:(.text+0x16c): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/wait_bit.o: in function `perf_event_period':
wait_bit.c:(.text+0x70): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/swait.o: in function `perf_event_period':
swait.c:(.text+0x28): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/sched/completion.o: in function `perf_event_period':
completion.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/power/qos.o: in function `perf_event_period':
qos.c:(.text+0xbc): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/printk/printk.o: in function `perf_event_period':
printk.c:(.text+0x1cc): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/time/time.o: in function `perf_event_period':
time.c:(.text+0x3c): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/time/timer.o: in function `perf_event_period':
timer.c:(.text+0x3a4): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/time/hrtimer.o: in function `perf_event_period':
hrtimer.c:(.text+0x5c4): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/time/posix-stubs.o: in function `perf_event_period':
posix-stubs.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: kernel/time/tick-common.o: in function `perf_event_period':
tick-common.c:(.text+0x18c): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: mm/fadvise.o: in function `perf_event_period':
fadvise.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: mm/page-writeback.o: in function `perf_event_period':
page-writeback.c:(.text+0x668): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: mm/readahead.o: in function `perf_event_period':
readahead.c:(.text+0x198): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: mm/debug.o: in function `perf_event_period':
debug.c:(.text+0x74): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: mm/mincore.o: in function `perf_event_period':
mincore.c:(.text+0x184): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: mm/mlock.o: in function `perf_event_period':
mlock.c:(.text+0x528): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: mm/mmap.o: in function `perf_event_period':
mmap.c:(.text+0x63c): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: mm/mprotect.o: in function `perf_event_period':
mprotect.c:(.text+0x2b8): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: mm/mremap.o: in function `perf_event_period':
mremap.c:(.text+0xdc): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: mm/msync.o: in function `perf_event_period':
msync.c:(.text+0x0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: fs/open.o: in function `perf_event_period':
open.c:(.text+0x3f4): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: fs/read_write.o: in function `perf_event_period':
read_write.c:(.text+0x300): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here
nds32le-linux-ld: fs/stat.o: in function `perf_event_period':
stat.c:(.text+0x2f0): multiple definition of `perf_event_period'; init/main.o:main.c:(.text+0x0): first defined here

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2019-10-01 02:48:21

by kernel test robot

Subject: Re: [PATCH 1/3] perf/core: Provide a kernel-internal interface to recalibrate event period

Hi Like,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kvm/linux-next]
[cannot apply to v5.4-rc1 next-20190930]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Like-Xu/perf-core-Provide-a-kernel-internal-interface-to-recalibrate-event-period/20191001-081543
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: sh-rsk7269_defconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 7.4.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.4.0 make.cross ARCH=sh

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All errors (new ones prefixed by >>):

kernel/events/core.o: In function `_perf_ioctl':
>> core.c:(.text+0x7068): undefined reference to `__get_user_unknown'

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2019-10-01 08:23:11

by Peter Zijlstra

Subject: Re: [PATCH 2/3] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter

On Mon, Sep 30, 2019 at 03:22:56PM +0800, Like Xu wrote:
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 46875bbd0419..74bc5c42b8b5 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -140,6 +140,35 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
> clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
> }
>
> +static void pmc_pause_counter(struct kvm_pmc *pmc)
> +{
> + if (!pmc->perf_event)
> + return;
> +
> + pmc->counter = pmc_read_counter(pmc);
> +
> + perf_event_disable(pmc->perf_event);
> +
> + /* reset count to avoid redundant accumulation */
> + local64_set(&pmc->perf_event->count, 0);

Yuck, don't frob in data structures you don't own.

Just like you exported the IOC_PERIOD thing, so too is there a
IOC_RESET.

Furthermore; wth do you call pmc_read_counter() _before_ doing
perf_event_disable() ? Doing it the other way around is much cheaper,
even better, you can use perf_event_count() after disable.

> +}
> +
> +static bool pmc_resume_counter(struct kvm_pmc *pmc)
> +{
> + if (!pmc->perf_event)
> + return false;
> +
> + /* recalibrate sample period and check if it's accepted by perf core */
> + if (perf_event_period(pmc->perf_event,
> + (-pmc->counter) & pmc_bitmask(pmc)))
> + return false;

I'd do the reset here, but then you have 3 functions in a row that do
perf_event_ctx_lock()+perf_event_ctx_unlock(), which is rather
expensive.

> +
> + /* reuse perf_event to serve as pmc_reprogram_counter() does*/
> + perf_event_enable(pmc->perf_event);
> + clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
> + return true;
> +}

2019-10-01 08:25:18

by Peter Zijlstra

Subject: Re: [PATCH 3/3] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

On Mon, Sep 30, 2019 at 03:22:57PM +0800, Like Xu wrote:
> + union {
> + u8 event_count :7; /* the total number of created perf_events */
> + bool enable_cleanup :1;

That's atrocious, don't ever create a bitfield with base _Bool.

> + } state;

2019-10-01 12:20:50

by Like Xu

Subject: Re: [PATCH 2/3] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter

Hi Peter,

On 2019/10/1 16:22, Peter Zijlstra wrote:
> On Mon, Sep 30, 2019 at 03:22:56PM +0800, Like Xu wrote:
>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>> index 46875bbd0419..74bc5c42b8b5 100644
>> --- a/arch/x86/kvm/pmu.c
>> +++ b/arch/x86/kvm/pmu.c
>> @@ -140,6 +140,35 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>> clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
>> }
>>
>> +static void pmc_pause_counter(struct kvm_pmc *pmc)
>> +{
>> + if (!pmc->perf_event)
>> + return;
>> +
>> + pmc->counter = pmc_read_counter(pmc);
>> +
>> + perf_event_disable(pmc->perf_event);
>> +
>> + /* reset count to avoid redundant accumulation */
>> + local64_set(&pmc->perf_event->count, 0);
>
> Yuck, don't frob in data structures you don't own.

Yes, it's reasonable. Thanks.

>
> Just like you exported the IOC_PERIOD thing, so too is there a
> IOC_RESET.
>
> Furthermore; wth do you call pmc_read_counter() _before_ doing
> perf_event_disable() ? Doing it the other way around is much cheaper,
> even better, you can use perf_event_count() after disable.

Yes, it's much better and let me apply this.

>
>> +}
>> +
>> +static bool pmc_resume_counter(struct kvm_pmc *pmc)
>> +{
>> + if (!pmc->perf_event)
>> + return false;
>> +
>> + /* recalibrate sample period and check if it's accepted by perf core */
>> + if (perf_event_period(pmc->perf_event,
>> + (-pmc->counter) & pmc_bitmask(pmc)))
>> + return false;
>
> I'd do the reset here, but then you have 3 functions in a row that do
> perf_event_ctx_lock()+perf_event_ctx_unlock(), which is rather
> expensive.

Calling pmc_pause_counter() is not always followed by calling
pmc_resume_counter(). The former may be called multiple times before the
latter is called, so if we do not reset event->count in
pmc_pause_counter(), it will be repeatedly accumulated into pmc->counter,
which is a functional error.

>
>> +
>> + /* reuse perf_event to serve as pmc_reprogram_counter() does*/
>> + perf_event_enable(pmc->perf_event);
>> + clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
>> + return true;
>> +}
>
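
Concretely, pmc_pause_counter() would then look roughly like this (a sketch;
perf_event_reset() here is a hypothetical kernel-internal export mirroring
PERF_EVENT_IOC_RESET, in the same way patch 1/3 exports perf_event_period();
the reset stays in the pause path for the reason explained above):

static void pmc_pause_counter(struct kvm_pmc *pmc)
{
        if (!pmc->perf_event)
                return;

        /* disable first, then read: reading a stopped event is cheaper */
        perf_event_disable(pmc->perf_event);
        pmc->counter = pmc_read_counter(pmc);

        /* hypothetical export; avoids touching event->count directly */
        perf_event_reset(pmc->perf_event);
}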

2019-10-01 12:35:12

by Like Xu

Subject: Re: [PATCH 3/3] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

Hi Peter,

On 2019/10/1 16:23, Peter Zijlstra wrote:
> On Mon, Sep 30, 2019 at 03:22:57PM +0800, Like Xu wrote:
>> + union {
>> + u8 event_count :7; /* the total number of created perf_events */
>> + bool enable_cleanup :1;
>
> That's atrocious, don't ever create a bitfield with base _Bool.

I saw this kind of usages in the tree such as "struct
arm_smmu_master/tipc_mon_state/regmap_irq_chip".

I'm not sure is this your personal preference or is there a technical
reason such as this usage is not incompatible with union syntax?

My design point is to save a little bit space without introducing
two variables such as "int event_count & bool enable_cleanup".

One of the alternatives is to introduce "u8 pmu_state", where the last
seven bits are event_count for X86_PMC_IDX_MAX and the highest bit is
the enable_cleanup bit. Are you OK with this ?

By the way, does the lazy release mechanism look reasonable to you?

>
>> + } state;
>

2019-10-07 12:06:00

by Paolo Bonzini

Subject: Re: [PATCH 1/3] perf/core: Provide a kernel-internal interface to recalibrate event period

On 30/09/19 09:22, Like Xu wrote:
> -static int perf_event_period(struct perf_event *event, u64 __user *arg)
> +static int _perf_event_period(struct perf_event *event, u64 value)

__perf_event_period or perf_event_period_locked would be more consistent
with other code in Linux.

Paolo

2019-10-07 13:25:57

by Liang, Kan

Subject: Re: [PATCH 1/3] perf/core: Provide a kernel-internal interface to recalibrate event period



On 10/7/2019 8:01 AM, Paolo Bonzini wrote:
> On 30/09/19 09:22, Like Xu wrote:
>> -static int perf_event_period(struct perf_event *event, u64 __user *arg)
>> +static int _perf_event_period(struct perf_event *event, u64 value)
>
> __perf_event_period or perf_event_period_locked would be more consistent
> with other code in Linux.
>

But that will not be consistent with the current perf code. For example,
_perf_event_enable(), _perf_event_disable(), _perf_event_reset() and
_perf_event_refresh().
Currently, the function name without the '_' prefix is the one exported
and takes the lock. The function name with the '_' prefix is the main body.

If we have to use the "_locked" or "__", I think we'd better change the
name for other functions as well.

Thanks,
Kan

2019-10-07 15:08:09

by Paolo Bonzini

Subject: Re: [PATCH 1/3] perf/core: Provide a kernel-internal interface to recalibrate event period

On 07/10/19 15:25, Liang, Kan wrote:
>
>> On 30/09/19 09:22, Like Xu wrote:
>>> -static int perf_event_period(struct perf_event *event, u64 __user *arg)
>>> +static int _perf_event_period(struct perf_event *event, u64 value)
>>
>> __perf_event_period or perf_event_period_locked would be more consistent
>> with other code in Linux.
>>
>
> But that will not be consistent with the current perf code. For example,
> _perf_event_enable(), _perf_event_disable(), _perf_event_reset() and
> _perf_event_refresh().
> Currently, the function name without the '_' prefix is the one exported
> and takes the lock. The function name with the '_' prefix is the main body.
>
> If we have to use the "_locked" or "__", I think we'd better change the
> name for other functions as well.

Oh, sorry I missed that.

Paolo

2019-10-08 12:12:57

by Peter Zijlstra

Subject: Re: [PATCH 3/3] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

On Tue, Oct 01, 2019 at 08:33:45PM +0800, Like Xu wrote:
> Hi Peter,
>
> On 2019/10/1 16:23, Peter Zijlstra wrote:
> > On Mon, Sep 30, 2019 at 03:22:57PM +0800, Like Xu wrote:
> > > + union {
> > > + u8 event_count :7; /* the total number of created perf_events */
> > > + bool enable_cleanup :1;
> >
> > That's atrocious, don't ever create a bitfield with base _Bool.
>
> I saw this kind of usages in the tree such as "struct
> arm_smmu_master/tipc_mon_state/regmap_irq_chip".

Because other people do tasteless things doesn't make it right.

> I'm not sure is this your personal preference or is there a technical
> reason such as this usage is not incompatible with union syntax?

Apparently it 'works', so there is no hard technical reason, but
consider that _Bool is specified as an integer type large enough to
store the values 0 and 1, then consider it as a base type for a
bitfield. That's just disgusting.

Now, I suppose it 'works', but there is no actual benefit over just
using a single bit of any other base type.

> My design point is to save a little bit space without introducing
> two variables such as "int event_count & bool enable_cleanup".

Your design is questionable, the structure is _huge_, and your union has
event_count:0 and enable_cleanup:0 as the same bit, which I don't think
was intentional.

Did you perhaps want to write:

struct {
u8 event_count : 7;
u8 event_cleanup : 1;
};

which has a total size of 1 byte and uses the low 7 bits as count and the
msb as cleanup.

Also, the structure has plenty holes to stick proper variables in
without further growing it.

> By the way, does the lazy release mechanism look reasonable to you?

I've no idea how it works.. I don't know much about virt.

2019-10-09 03:15:53

by Like Xu

Subject: Re: [PATCH 3/3] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

On 2019/10/8 20:11, Peter Zijlstra wrote:
> On Tue, Oct 01, 2019 at 08:33:45PM +0800, Like Xu wrote:
>> Hi Peter,
>>
>> On 2019/10/1 16:23, Peter Zijlstra wrote:
>>> On Mon, Sep 30, 2019 at 03:22:57PM +0800, Like Xu wrote:
>>>> + union {
>>>> + u8 event_count :7; /* the total number of created perf_events */
>>>> + bool enable_cleanup :1;
>>>
>>> That's atrocious, don't ever create a bitfield with base _Bool.
>>
>> I saw this kind of usages in the tree such as "struct
>> arm_smmu_master/tipc_mon_state/regmap_irq_chip".
>
> Because other people do tasteless things doesn't make it right.
>
>> I'm not sure is this your personal preference or is there a technical
>> reason such as this usage is not incompatible with union syntax?
>
> Apparently it 'works', so there is no hard technical reason, but
> consider that _Bool is specified as an integer type large enough to
> store the values 0 and 1, then consider it as a base type for a
> bitfield. That's just disgusting.

It's reasonable. Thanks.

>
> Now, I suppose it 'works', but there is no actual benefit over just
> using a single bit of any other base type.
>
>> My design point is to save a little bit space without introducing
>> two variables such as "int event_count & bool enable_cleanup".
>
> Your design is questionable, the structure is _huge_, and your union has
> event_count:0 and enable_cleanup:0 as the same bit, which I don't think
> was intentional.
>
> Did you perhaps want to write:
>
> struct {
> u8 event_count : 7;
> u8 event_cleanup : 1;
> };
>
> which has a total size of 1 byte and uses the low 7 bits as count and the
> msb as cleanup.

Yes, my union here is wrong and let me fix it in the next version.

>
> Also, the structure has plenty holes to stick proper variables in
> without further growing it.

Yes, we could benefit from it.

>
>> By the way, does the lazy release mechanism look reasonable to you?
>
> I've no idea how it works.. I don't know much about virt.
>

2019-10-09 07:18:26

by Paolo Bonzini

Subject: Re: [PATCH 3/3] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

On 09/10/19 05:14, Like Xu wrote:
>>
>>
>>> I'm not sure is this your personal preference or is there a technical
>>> reason such as this usage is not incompatible with union syntax?
>>
>> Apparently it 'works', so there is no hard technical reason, but
>> consider that _Bool is specified as an integer type large enough to
>> store the values 0 and 1, then consider it as a base type for a
>> bitfield. That's just disgusting.
>
> It's reasonable. Thanks.

/me chimes in since this is KVM code after all...

For stuff like hardware registers, bitfields are probably a bad idea
anyway, so let's only consider the case of space optimization.

bool:2 would definitely cause an eyebrow raise, but I don't see why
bool:1 bitfields are a problem. An integer type large enough to store
the values 0 and 1 can be of any size bigger than one bit.

bool bitfields preserve the magic behavior where something like this:

foo->x = y;

(x is a bool bitfield) would be compiled as

foo->x = (y != 0);

which can be a plus or a minus depending on the point of view. :)
Either way, bool bitfields are useful if you are using bitfields for
space optimization, especially if you have existing code using bool and
it might rely on the idiom above.

However, in this patch bitfields are unnecessary and they result in
worse code from the compiler. There is plenty of padding in struct
kvm_pmu, with or without bitfields, so I'd go with "u8 event_count; bool
enable_cleanup;" (or better "need_cleanup").
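
i.e., roughly this in struct kvm_pmu (sketch):

        u8 event_count;         /* the total number of created perf_events */
        bool need_cleanup;      /* arm the lazy release for the next slice */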

Thanks,

Paolo

2019-10-09 08:10:00

by Like Xu

Subject: Re: [PATCH 3/3] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

Hi Paolo,
On 2019/10/9 15:15, Paolo Bonzini wrote:
> On 09/10/19 05:14, Like Xu wrote:
>>>
>>>
>>>> I'm not sure is this your personal preference or is there a technical
>>>> reason such as this usage is not incompatible with union syntax?
>>>
>>> Apparently it 'works', so there is no hard technical reason, but
>>> consider that _Bool is specified as an integer type large enough to
>>> store the values 0 and 1, then consider it as a base type for a
>>> bitfield. That's just disgusting.
>>
>> It's reasonable. Thanks.
>
> /me chimes in since this is KVM code after all...
>
> For stuff like hardware registers, bitfields are probably a bad idea
> anyway, so let's only consider the case of space optimization.
>
> bool:2 would definitely cause an eyebrow raise, but I don't see why
> bool:1 bitfields are a problem. An integer type large enough to store
> the values 0 and 1 can be of any size bigger than one bit.
>
> bool bitfields preserve the magic behavior where something like this:
>
> foo->x = y;
>
> (x is a bool bitfield) would be compiled as
>
> foo->x = (y != 0);
>
> which can be a plus or a minus depending on the point of view. :)
> Either way, bool bitfields are useful if you are using bitfields for
> space optimization, especially if you have existing code using bool and
> it might rely on the idiom above.
>
> However, in this patch bitfields are unnecessary and they result in
> worse code from the compiler. There is plenty of padding in struct
> kvm_pmu, with or without bitfields, so I'd go with "u8 event_count; bool
> enable_cleanup;" (or better "need_cleanup").

Thanks. The "u8 event_count; bool need_cleanup;" looks good to me.

So does the lazy release mechanism look reasonable to you?
If so, I may send the next version based on the current feedback.

>
> Thanks,
>
> Paolo
>

2019-10-09 08:17:17

by Peter Zijlstra

Subject: Re: [PATCH 3/3] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

On Wed, Oct 09, 2019 at 09:15:03AM +0200, Paolo Bonzini wrote:
> For stuff like hardware registers, bitfields are probably a bad idea
> anyway, so let's only consider the case of space optimization.

Except for hardware registers? I actually like bitfields to describe
hardware registers.

> bool:2 would definitely cause an eyebrow raise, but I don't see why
> bool:1 bitfields are a problem. An integer type large enough to store
> the values 0 and 1 can be of any size bigger than one bit.

Consider:

bool foo:1;
bool bar:1;

Will bar use the second bit of _Bool? Does it have one? (yes it does,
but it's still weird).

But worse, as used in the parent thread:

u8 count:7;
bool flag:1;

Who says the @flag thing will even be the msb of the initial u8 and not
a whole new variable due to change in base type?

> bool bitfields preserve the magic behavior where something like this:
>
> foo->x = y;
>
> (x is a bool bitfield) would be compiled as
>
> foo->x = (y != 0);

This is confusion; if y is a single bit bitfield, then there is
absolutely _NO_ difference between these two expressions.

The _only_ thing about _Bool is that it magically casts values to 0,1.
Single bit bitfield variables have no choice but to already be in that
range.

So expressions where it matters are:

x = (7&2) // x == 2
vs
x = !!(7&2) // x == 1

But it is impossible for int:1 and _Bool to behave differently.

> However, in this patch bitfields are unnecessary and they result in
> worse code from the compiler.

Fully agreed :-)

2019-10-09 09:24:30

by Paolo Bonzini

Subject: Re: [PATCH 3/3] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

On 09/10/19 10:16, Peter Zijlstra wrote:
> On Wed, Oct 09, 2019 at 09:15:03AM +0200, Paolo Bonzini wrote:
>> For stuff like hardware registers, bitfields are probably a bad idea
>> anyway, so let's only consider the case of space optimization.
>
> Except for hardware registers? I actually like bitfields to describe
> hardware registers.

In theory yes, but in practice for MMIO it's a problem that you're not able
to see the exact reads and writes the compiler emits. Of course you can do:

union {
        struct {
                /* some bitfields here */
        } u;
        u32 val;
}

and only use the bitfields after reading/writing from the register.

> But worse, as used in the parent thread:
>
> u8 count:7;
> bool flag:1;
>
> Who says the @flag thing will even be the msb of the initial u8 and not
> a whole new variable due to change in base type?

Good point.

>> bool bitfields preserve the magic behavior where something like this:
>>
>> foo->x = y;
>>
>> (x is a bool bitfield) would be compiled as
>>
>> foo->x = (y != 0);
>
> This is confusion; if y is a single bit bitfield, then there is
> absolutely _NO_ difference between these two expressions.

y is not in a struct so it cannot be a single bit bitfield. :) If y is
an int and foo->x is a bool bitfield, you get the following:

foo->x = 6; /* foo->x is 1, it would be 0 for int:1 */
foo->x = 7; /* foo->x is 1, it would be 1 for int:1 */

Anyway it's good that we agree on the important thing about the patch!

Paolo

> The _only_ thing about _Bool is that it magically casts values to 0,1.
> Single bit bitfield variables have no choice but to already be in that
> range.

2019-10-09 09:32:50

by Peter Zijlstra

Subject: Re: [PATCH 3/3] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

On Wed, Oct 09, 2019 at 11:21:30AM +0200, Paolo Bonzini wrote:
> On 09/10/19 10:16, Peter Zijlstra wrote:

> >> bool bitfields preserve the magic behavior where something like this:
> >>
> >> foo->x = y;
> >>
> >> (x is a bool bitfield) would be compiled as
> >>
> >> foo->x = (y != 0);
> >
> > This is confusion; if y is a single bit bitfield, then there is
> > absolutely _NO_ difference between these two expressions.
>
> y is not in a struct so it cannot be a single bit bitfield. :) If y is
> an int and foo->x is a bool bitfield, you get the following:
>
> foo->x = 6; /* foo->x is 1, it would be 0 for int:1 */
> foo->x = 7; /* foo->x is 1, it would be 1 for int:1 */
>

Urgh, reading hard. You're right!