Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the events attribute flags.
However this approach requires that each time a new event modifier is
added to perf, all the perf drivers need to be modified to indicate that
they don't support the attribute. This results in additional boiler-plate
code common to many drivers that needs to be maintained. An example of
this is the addition of exclude_host and exclude_guest in 2011 yet many
PMU drivers do not support this or indicate an error on events that make
use of it.
This patch generalises the test for exclusion and updates PMU drivers to
use it. This is a functional change as some PMU drivers will now correctly
report that they don't support certain events whereas they previously did.
An alternative approach might have been to provide a static perf_event_attr
with exclusion events set such that each PMU driver could compare against
(see ref [1]) - however this is less readable. A longer term approach may
instead be for PMU's to advertise their capabilities on registration.
All drivers touched by this patchset have been compile tested.
[1] https://lore.kernel.org/patchwork/patch/325116/
~
Andrew Murray (10):
perf/core: Add macro to test for event exclusion flags
arm: perf/core: generalise event exclusion checking with perf macro
arm: perf: add additional validation to set_event_filter
powerpc: perf/core: generalise event exclusion checking with perf
macro
powerpc/pmu/fsl: add additional validation to event_init
alpha: perf/core: generalise event exclusion checking with perf macro
x86: perf/core: generalise event exclusion checking with perf macro
perf/core: Remove unused perf_flags
drivers/perf: perf/core: generalise event exclusion checking with perf
macro
perf/doc: update design.txt for exclude_{host|guest} flags
arch/alpha/kernel/perf_event.c | 4 +---
arch/arm/kernel/perf_event_v7.c | 2 ++
arch/arm/mach-imx/mmdc.c | 8 +-------
arch/arm/mm/cache-l2x0-pmu.c | 7 +------
arch/powerpc/perf/core-fsl-emb.c | 2 ++
arch/powerpc/perf/hv-24x7.c | 7 +------
arch/powerpc/perf/hv-gpci.c | 7 +------
arch/powerpc/perf/imc-pmu.c | 14 ++------------
arch/x86/events/amd/ibs.c | 11 +----------
arch/x86/events/amd/iommu.c | 3 +--
arch/x86/events/amd/power.c | 8 +-------
arch/x86/events/amd/uncore.c | 3 +--
arch/x86/events/intel/cstate.c | 7 +------
arch/x86/events/intel/rapl.c | 7 +------
arch/x86/events/intel/uncore.c | 3 +--
arch/x86/events/intel/uncore_snb.c | 7 +------
arch/x86/events/msr.c | 7 +------
drivers/perf/arm-cci.c | 7 +------
drivers/perf/arm-ccn.c | 5 +----
drivers/perf/arm_dsu_pmu.c | 7 +------
drivers/perf/arm_pmu.c | 9 +--------
drivers/perf/hisilicon/hisi_uncore_pmu.c | 7 +------
drivers/perf/qcom_l2_pmu.c | 3 +--
drivers/perf/qcom_l3_pmu.c | 3 +--
drivers/perf/xgene_pmu.c | 3 +--
include/linux/perf_event.h | 9 +++++++++
include/uapi/linux/perf_event.h | 2 --
tools/include/uapi/linux/perf_event.h | 2 --
tools/perf/design.txt | 4 ++++
29 files changed, 41 insertions(+), 127 deletions(-)
--
2.7.4
Add a macro that tests if any of the perf event exclusion flags
are set on a given event.
Signed-off-by: Andrew Murray <[email protected]>
---
include/linux/perf_event.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 53c500f..89ee7fa 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
extern void
perf_log_lost_samples(struct perf_event *event, u64 lost);
+static inline bool event_has_exclude_flags(struct perf_event *event)
+{
+ struct perf_event_attr *attr = &event->attr;
+
+ return attr->exclude_idle || attr->exclude_user ||
+ attr->exclude_kernel || attr->exclude_hv ||
+ attr->exclude_guest || attr->exclude_host;
+}
+
static inline bool is_sampling_event(struct perf_event *event)
{
return event->attr.sample_period != 0;
--
2.7.4
Replace checking of perf event exclusion flags with perf macro.
Signed-off-by: Andrew Murray <[email protected]>
---
arch/arm/mach-imx/mmdc.c | 8 +-------
arch/arm/mm/cache-l2x0-pmu.c | 7 +------
2 files changed, 2 insertions(+), 13 deletions(-)
diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index 04b3bf7..d9d468f 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -293,13 +293,7 @@ static int mmdc_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
- event->attr.sample_period)
+ if (event_has_exclude_flags(event) || event->attr.sample_period)
return -EINVAL;
if (cfg < 0 || cfg >= MMDC_NUM_COUNTERS)
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index afe5b4c..968fdf8 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -314,12 +314,7 @@ static int l2x0_pmu_event_init(struct perf_event *event)
event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
+ if (event_has_exclude_flags(event))
return -EINVAL;
if (event->cpu < 0)
--
2.7.4
Replace checking of perf event exclusion flags with perf macro.
Signed-off-by: Andrew Murray <[email protected]>
---
arch/powerpc/perf/hv-24x7.c | 7 +------
arch/powerpc/perf/hv-gpci.c | 7 +------
arch/powerpc/perf/imc-pmu.c | 14 ++------------
3 files changed, 4 insertions(+), 24 deletions(-)
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 72238ee..60db22d 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1307,12 +1307,7 @@ static int h_24x7_event_init(struct perf_event *event)
}
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
+ if (event_has_exclude_flags(event))
return -EINVAL;
/* no branch sampling */
diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index 43fabb3..2d2b5c0 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -233,12 +233,7 @@ static int h_gpci_event_init(struct perf_event *event)
}
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
+ if (event_has_exclude_flags(event))
return -EINVAL;
/* no branch sampling */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 1fafc32b..1ae1d3f 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -474,12 +474,7 @@ static int nest_imc_event_init(struct perf_event *event)
return -EINVAL;
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
+ if (event_has_exclude_flags(event))
return -EINVAL;
if (event->cpu < 0)
@@ -749,12 +744,7 @@ static int core_imc_event_init(struct perf_event *event)
return -EINVAL;
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
+ if (event_has_exclude_flags(event))
return -EINVAL;
if (event->cpu < 0)
--
2.7.4
The fsl PMU driver doesn't support host/guest mode exclusion so
let's report this when event_init is called with these exclusion
flags set.
Signed-off-by: Andrew Murray <[email protected]>
---
arch/powerpc/perf/core-fsl-emb.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
index ba48584..b1cc9d1 100644
--- a/arch/powerpc/perf/core-fsl-emb.c
+++ b/arch/powerpc/perf/core-fsl-emb.c
@@ -559,6 +559,8 @@ static int fsl_emb_pmu_event_init(struct perf_event *event)
event->hw.config_base |= PMLCA_FCS;
if (event->attr.exclude_idle)
return -ENOTSUPP;
+ if (event->attr.exclude_host || event->attr.exclude_guest)
+ return -ENOTSUPP;
event->hw.last_period = event->hw.sample_period;
local64_set(&event->hw.period_left, event->hw.last_period);
--
2.7.4
Replace checking of perf event exclusion flags with perf macro.
This is a functional change as __hw_perf_event_init will now indicate
that it doesn't support exclude_host and exclude_guest event flags.
Signed-off-by: Andrew Murray <[email protected]>
---
arch/alpha/kernel/perf_event.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c
index 5613aa37..36f98ee 100644
--- a/arch/alpha/kernel/perf_event.c
+++ b/arch/alpha/kernel/perf_event.c
@@ -631,10 +631,8 @@ static int __hw_perf_event_init(struct perf_event *event)
}
/* The EV67 does not support mode exclusion */
- if (attr->exclude_kernel || attr->exclude_user
- || attr->exclude_hv || attr->exclude_idle) {
+ if (event_has_exclude_flags(event))
return -EPERM;
- }
/*
* We place the event type in event_base here and leave calculation
--
2.7.4
Replace checking of perf event exclusion flags with perf macro.
This is a functional change as exclude_host and exclude_guest are
added in these files:
arch/x86/events/intel/uncore.c
and exclude_idle and exclude_hv are added in these files:
arch/x86/events/amd/iommu.c
arch/x86/events/amd/uncore.c
Signed-off-by: Andrew Murray <[email protected]>
---
arch/x86/events/amd/ibs.c | 11 +----------
arch/x86/events/amd/iommu.c | 3 +--
arch/x86/events/amd/power.c | 8 +-------
arch/x86/events/amd/uncore.c | 3 +--
arch/x86/events/intel/cstate.c | 7 +------
arch/x86/events/intel/rapl.c | 7 +------
arch/x86/events/intel/uncore.c | 3 +--
arch/x86/events/intel/uncore_snb.c | 7 +------
arch/x86/events/msr.c | 7 +------
9 files changed, 9 insertions(+), 47 deletions(-)
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index d50bb4d..a51981c 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -253,15 +253,6 @@ static int perf_ibs_precise_event(struct perf_event *event, u64 *config)
return -EOPNOTSUPP;
}
-static const struct perf_event_attr ibs_notsupp = {
- .exclude_user = 1,
- .exclude_kernel = 1,
- .exclude_hv = 1,
- .exclude_idle = 1,
- .exclude_host = 1,
- .exclude_guest = 1,
-};
-
static int perf_ibs_init(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
@@ -282,7 +273,7 @@ static int perf_ibs_init(struct perf_event *event)
if (event->pmu != &perf_ibs->pmu)
return -ENOENT;
- if (perf_flags(&event->attr) & perf_flags(&ibs_notsupp))
+ if (event_has_exclude_flags(event))
return -EINVAL;
if (config & ~perf_ibs->config_mask)
diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 3210fee..fa7541b 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -224,8 +224,7 @@ static int perf_iommu_event_init(struct perf_event *event)
return -EINVAL;
/* IOMMU counters do not have usr/os/guest/host bits */
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_host || event->attr.exclude_guest)
+ if (event_has_exclude_flags(event))
return -EINVAL;
if (event->cpu < 0)
diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
index 2aefacf..4129fbe 100644
--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -136,13 +136,7 @@ static int pmu_event_init(struct perf_event *event)
return -ENOENT;
/* Unsupported modes and filters. */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
- /* no sampling */
+ if (event_has_exclude_flags(event) ||
event->attr.sample_period)
return -EINVAL;
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 8671de1..c2015c7 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -202,8 +202,7 @@ static int amd_uncore_event_init(struct perf_event *event)
return -EINVAL;
/* NB and Last level cache counters do not have usr/os/guest/host bits */
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_host || event->attr.exclude_guest)
+ if (event_has_exclude_flags(event))
return -EINVAL;
/* and we do not enable counter overflow interrupts */
diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 9f8084f..9366833 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -280,12 +280,7 @@ static int cstate_pmu_event_init(struct perf_event *event)
return -ENOENT;
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
+ if (event_has_exclude_flags(event) ||
event->attr.sample_period) /* no sampling */
return -EINVAL;
diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 32f3e94..428d40c 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -397,12 +397,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
+ if (event_has_exclude_flags(event) ||
event->attr.sample_period) /* no sampling */
return -EINVAL;
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 27a4614..0544100 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -699,8 +699,7 @@ static int uncore_pmu_event_init(struct perf_event *event)
* Uncore PMU does measure at all privilege level all the time.
* So it doesn't make sense to specify any exclude bits.
*/
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_hv || event->attr.exclude_idle)
+ if (event_has_exclude_flags(event))
return -EINVAL;
/* Sampling not supported yet */
diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index 8527c3e..8bd1727 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -374,12 +374,7 @@ static int snb_uncore_imc_event_init(struct perf_event *event)
return -EINVAL;
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
+ if (event_has_exclude_flags(event) ||
event->attr.sample_period) /* no sampling */
return -EINVAL;
diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index b4771a6..5f1b50b 100644
--- a/arch/x86/events/msr.c
+++ b/arch/x86/events/msr.c
@@ -160,12 +160,7 @@ static int msr_event_init(struct perf_event *event)
return -ENOENT;
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
+ if (event_has_exclude_flags(event) ||
event->attr.sample_period) /* no sampling */
return -EINVAL;
--
2.7.4
The armv7pmu driver doesn't support host/guest mode exclusion so
let's report this when set_event_filter is called with these
exclusion flags set.
Signed-off-by: Andrew Murray <[email protected]>
---
arch/arm/kernel/perf_event_v7.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index a4fb0f8..c4c9fbb 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1074,6 +1074,8 @@ static int armv7pmu_set_event_filter(struct hw_perf_event *event,
if (attr->exclude_idle)
return -EPERM;
+ if (attr->exclude_host || attr->exclude_guest)
+ return -EPERM;
if (attr->exclude_user)
config_base |= ARMV7_EXCLUDE_USER;
if (attr->exclude_kernel)
--
2.7.4
Now that perf_flags is not used we remove it.
Signed-off-by: Andrew Murray <[email protected]>
---
include/uapi/linux/perf_event.h | 2 --
tools/include/uapi/linux/perf_event.h | 2 --
2 files changed, 4 deletions(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index f35eb72..ba89bd3 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32 ids[0];
};
-#define perf_flags(attr) (*(&(attr)->read_format + 1))
-
/*
* Ioctls that can be done on a perf event fd:
*/
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index f35eb72..ba89bd3 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32 ids[0];
};
-#define perf_flags(attr) (*(&(attr)->read_format + 1))
-
/*
* Ioctls that can be done on a perf event fd:
*/
--
2.7.4
Replace checking of perf event exclusion flags with perf macro.
This is a functional change as exclude_host and exclude_guest are added
in the following files:
- drivers/perf/qcom_l2_pmu.c
- drivers/perf/qcom_l3_pmu.c
- drivers/perf/arm_pmu.c
And exclude_idle and exclude_hv are added in these files:
- drivers/perf/xgene_pmu.c
Signed-off-by: Andrew Murray <[email protected]>
---
drivers/perf/arm-cci.c | 7 +------
drivers/perf/arm-ccn.c | 5 +----
drivers/perf/arm_dsu_pmu.c | 7 +------
drivers/perf/arm_pmu.c | 9 +--------
drivers/perf/hisilicon/hisi_uncore_pmu.c | 7 +------
drivers/perf/qcom_l2_pmu.c | 3 +--
drivers/perf/qcom_l3_pmu.c | 3 +--
drivers/perf/xgene_pmu.c | 3 +--
8 files changed, 8 insertions(+), 36 deletions(-)
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 1bfeb16..d749f19 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1328,12 +1328,7 @@ static int cci_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
/* We have no filtering of any kind */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
+ if (event_has_exclude_flags(event))
return -EINVAL;
/*
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 7dd850e..9a22a95 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
- if (has_branch_stack(event) || event->attr.exclude_user ||
- event->attr.exclude_kernel || event->attr.exclude_hv ||
- event->attr.exclude_idle || event->attr.exclude_host ||
- event->attr.exclude_guest) {
+ if (has_branch_stack(event) || event_has_exclude_flags(event)) {
dev_dbg(ccn->dev, "Can't exclude execution levels!\n");
return -EINVAL;
}
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index 660cb8a..300ff3d 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -563,12 +563,7 @@ static int dsu_pmu_event_init(struct perf_event *event)
}
if (has_branch_stack(event) ||
- event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest) {
+ event_has_exclude_flags(event)) {
dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
return -EINVAL;
}
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 7f01f6f..a03634f 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
}
static int
-event_requires_mode_exclusion(struct perf_event_attr *attr)
-{
- return attr->exclude_idle || attr->exclude_user ||
- attr->exclude_kernel || attr->exclude_hv;
-}
-
-static int
__hw_perf_event_init(struct perf_event *event)
{
struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
@@ -395,7 +388,7 @@ __hw_perf_event_init(struct perf_event *event)
*/
if ((!armpmu->set_event_filter ||
armpmu->set_event_filter(hwc, &event->attr)) &&
- event_requires_mode_exclusion(&event->attr)) {
+ event_has_exclude_flags(event)) {
pr_debug("ARM performance counters do not support "
"mode exclusion\n");
return -EOPNOTSUPP;
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index 9efd241..d3edff9 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -143,12 +143,7 @@ int hisi_uncore_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
/* counters do not have these bits */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle)
+ if (event_has_exclude_flags(event))
return -EINVAL;
/*
diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 842135c..d7d85a2 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -510,8 +510,7 @@ static int l2_cache_event_init(struct perf_event *event)
}
/* We cannot filter accurately so we just don't allow it. */
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_hv || event->attr.exclude_idle) {
+ if (event_has_exclude_flags(event)) {
dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
"Can't exclude execution levels\n");
return -EOPNOTSUPP;
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 2dc63d6..f27af67 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -497,8 +497,7 @@ static int qcom_l3_cache__event_init(struct perf_event *event)
/*
* There are no per-counter mode filters in the PMU.
*/
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_hv || event->attr.exclude_idle)
+ if (event_has_exclude_flags(event))
return -EINVAL;
/*
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0e31f13..1fcd5a0 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -915,8 +915,7 @@ static int xgene_perf_event_init(struct perf_event *event)
return -EINVAL;
/* SOC counters do not have usr/os/guest/host bits */
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_host || event->attr.exclude_guest)
+ if (event_has_exclude_flags(event))
return -EINVAL;
if (event->cpu < 0)
--
2.7.4
Update design.txt to reflect the presence of the exclude_host
and exclude_guest perf flags.
Signed-off-by: Andrew Murray <[email protected]>
---
tools/perf/design.txt | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index a28dca2..7de7d83 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
way to request that counting of events be restricted to times when the
CPU is in user, kernel and/or hypervisor mode.
+Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
+to request counting of events restricted to guest and host contexts when
+using virtualisation.
+
The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap
operations, these can be used to relate userspace IP addresses to actual
code, even after the mapping (or even the whole process) is gone,
--
2.7.4
On Fri, Nov 16, 2018 at 10:24:04AM +0000, Andrew Murray wrote:
> Add a macro that tests if any of the perf event exclusion flags
> are set on a given event.
It is in fact an inline function, not a CPP macro.
> Signed-off-by: Andrew Murray <[email protected]>
> ---
> include/linux/perf_event.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 53c500f..89ee7fa 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
> extern void
> perf_log_lost_samples(struct perf_event *event, u64 lost);
>
> +static inline bool event_has_exclude_flags(struct perf_event *event)
> +{
> + struct perf_event_attr *attr = &event->attr;
> +
> + return attr->exclude_idle || attr->exclude_user ||
> + attr->exclude_kernel || attr->exclude_hv ||
> + attr->exclude_guest || attr->exclude_host;
> +}
> +
> static inline bool is_sampling_event(struct perf_event *event)
> {
> return event->attr.sample_period != 0;
> --
> 2.7.4
>
On Fri, Nov 16, 2018 at 10:24:03AM +0000, Andrew Murray wrote:
> Many PMU drivers do not have the capability to exclude counting events
> that occur in specific contexts such as idle, kernel, guest, etc. These
> drivers indicate this by returning an error in their event_init upon
> testing the events attribute flags.
>
> However this approach requires that each time a new event modifier is
> added to perf, all the perf drivers need to be modified to indicate that
> they don't support the attribute. This results in additional boiler-plate
> code common to many drivers that needs to be maintained. An example of
> this is the addition of exclude_host and exclude_guest in 2011 yet many
> PMU drivers do not support this or indicate an error on events that make
> use of it.
>
> This patch generalises the test for exclusion and updates PMU drivers to
> use it. This is a functional change as some PMU drivers will now correctly
> report that they don't support certain events whereas they previously did.
Right, I like that idea, and yes, there's a lot of fail around there :/
> A longer term approach may instead be for PMU's to advertise their
> capabilities on registration.
This I think is the better approach. We already have the
PERF_PMU_CAP_flags that can be used to advertise various PMU
capabilities.
Something along these lines I suppose; then every PMU that actually
checks the flags, needs to set the flag, otherwise it'll fail.
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 53c500f0ca79..de15723ea52a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -244,6 +244,7 @@ struct perf_event;
#define PERF_PMU_CAP_EXCLUSIVE 0x10
#define PERF_PMU_CAP_ITRACE 0x20
#define PERF_PMU_CAP_HETEROGENEOUS_CPUS 0x40
+#define PERF_PMU_CAP_EXCLUDE 0x80
/**
* struct pmu - generic performance monitoring unit
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 84530ab358c3..d76b724177b9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9772,6 +9772,14 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
if (ctx)
perf_event_ctx_unlock(event->group_leader, ctx);
+ if (!ret) {
+ if ((pmu->capabilities & PERF_PMU_CAP_EXCLUDE) ||
+ event_has_exclude_flags(event)) {
+ event->destroy(event);
+ ret = -EINVAL;
+ }
+ }
+
if (ret)
module_put(pmu->module);
On Fri, Nov 16, 2018 at 10:24:04AM +0000, Andrew Murray wrote:
> Add a macro that tests if any of the perf event exclusion flags
> are set on a given event.
>
> Signed-off-by: Andrew Murray <[email protected]>
Aside from the s/macro/function, or s/macro/helper/, this looks sound to
me.
Assuming you fix that up here and in subsequent commit messages, for
this patch feel free to add:
Acked-by: Mark Rutland <[email protected]>
Mark.
> ---
> include/linux/perf_event.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 53c500f..89ee7fa 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
> extern void
> perf_log_lost_samples(struct perf_event *event, u64 lost);
>
> +static inline bool event_has_exclude_flags(struct perf_event *event)
> +{
> + struct perf_event_attr *attr = &event->attr;
> +
> + return attr->exclude_idle || attr->exclude_user ||
> + attr->exclude_kernel || attr->exclude_hv ||
> + attr->exclude_guest || attr->exclude_host;
> +}
> +
> static inline bool is_sampling_event(struct perf_event *event)
> {
> return event->attr.sample_period != 0;
> --
> 2.7.4
>
On Fri, Nov 16, 2018 at 10:24:12AM +0000, Andrew Murray wrote:
> Replace checking of perf event exclusion flags with perf macro.
>
> This is a functional change as exclude_host and exclude_guest are added
> in the following files:
>
> - drivers/perf/qcom_l2_pmu.c
> - drivers/perf/qcom_l3_pmu.c
> - drivers/perf/arm_pmu.c
>
> And exclude_idle and exclude_hv are added in these files:
>
> - drivers/perf/xgene_pmu.c
FWIW, that all sounds fine to me.
Assuming you fix up the 'macro' nit:
Acked-by: Mark Rutland <[email protected]>
... unless we go for Peter's core cap for this.
Thanks,
Mark.
> Signed-off-by: Andrew Murray <[email protected]>
> ---
> drivers/perf/arm-cci.c | 7 +------
> drivers/perf/arm-ccn.c | 5 +----
> drivers/perf/arm_dsu_pmu.c | 7 +------
> drivers/perf/arm_pmu.c | 9 +--------
> drivers/perf/hisilicon/hisi_uncore_pmu.c | 7 +------
> drivers/perf/qcom_l2_pmu.c | 3 +--
> drivers/perf/qcom_l3_pmu.c | 3 +--
> drivers/perf/xgene_pmu.c | 3 +--
> 8 files changed, 8 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
> index 1bfeb16..d749f19 100644
> --- a/drivers/perf/arm-cci.c
> +++ b/drivers/perf/arm-cci.c
> @@ -1328,12 +1328,7 @@ static int cci_pmu_event_init(struct perf_event *event)
> return -EOPNOTSUPP;
>
> /* We have no filtering of any kind */
> - if (event->attr.exclude_user ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle ||
> - event->attr.exclude_host ||
> - event->attr.exclude_guest)
> + if (event_has_exclude_flags(event))
> return -EINVAL;
>
> /*
> diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
> index 7dd850e..9a22a95 100644
> --- a/drivers/perf/arm-ccn.c
> +++ b/drivers/perf/arm-ccn.c
> @@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event)
> return -EOPNOTSUPP;
> }
>
> - if (has_branch_stack(event) || event->attr.exclude_user ||
> - event->attr.exclude_kernel || event->attr.exclude_hv ||
> - event->attr.exclude_idle || event->attr.exclude_host ||
> - event->attr.exclude_guest) {
> + if (has_branch_stack(event) || event_has_exclude_flags(event)) {
> dev_dbg(ccn->dev, "Can't exclude execution levels!\n");
> return -EINVAL;
> }
> diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
> index 660cb8a..300ff3d 100644
> --- a/drivers/perf/arm_dsu_pmu.c
> +++ b/drivers/perf/arm_dsu_pmu.c
> @@ -563,12 +563,7 @@ static int dsu_pmu_event_init(struct perf_event *event)
> }
>
> if (has_branch_stack(event) ||
> - event->attr.exclude_user ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle ||
> - event->attr.exclude_host ||
> - event->attr.exclude_guest) {
> + event_has_exclude_flags(event)) {
> dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
> return -EINVAL;
> }
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 7f01f6f..a03634f 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
> }
>
> static int
> -event_requires_mode_exclusion(struct perf_event_attr *attr)
> -{
> - return attr->exclude_idle || attr->exclude_user ||
> - attr->exclude_kernel || attr->exclude_hv;
> -}
> -
> -static int
> __hw_perf_event_init(struct perf_event *event)
> {
> struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> @@ -395,7 +388,7 @@ __hw_perf_event_init(struct perf_event *event)
> */
> if ((!armpmu->set_event_filter ||
> armpmu->set_event_filter(hwc, &event->attr)) &&
> - event_requires_mode_exclusion(&event->attr)) {
> + event_has_exclude_flags(event)) {
> pr_debug("ARM performance counters do not support "
> "mode exclusion\n");
> return -EOPNOTSUPP;
> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> index 9efd241..d3edff9 100644
> --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> @@ -143,12 +143,7 @@ int hisi_uncore_pmu_event_init(struct perf_event *event)
> return -EOPNOTSUPP;
>
> /* counters do not have these bits */
> - if (event->attr.exclude_user ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_host ||
> - event->attr.exclude_guest ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle)
> + if (event_has_exclude_flags(event))
> return -EINVAL;
>
> /*
> diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
> index 842135c..d7d85a2 100644
> --- a/drivers/perf/qcom_l2_pmu.c
> +++ b/drivers/perf/qcom_l2_pmu.c
> @@ -510,8 +510,7 @@ static int l2_cache_event_init(struct perf_event *event)
> }
>
> /* We cannot filter accurately so we just don't allow it. */
> - if (event->attr.exclude_user || event->attr.exclude_kernel ||
> - event->attr.exclude_hv || event->attr.exclude_idle) {
> + if (event_has_exclude_flags(event)) {
> dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
> "Can't exclude execution levels\n");
> return -EOPNOTSUPP;
> diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
> index 2dc63d6..f27af67 100644
> --- a/drivers/perf/qcom_l3_pmu.c
> +++ b/drivers/perf/qcom_l3_pmu.c
> @@ -497,8 +497,7 @@ static int qcom_l3_cache__event_init(struct perf_event *event)
> /*
> * There are no per-counter mode filters in the PMU.
> */
> - if (event->attr.exclude_user || event->attr.exclude_kernel ||
> - event->attr.exclude_hv || event->attr.exclude_idle)
> + if (event_has_exclude_flags(event))
> return -EINVAL;
>
> /*
> diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
> index 0e31f13..1fcd5a0 100644
> --- a/drivers/perf/xgene_pmu.c
> +++ b/drivers/perf/xgene_pmu.c
> @@ -915,8 +915,7 @@ static int xgene_perf_event_init(struct perf_event *event)
> return -EINVAL;
>
> /* SOC counters do not have usr/os/guest/host bits */
> - if (event->attr.exclude_user || event->attr.exclude_kernel ||
> - event->attr.exclude_host || event->attr.exclude_guest)
> + if (event_has_exclude_flags(event))
> return -EINVAL;
>
> if (event->cpu < 0)
> --
> 2.7.4
>
Andrew Murray <[email protected]> writes:
> Update design.txt to reflect the presence of the exclude_host
> and exclude_guest perf flags.
>
> Signed-off-by: Andrew Murray <[email protected]>
> ---
> tools/perf/design.txt | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> index a28dca2..7de7d83 100644
> --- a/tools/perf/design.txt
> +++ b/tools/perf/design.txt
> @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
> way to request that counting of events be restricted to times when the
> CPU is in user, kernel and/or hypervisor mode.
>
> +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> +to request counting of events restricted to guest and host contexts when
> +using virtualisation.
How does exclude_host differ from exclude_hv ?
cheers
Andrew Murray <[email protected]> writes:
> Replace checking of perf event exclusion flags with perf macro.
>
> Signed-off-by: Andrew Murray <[email protected]>
> ---
> arch/powerpc/perf/hv-24x7.c | 7 +------
> arch/powerpc/perf/hv-gpci.c | 7 +------
> arch/powerpc/perf/imc-pmu.c | 14 ++------------
> 3 files changed, 4 insertions(+), 24 deletions(-)
These conversions look fine, thanks.
Acked-by: Michael Ellerman <[email protected]>
cheers
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index 72238ee..60db22d 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -1307,12 +1307,7 @@ static int h_24x7_event_init(struct perf_event *event)
> }
>
> /* unsupported modes and filters */
> - if (event->attr.exclude_user ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle ||
> - event->attr.exclude_host ||
> - event->attr.exclude_guest)
> + if (event_has_exclude_flags(event))
> return -EINVAL;
>
> /* no branch sampling */
> diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
> index 43fabb3..2d2b5c0 100644
> --- a/arch/powerpc/perf/hv-gpci.c
> +++ b/arch/powerpc/perf/hv-gpci.c
> @@ -233,12 +233,7 @@ static int h_gpci_event_init(struct perf_event *event)
> }
>
> /* unsupported modes and filters */
> - if (event->attr.exclude_user ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle ||
> - event->attr.exclude_host ||
> - event->attr.exclude_guest)
> + if (event_has_exclude_flags(event))
> return -EINVAL;
>
> /* no branch sampling */
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index 1fafc32b..1ae1d3f 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -474,12 +474,7 @@ static int nest_imc_event_init(struct perf_event *event)
> return -EINVAL;
>
> /* unsupported modes and filters */
> - if (event->attr.exclude_user ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle ||
> - event->attr.exclude_host ||
> - event->attr.exclude_guest)
> + if (event_has_exclude_flags(event))
> return -EINVAL;
>
> if (event->cpu < 0)
> @@ -749,12 +744,7 @@ static int core_imc_event_init(struct perf_event *event)
> return -EINVAL;
>
> /* unsupported modes and filters */
> - if (event->attr.exclude_user ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle ||
> - event->attr.exclude_host ||
> - event->attr.exclude_guest)
> + if (event_has_exclude_flags(event))
> return -EINVAL;
>
> if (event->cpu < 0)
> --
> 2.7.4
On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> Andrew Murray <[email protected]> writes:
>
> > Update design.txt to reflect the presence of the exclude_host
> > and exclude_guest perf flags.
> >
> > Signed-off-by: Andrew Murray <[email protected]>
> > ---
> > tools/perf/design.txt | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> > index a28dca2..7de7d83 100644
> > --- a/tools/perf/design.txt
> > +++ b/tools/perf/design.txt
> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
> > way to request that counting of events be restricted to times when the
> > CPU is in user, kernel and/or hypervisor mode.
> >
> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> > +to request counting of events restricted to guest and host contexts when
> > +using virtualisation.
>
> How does exclude_host differ from exclude_hv ?
I believe exclude_host / exclude_guest are intented to distinguish
between host and guest in the hosted hypervisor context (KVM).
Whereas exclude_hv allows to distinguish between guest and
hypervisor in the bare-metal type hypervisors.
In the case of arm64 - if VHE extensions are present then the host
kernel will run at a higher privilege to the guest kernel, in which
case there is no distinction between hypervisor and host so we ignore
exclude_hv. But where VHE extensions are not present then the host
kernel runs at the same privilege level as the guest and we use a
higher privilege level to switch between them - in this case we can
use exclude_hv to discount that hypervisor role of switching between
guests.
Thanks,
Andrew Murray
>
> cheers
Andrew Murray <[email protected]> writes:
> Add a macro that tests if any of the perf event exclusion flags
> are set on a given event.
>
> Signed-off-by: Andrew Murray <[email protected]>
> ---
> include/linux/perf_event.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 53c500f..89ee7fa 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
> extern void
> perf_log_lost_samples(struct perf_event *event, u64 lost);
>
> +static inline bool event_has_exclude_flags(struct perf_event *event)
> +{
> + struct perf_event_attr *attr = &event->attr;
> +
> + return attr->exclude_idle || attr->exclude_user ||
> + attr->exclude_kernel || attr->exclude_hv ||
> + attr->exclude_guest || attr->exclude_host;
> +}
Sorry to be a total PITA, but using "flags" plural suggests that it only
returns true if there is more than one exclude flag set.
A better name would be event_has_exclude_flag() or maybe
event_has_any_exclude_flag().
If you're doing a respin anyway it'd be nice to fix the name, but
obviously it's not high priority.
cheers
On Thu, Nov 22, 2018 at 12:21:43PM +0000, Andrew Murray wrote:
> On Mon, Nov 19, 2018 at 02:08:00PM +0100, Peter Zijlstra wrote:
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 84530ab358c3..d76b724177b9 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -9772,6 +9772,14 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
> > if (ctx)
> > perf_event_ctx_unlock(event->group_leader, ctx);
> >
> > + if (!ret) {
> > + if ((pmu->capabilities & PERF_PMU_CAP_EXCLUDE) ||
> > + event_has_exclude_flags(event)) {
> > + event->destroy(event);
> > + ret = -EINVAL;
> > + }
> > + }
> > +
>
> I don't quite follow this logic. Should that not have been:
>
> if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUDE) &&
> event_has_exclude_flags(event)) {
>
> Meaning that if an event has any exclude flags but the pmu doesn't
> have the capability to handle them then error.
Uhm, yes. Brainfart on my side that.
> If you're happy with my proposed logic, then would it also make
> sense to move this before the call to the pmu->event_init ?
I'm not sure that can work; I think we need ->event_init() first such
that it can -ENOENT. Only after ->event_init() returns success can we be
certain of @pmu.
On Mon, Nov 19, 2018 at 02:08:00PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 16, 2018 at 10:24:03AM +0000, Andrew Murray wrote:
> > Many PMU drivers do not have the capability to exclude counting events
> > that occur in specific contexts such as idle, kernel, guest, etc. These
> > drivers indicate this by returning an error in their event_init upon
> > testing the events attribute flags.
> >
> > However this approach requires that each time a new event modifier is
> > added to perf, all the perf drivers need to be modified to indicate that
> > they don't support the attribute. This results in additional boiler-plate
> > code common to many drivers that needs to be maintained. An example of
> > this is the addition of exclude_host and exclude_guest in 2011 yet many
> > PMU drivers do not support this or indicate an error on events that make
> > use of it.
> >
> > This patch generalises the test for exclusion and updates PMU drivers to
> > use it. This is a functional change as some PMU drivers will now correctly
> > report that they don't support certain events whereas they previously did.
>
> Right, I like that idea, and yes, there's a lot of fail around there :/
>
> > A longer term approach may instead be for PMU's to advertise their
> > capabilities on registration.
>
> This I think is the better approach. We already have the
> PERF_PMU_CAP_flags that can be used to advertise various PMU
> capabilities.
OK I'll respin my series to take this approach.
>
> Something along these lines I suppose; then every PMU that actually
> checks the flags, needs to set the flag, otherwise it'll fail.
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 53c500f0ca79..de15723ea52a 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -244,6 +244,7 @@ struct perf_event;
> #define PERF_PMU_CAP_EXCLUSIVE 0x10
> #define PERF_PMU_CAP_ITRACE 0x20
> #define PERF_PMU_CAP_HETEROGENEOUS_CPUS 0x40
> +#define PERF_PMU_CAP_EXCLUDE 0x80
>
> /**
> * struct pmu - generic performance monitoring unit
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 84530ab358c3..d76b724177b9 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9772,6 +9772,14 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
> if (ctx)
> perf_event_ctx_unlock(event->group_leader, ctx);
>
> + if (!ret) {
> + if ((pmu->capabilities & PERF_PMU_CAP_EXCLUDE) ||
> + event_has_exclude_flags(event)) {
> + event->destroy(event);
> + ret = -EINVAL;
> + }
> + }
> +
I don't quite follow this logic. Should that not have been:
if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUDE) &&
event_has_exclude_flags(event)) {
Meaning that if an event has any exclude flags but the pmu doesn't
have the capability to handle them then error.
If you're happy with my proposed logic, then would it also make
sense to move this before the call to the pmu->event_init ?
Thanks,
Andrew Murray
> if (ret)
> module_put(pmu->module);
>
>
On Thu, Nov 22, 2018 at 01:26:37PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 22, 2018 at 12:21:43PM +0000, Andrew Murray wrote:
> > On Mon, Nov 19, 2018 at 02:08:00PM +0100, Peter Zijlstra wrote:
>
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index 84530ab358c3..d76b724177b9 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -9772,6 +9772,14 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
> > > if (ctx)
> > > perf_event_ctx_unlock(event->group_leader, ctx);
> > >
> > > + if (!ret) {
> > > + if ((pmu->capabilities & PERF_PMU_CAP_EXCLUDE) ||
> > > + event_has_exclude_flags(event)) {
> > > + event->destroy(event);
> > > + ret = -EINVAL;
> > > + }
> > > + }
> > > +
> >
> > I don't quite follow this logic. Should that not have been:
> >
> > if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUDE) &&
> > event_has_exclude_flags(event)) {
> >
> > Meaning that if an event has any exclude flags but the pmu doesn't
> > have the capability to handle them then error.
>
> Uhm, yes. Brainfart on my side that.
>
> > If you're happy with my proposed logic, then would it also make
> > sense to move this before the call to the pmu->event_init ?
>
> I'm not sure that can work; I think we need ->event_init() first such
> that it can -ENOENT. Only after ->event_init() returns success can we be
> certain of @pmu.
Ah yes I see now. Until event_init doesn't return -ENOENT we can't be sure
that this will be the PMU we use (as per the other calls to
perf_try_init_event in perf_init_event).
Thanks,
Andrew Murray
On Mon, Nov 19, 2018 at 04:03:52PM +0000, Mark Rutland wrote:
> On Fri, Nov 16, 2018 at 10:24:12AM +0000, Andrew Murray wrote:
> > Replace checking of perf event exclusion flags with perf macro.
> >
> > This is a functional change as exclude_host and exclude_guest are added
> > in the following files:
> >
> > - drivers/perf/qcom_l2_pmu.c
> > - drivers/perf/qcom_l3_pmu.c
> > - drivers/perf/arm_pmu.c
> >
> > And exclude_idle and exclude_hv are added in these files:
> >
> > - drivers/perf/xgene_pmu.c
>
> FWIW, that all sounds fine to me.
>
> Assuming you fix up the 'macro' nit:
>
> Acked-by: Mark Rutland <[email protected]>
>
> ... unless we go for Peter's core cap for this.
Yeah I'm going to re-spin with the core cap approach - but thanks anyway.
Thanks,
Andrew Murray
>
> Thanks,
> Mark.
>
> > Signed-off-by: Andrew Murray <[email protected]>
> > ---
> > drivers/perf/arm-cci.c | 7 +------
> > drivers/perf/arm-ccn.c | 5 +----
> > drivers/perf/arm_dsu_pmu.c | 7 +------
> > drivers/perf/arm_pmu.c | 9 +--------
> > drivers/perf/hisilicon/hisi_uncore_pmu.c | 7 +------
> > drivers/perf/qcom_l2_pmu.c | 3 +--
> > drivers/perf/qcom_l3_pmu.c | 3 +--
> > drivers/perf/xgene_pmu.c | 3 +--
> > 8 files changed, 8 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
> > index 1bfeb16..d749f19 100644
> > --- a/drivers/perf/arm-cci.c
> > +++ b/drivers/perf/arm-cci.c
> > @@ -1328,12 +1328,7 @@ static int cci_pmu_event_init(struct perf_event *event)
> > return -EOPNOTSUPP;
> >
> > /* We have no filtering of any kind */
> > - if (event->attr.exclude_user ||
> > - event->attr.exclude_kernel ||
> > - event->attr.exclude_hv ||
> > - event->attr.exclude_idle ||
> > - event->attr.exclude_host ||
> > - event->attr.exclude_guest)
> > + if (event_has_exclude_flags(event))
> > return -EINVAL;
> >
> > /*
> > diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
> > index 7dd850e..9a22a95 100644
> > --- a/drivers/perf/arm-ccn.c
> > +++ b/drivers/perf/arm-ccn.c
> > @@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event)
> > return -EOPNOTSUPP;
> > }
> >
> > - if (has_branch_stack(event) || event->attr.exclude_user ||
> > - event->attr.exclude_kernel || event->attr.exclude_hv ||
> > - event->attr.exclude_idle || event->attr.exclude_host ||
> > - event->attr.exclude_guest) {
> > + if (has_branch_stack(event) || event_has_exclude_flags(event)) {
> > dev_dbg(ccn->dev, "Can't exclude execution levels!\n");
> > return -EINVAL;
> > }
> > diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
> > index 660cb8a..300ff3d 100644
> > --- a/drivers/perf/arm_dsu_pmu.c
> > +++ b/drivers/perf/arm_dsu_pmu.c
> > @@ -563,12 +563,7 @@ static int dsu_pmu_event_init(struct perf_event *event)
> > }
> >
> > if (has_branch_stack(event) ||
> > - event->attr.exclude_user ||
> > - event->attr.exclude_kernel ||
> > - event->attr.exclude_hv ||
> > - event->attr.exclude_idle ||
> > - event->attr.exclude_host ||
> > - event->attr.exclude_guest) {
> > + event_has_exclude_flags(event)) {
> > dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
> > return -EINVAL;
> > }
> > diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> > index 7f01f6f..a03634f 100644
> > --- a/drivers/perf/arm_pmu.c
> > +++ b/drivers/perf/arm_pmu.c
> > @@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
> > }
> >
> > static int
> > -event_requires_mode_exclusion(struct perf_event_attr *attr)
> > -{
> > - return attr->exclude_idle || attr->exclude_user ||
> > - attr->exclude_kernel || attr->exclude_hv;
> > -}
> > -
> > -static int
> > __hw_perf_event_init(struct perf_event *event)
> > {
> > struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > @@ -395,7 +388,7 @@ __hw_perf_event_init(struct perf_event *event)
> > */
> > if ((!armpmu->set_event_filter ||
> > armpmu->set_event_filter(hwc, &event->attr)) &&
> > - event_requires_mode_exclusion(&event->attr)) {
> > + event_has_exclude_flags(event)) {
> > pr_debug("ARM performance counters do not support "
> > "mode exclusion\n");
> > return -EOPNOTSUPP;
> > diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> > index 9efd241..d3edff9 100644
> > --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
> > +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> > @@ -143,12 +143,7 @@ int hisi_uncore_pmu_event_init(struct perf_event *event)
> > return -EOPNOTSUPP;
> >
> > /* counters do not have these bits */
> > - if (event->attr.exclude_user ||
> > - event->attr.exclude_kernel ||
> > - event->attr.exclude_host ||
> > - event->attr.exclude_guest ||
> > - event->attr.exclude_hv ||
> > - event->attr.exclude_idle)
> > + if (event_has_exclude_flags(event))
> > return -EINVAL;
> >
> > /*
> > diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
> > index 842135c..d7d85a2 100644
> > --- a/drivers/perf/qcom_l2_pmu.c
> > +++ b/drivers/perf/qcom_l2_pmu.c
> > @@ -510,8 +510,7 @@ static int l2_cache_event_init(struct perf_event *event)
> > }
> >
> > /* We cannot filter accurately so we just don't allow it. */
> > - if (event->attr.exclude_user || event->attr.exclude_kernel ||
> > - event->attr.exclude_hv || event->attr.exclude_idle) {
> > + if (event_has_exclude_flags(event)) {
> > dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
> > "Can't exclude execution levels\n");
> > return -EOPNOTSUPP;
> > diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
> > index 2dc63d6..f27af67 100644
> > --- a/drivers/perf/qcom_l3_pmu.c
> > +++ b/drivers/perf/qcom_l3_pmu.c
> > @@ -497,8 +497,7 @@ static int qcom_l3_cache__event_init(struct perf_event *event)
> > /*
> > * There are no per-counter mode filters in the PMU.
> > */
> > - if (event->attr.exclude_user || event->attr.exclude_kernel ||
> > - event->attr.exclude_hv || event->attr.exclude_idle)
> > + if (event_has_exclude_flags(event))
> > return -EINVAL;
> >
> > /*
> > diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
> > index 0e31f13..1fcd5a0 100644
> > --- a/drivers/perf/xgene_pmu.c
> > +++ b/drivers/perf/xgene_pmu.c
> > @@ -915,8 +915,7 @@ static int xgene_perf_event_init(struct perf_event *event)
> > return -EINVAL;
> >
> > /* SOC counters do not have usr/os/guest/host bits */
> > - if (event->attr.exclude_user || event->attr.exclude_kernel ||
> > - event->attr.exclude_host || event->attr.exclude_guest)
> > + if (event_has_exclude_flags(event))
> > return -EINVAL;
> >
> > if (event->cpu < 0)
> > --
> > 2.7.4
> >
On Tue, Nov 20, 2018 at 10:28:34PM +1100, Michael Ellerman wrote:
> Andrew Murray <[email protected]> writes:
>
> > Add a macro that tests if any of the perf event exclusion flags
> > are set on a given event.
> >
> > Signed-off-by: Andrew Murray <[email protected]>
> > ---
> > include/linux/perf_event.h | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 53c500f..89ee7fa 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
> > extern void
> > perf_log_lost_samples(struct perf_event *event, u64 lost);
> >
> > +static inline bool event_has_exclude_flags(struct perf_event *event)
> > +{
> > + struct perf_event_attr *attr = &event->attr;
> > +
> > + return attr->exclude_idle || attr->exclude_user ||
> > + attr->exclude_kernel || attr->exclude_hv ||
> > + attr->exclude_guest || attr->exclude_host;
> > +}
>
> Sorry to be a total PITA, but using "flags" plural suggests that it only
> returns true if there is more than one exclude flag set.
>
> A better name would be event_has_exclude_flag() or maybe
> event_has_any_exclude_flag().
>
> If you're doing a respin anyway it'd be nice to fix the name, but
> obviously it's not high priority.
No problem - I'll go with event_has_any_exclude_flag.
Thanks,
Andrew Murray
>
> cheers
[ Reviving old thread. ]
Andrew Murray <[email protected]> writes:
> On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
>> Andrew Murray <[email protected]> writes:
>>
>> > Update design.txt to reflect the presence of the exclude_host
>> > and exclude_guest perf flags.
>> >
>> > Signed-off-by: Andrew Murray <[email protected]>
>> > ---
>> > tools/perf/design.txt | 4 ++++
>> > 1 file changed, 4 insertions(+)
>> >
>> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
>> > index a28dca2..7de7d83 100644
>> > --- a/tools/perf/design.txt
>> > +++ b/tools/perf/design.txt
>> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
>> > way to request that counting of events be restricted to times when the
>> > CPU is in user, kernel and/or hypervisor mode.
>> >
>> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
>> > +to request counting of events restricted to guest and host contexts when
>> > +using virtualisation.
>>
>> How does exclude_host differ from exclude_hv ?
>
> I believe exclude_host / exclude_guest are intented to distinguish
> between host and guest in the hosted hypervisor context (KVM).
OK yeah, from the perf-list man page:
u - user-space counting
k - kernel counting
h - hypervisor counting
I - non idle counting
G - guest counting (in KVM guests)
H - host counting (not in KVM guests)
> Whereas exclude_hv allows to distinguish between guest and
> hypervisor in the bare-metal type hypervisors.
Except that's exactly not how we use them on powerpc :)
We use exclude_hv to exclude "the hypervisor", regardless of whether
it's KVM or PowerVM (which is a bare-metal hypervisor).
We don't use exclude_host / exclude_guest at all, which I guess is a
bug, except I didn't know they existed until this thread.
eg, in a KVM guest:
$ perf record -e cycles:G /bin/bash -c "for i in {0..100000}; do :;done"
$ perf report -D | grep -Fc "dso: [hypervisor]"
16
> In the case of arm64 - if VHE extensions are present then the host
> kernel will run at a higher privilege to the guest kernel, in which
> case there is no distinction between hypervisor and host so we ignore
> exclude_hv. But where VHE extensions are not present then the host
> kernel runs at the same privilege level as the guest and we use a
> higher privilege level to switch between them - in this case we can
> use exclude_hv to discount that hypervisor role of switching between
> guests.
I couldn't find any arm64 perf code using exclude_host/guest at all?
And I don't see any x86 code using exclude_hv.
But maybe that's OK, I just worry this is confusing for users.
cheers
On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
> [ Reviving old thread. ]
>
> Andrew Murray <[email protected]> writes:
> > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> >> Andrew Murray <[email protected]> writes:
> >>
> >> > Update design.txt to reflect the presence of the exclude_host
> >> > and exclude_guest perf flags.
> >> >
> >> > Signed-off-by: Andrew Murray <[email protected]>
> >> > ---
> >> > tools/perf/design.txt | 4 ++++
> >> > 1 file changed, 4 insertions(+)
> >> >
> >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> >> > index a28dca2..7de7d83 100644
> >> > --- a/tools/perf/design.txt
> >> > +++ b/tools/perf/design.txt
> >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
> >> > way to request that counting of events be restricted to times when the
> >> > CPU is in user, kernel and/or hypervisor mode.
> >> >
> >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> >> > +to request counting of events restricted to guest and host contexts when
> >> > +using virtualisation.
> >>
> >> How does exclude_host differ from exclude_hv ?
> >
> > I believe exclude_host / exclude_guest are intented to distinguish
> > between host and guest in the hosted hypervisor context (KVM).
>
> OK yeah, from the perf-list man page:
>
> u - user-space counting
> k - kernel counting
> h - hypervisor counting
> I - non idle counting
> G - guest counting (in KVM guests)
> H - host counting (not in KVM guests)
>
> > Whereas exclude_hv allows to distinguish between guest and
> > hypervisor in the bare-metal type hypervisors.
>
> Except that's exactly not how we use them on powerpc :)
>
> We use exclude_hv to exclude "the hypervisor", regardless of whether
> it's KVM or PowerVM (which is a bare-metal hypervisor).
>
> We don't use exclude_host / exclude_guest at all, which I guess is a
> bug, except I didn't know they existed until this thread.
>
> eg, in a KVM guest:
>
> $ perf record -e cycles:G /bin/bash -c "for i in {0..100000}; do :;done"
> $ perf report -D | grep -Fc "dso: [hypervisor]"
> 16
>
>
> > In the case of arm64 - if VHE extensions are present then the host
> > kernel will run at a higher privilege to the guest kernel, in which
> > case there is no distinction between hypervisor and host so we ignore
> > exclude_hv. But where VHE extensions are not present then the host
> > kernel runs at the same privilege level as the guest and we use a
> > higher privilege level to switch between them - in this case we can
> > use exclude_hv to discount that hypervisor role of switching between
> > guests.
>
> I couldn't find any arm64 perf code using exclude_host/guest at all?
Correct - but this is in flight as I am currently adding support for this
see [1].
>
> And I don't see any x86 code using exclude_hv.
I can't find any either.
>
> But maybe that's OK, I just worry this is confusing for users.
There is some extra context regarding this where exclude_guest/exclude_host
was added, see [2] and where exclude_hv was added, see [3]
Generally it seems that exclude_guest/exclude_host relies upon switching
counters off/on on guest/host switch code (which works well in the nested
virt case). Whereas exclude_hv tends to rely solely on hardware capability
based on privilege level (which works well in the bare metal case where
the guest doesn't run at same privilege as the host).
I think from the user perspective exclude_hv allows you to see your overhead
if you are a guest (i.e. work done by bare metal hypervisor associated with
you as the guest). Whereas exclude_guest/exclude_host doesn't allow you to
see events above you (i.e. the kernel hypervisor) if you are the guest...
At least that's how I read this, I've copied in others that may provide
more authoritative feedback.
[1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-December/033698.html
[2] https://www.spinics.net/lists/kvm/msg53996.html
[3] https://lore.kernel.org/patchwork/patch/143918/
Thanks,
Andrew Murray
>
> cheers
Andrew Murray <[email protected]> writes:
> On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
>> [ Reviving old thread. ]
>>
>> Andrew Murray <[email protected]> writes:
>> > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
>> >> Andrew Murray <[email protected]> writes:
>> >>
>> >> > Update design.txt to reflect the presence of the exclude_host
>> >> > and exclude_guest perf flags.
>> >> >
>> >> > Signed-off-by: Andrew Murray <[email protected]>
>> >> > ---
>> >> > tools/perf/design.txt | 4 ++++
>> >> > 1 file changed, 4 insertions(+)
>> >> >
>> >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
>> >> > index a28dca2..7de7d83 100644
>> >> > --- a/tools/perf/design.txt
>> >> > +++ b/tools/perf/design.txt
>> >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
>> >> > way to request that counting of events be restricted to times when the
>> >> > CPU is in user, kernel and/or hypervisor mode.
>> >> >
>> >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
>> >> > +to request counting of events restricted to guest and host contexts when
>> >> > +using virtualisation.
>> >>
>> >> How does exclude_host differ from exclude_hv ?
>> >
>> > I believe exclude_host / exclude_guest are intented to distinguish
>> > between host and guest in the hosted hypervisor context (KVM).
>>
>> OK yeah, from the perf-list man page:
>>
>> u - user-space counting
>> k - kernel counting
>> h - hypervisor counting
>> I - non idle counting
>> G - guest counting (in KVM guests)
>> H - host counting (not in KVM guests)
>>
>> > Whereas exclude_hv allows to distinguish between guest and
>> > hypervisor in the bare-metal type hypervisors.
>>
>> Except that's exactly not how we use them on powerpc :)
>>
>> We use exclude_hv to exclude "the hypervisor", regardless of whether
>> it's KVM or PowerVM (which is a bare-metal hypervisor).
>>
>> We don't use exclude_host / exclude_guest at all, which I guess is a
>> bug, except I didn't know they existed until this thread.
>>
>> eg, in a KVM guest:
>>
>> $ perf record -e cycles:G /bin/bash -c "for i in {0..100000}; do :;done"
>> $ perf report -D | grep -Fc "dso: [hypervisor]"
>> 16
>>
>>
>> > In the case of arm64 - if VHE extensions are present then the host
>> > kernel will run at a higher privilege to the guest kernel, in which
>> > case there is no distinction between hypervisor and host so we ignore
>> > exclude_hv. But where VHE extensions are not present then the host
>> > kernel runs at the same privilege level as the guest and we use a
>> > higher privilege level to switch between them - in this case we can
>> > use exclude_hv to discount that hypervisor role of switching between
>> > guests.
>>
>> I couldn't find any arm64 perf code using exclude_host/guest at all?
>
> Correct - but this is in flight as I am currently adding support for this
> see [1].
OK, so at least that will be consistent across arm64 & x86.
>> And I don't see any x86 code using exclude_hv.
>
> I can't find any either.
I think that's because they don't need it, because they don't let guests
program the PMU directly. It's all handled by the host and the host
doesn't let the guest count host cycles anyway. But I could be wrong I'm
no x86 expert.
>> But maybe that's OK, I just worry this is confusing for users.
>
> There is some extra context regarding this where exclude_guest/exclude_host
> was added, see [2]
Good find. I had looked at that commit, but the thread on the list is
more informative.
In fact there was even a man page update! Never occurred to me look
there :P
http://man7.org/linux/man-pages/man2/perf_event_open.2.html
exclude_host (since Linux 3.2)
When conducting measurements that include processes running VM
instances (i.e., have executed a KVM_RUN ioctl(2)), only mea‐
sure events happening inside a guest instance. This is only
meaningful outside the guests; this setting does not change
counts gathered inside of a guest. Currently, this function‐
ality is x86 only.
exclude_guest (since Linux 3.2)
When conducting measurements that include processes running VM
instances (i.e., have executed a KVM_RUN ioctl(2)), do not
measure events happening inside guest instances. This is only
meaningful outside the guests; this setting does not change
counts gathered inside of a guest. Currently, this function‐
ality is x86 only.
Which makes things much clearer.
Perhaps you want to add a reference to the man page in your text,
something like?
Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
to request counting of events restricted to guest and host contexts when
using virtualisation. See the perf_event_open(2) man page for more
detail.
cheers
On Tue, Dec 11, 2018 at 01:59:03PM +0000, Andrew Murray wrote:
> On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
> > [ Reviving old thread. ]
> >
> > Andrew Murray <[email protected]> writes:
> > > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> > >> Andrew Murray <[email protected]> writes:
> > >>
> > >> > Update design.txt to reflect the presence of the exclude_host
> > >> > and exclude_guest perf flags.
> > >> >
> > >> > Signed-off-by: Andrew Murray <[email protected]>
> > >> > ---
> > >> > tools/perf/design.txt | 4 ++++
> > >> > 1 file changed, 4 insertions(+)
> > >> >
> > >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> > >> > index a28dca2..7de7d83 100644
> > >> > --- a/tools/perf/design.txt
> > >> > +++ b/tools/perf/design.txt
> > >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
> > >> > way to request that counting of events be restricted to times when the
> > >> > CPU is in user, kernel and/or hypervisor mode.
> > >> >
> > >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> > >> > +to request counting of events restricted to guest and host contexts when
> > >> > +using virtualisation.
> > >>
> > >> How does exclude_host differ from exclude_hv ?
> > >
> > > I believe exclude_host / exclude_guest are intented to distinguish
> > > between host and guest in the hosted hypervisor context (KVM).
> >
> > OK yeah, from the perf-list man page:
> >
> > u - user-space counting
> > k - kernel counting
> > h - hypervisor counting
> > I - non idle counting
> > G - guest counting (in KVM guests)
> > H - host counting (not in KVM guests)
> >
> > > Whereas exclude_hv allows to distinguish between guest and
> > > hypervisor in the bare-metal type hypervisors.
> >
> > Except that's exactly not how we use them on powerpc :)
> >
> > We use exclude_hv to exclude "the hypervisor", regardless of whether
> > it's KVM or PowerVM (which is a bare-metal hypervisor).
> >
> > We don't use exclude_host / exclude_guest at all, which I guess is a
> > bug, except I didn't know they existed until this thread.
> >
> > eg, in a KVM guest:
> >
> > $ perf record -e cycles:G /bin/bash -c "for i in {0..100000}; do :;done"
> > $ perf report -D | grep -Fc "dso: [hypervisor]"
> > 16
> >
> >
> > > In the case of arm64 - if VHE extensions are present then the host
> > > kernel will run at a higher privilege to the guest kernel, in which
> > > case there is no distinction between hypervisor and host so we ignore
> > > exclude_hv. But where VHE extensions are not present then the host
> > > kernel runs at the same privilege level as the guest and we use a
> > > higher privilege level to switch between them - in this case we can
> > > use exclude_hv to discount that hypervisor role of switching between
> > > guests.
> >
> > I couldn't find any arm64 perf code using exclude_host/guest at all?
>
> Correct - but this is in flight as I am currently adding support for this
> see [1].
>
> >
> > And I don't see any x86 code using exclude_hv.
>
> I can't find any either.
>
> >
> > But maybe that's OK, I just worry this is confusing for users.
>
> There is some extra context regarding this where exclude_guest/exclude_host
> was added, see [2] and where exclude_hv was added, see [3]
>
> Generally it seems that exclude_guest/exclude_host relies upon switching
> counters off/on on guest/host switch code (which works well in the nested
> virt case). Whereas exclude_hv tends to rely solely on hardware capability
> based on privilege level (which works well in the bare metal case where
> the guest doesn't run at same privilege as the host).
>
> I think from the user perspective exclude_hv allows you to see your overhead
> if you are a guest (i.e. work done by bare metal hypervisor associated with
> you as the guest). Whereas exclude_guest/exclude_host doesn't allow you to
> see events above you (i.e. the kernel hypervisor) if you are the guest...
>
> At least that's how I read this, I've copied in others that may provide
> more authoritative feedback.
>
> [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-December/033698.html
> [2] https://www.spinics.net/lists/kvm/msg53996.html
> [3] https://lore.kernel.org/patchwork/patch/143918/
>
I'll try to answer this in a different way, based on previous
discussions with Joerg et al. who introduced these flags. Assume no
support for nested virtualization as a first approximation:
If you are running as a guest:
- exclude_hv: stop counting events when the hypervisor runs
- exclude_host: has no effect
- exclude_guest: has no effect
If you are running as a host/hypervisor:
- exclude_hv: has no effect
- exclude_host: only count events when the guest is running
- exclude_guest: only count events when the host is running
With nested virtualization, you get the natural union of the above.
**This has nothing to do with the design of the hypervisor such as the
ARM non-VHE KVM which splits its execution across EL1 and EL2 -- those
are both considered host from the point of view of Linux as a hypervisor
using KVM, and both considered hypervisor from the point of view of a
guest.**
Thanks,
Christoffer
On Wed, Dec 12, 2018 at 09:07:42AM +0100, Christoffer Dall wrote:
> On Tue, Dec 11, 2018 at 01:59:03PM +0000, Andrew Murray wrote:
> > On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
> > > [ Reviving old thread. ]
> > >
> > > Andrew Murray <[email protected]> writes:
> > > > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> > > >> Andrew Murray <[email protected]> writes:
> > > >>
> > > >> > Update design.txt to reflect the presence of the exclude_host
> > > >> > and exclude_guest perf flags.
> > > >> >
> > > >> > Signed-off-by: Andrew Murray <[email protected]>
> > > >> > ---
> > > >> > tools/perf/design.txt | 4 ++++
> > > >> > 1 file changed, 4 insertions(+)
> > > >> >
> > > >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> > > >> > index a28dca2..7de7d83 100644
> > > >> > --- a/tools/perf/design.txt
> > > >> > +++ b/tools/perf/design.txt
> > > >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
> > > >> > way to request that counting of events be restricted to times when the
> > > >> > CPU is in user, kernel and/or hypervisor mode.
> > > >> >
> > > >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> > > >> > +to request counting of events restricted to guest and host contexts when
> > > >> > +using virtualisation.
> > > >>
> > > >> How does exclude_host differ from exclude_hv ?
> > > >
> > > > I believe exclude_host / exclude_guest are intented to distinguish
> > > > between host and guest in the hosted hypervisor context (KVM).
> > >
> > > OK yeah, from the perf-list man page:
> > >
> > > u - user-space counting
> > > k - kernel counting
> > > h - hypervisor counting
> > > I - non idle counting
> > > G - guest counting (in KVM guests)
> > > H - host counting (not in KVM guests)
> > >
> > > > Whereas exclude_hv allows to distinguish between guest and
> > > > hypervisor in the bare-metal type hypervisors.
> > >
> > > Except that's exactly not how we use them on powerpc :)
> > >
> > > We use exclude_hv to exclude "the hypervisor", regardless of whether
> > > it's KVM or PowerVM (which is a bare-metal hypervisor).
> > >
> > > We don't use exclude_host / exclude_guest at all, which I guess is a
> > > bug, except I didn't know they existed until this thread.
> > >
> > > eg, in a KVM guest:
> > >
> > > $ perf record -e cycles:G /bin/bash -c "for i in {0..100000}; do :;done"
> > > $ perf report -D | grep -Fc "dso: [hypervisor]"
> > > 16
> > >
> > >
> > > > In the case of arm64 - if VHE extensions are present then the host
> > > > kernel will run at a higher privilege to the guest kernel, in which
> > > > case there is no distinction between hypervisor and host so we ignore
> > > > exclude_hv. But where VHE extensions are not present then the host
> > > > kernel runs at the same privilege level as the guest and we use a
> > > > higher privilege level to switch between them - in this case we can
> > > > use exclude_hv to discount that hypervisor role of switching between
> > > > guests.
> > >
> > > I couldn't find any arm64 perf code using exclude_host/guest at all?
> >
> > Correct - but this is in flight as I am currently adding support for this
> > see [1].
> >
> > >
> > > And I don't see any x86 code using exclude_hv.
> >
> > I can't find any either.
> >
> > >
> > > But maybe that's OK, I just worry this is confusing for users.
> >
> > There is some extra context regarding this where exclude_guest/exclude_host
> > was added, see [2] and where exclude_hv was added, see [3]
> >
> > Generally it seems that exclude_guest/exclude_host relies upon switching
> > counters off/on on guest/host switch code (which works well in the nested
> > virt case). Whereas exclude_hv tends to rely solely on hardware capability
> > based on privilege level (which works well in the bare metal case where
> > the guest doesn't run at same privilege as the host).
> >
> > I think from the user perspective exclude_hv allows you to see your overhead
> > if you are a guest (i.e. work done by bare metal hypervisor associated with
> > you as the guest). Whereas exclude_guest/exclude_host doesn't allow you to
> > see events above you (i.e. the kernel hypervisor) if you are the guest...
> >
> > At least that's how I read this, I've copied in others that may provide
> > more authoritative feedback.
> >
> > [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-December/033698.html
> > [2] https://www.spinics.net/lists/kvm/msg53996.html
> > [3] https://lore.kernel.org/patchwork/patch/143918/
> >
>
> I'll try to answer this in a different way, based on previous
> discussions with Joerg et al. who introduced these flags. Assume no
> support for nested virtualization as a first approximation:
>
> If you are running as a guest:
> - exclude_hv: stop counting events when the hypervisor runs
> - exclude_host: has no effect
> - exclude_guest: has no effect
>
> If you are running as a host/hypervisor:
> - exclude_hv: has no effect
> - exclude_host: only count events when the guest is running
> - exclude_guest: only count events when the host is running
>
> With nested virtualization, you get the natural union of the above.
>
> **This has nothing to do with the design of the hypervisor such as the
> ARM non-VHE KVM which splits its execution across EL1 and EL2 -- those
> are both considered host from the point of view of Linux as a hypervisor
> using KVM, and both considered hypervisor from the point of view of a
> guest.**
For clarity, this is what arm64 currently does (assuming no nesting and
without the current version of this patchset):
If you are running as a guest (VHE or !VHE host):
- exclude_hv: has no effect for a KVM guest (filters hypervisor on !VHE
bare metal hypervisor guest)
- exclude_host: has no effect
- exclude_guest: has no effect
If you are running as a host/hypervisor:
- exclude_hv: has no effect for VHE (filters EL2 on !VHE)
- exclude_host: only count events when the guest is running
- exclude_guest: only count events when the host is running
Is this as expected?
Thanks,
Andrew Murray
>
>
> Thanks,
>
> Christoffer