Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the event's attribute flags.
However, this approach means that each time a new event modifier is
added to perf, every perf driver must be modified to indicate that it
doesn't support the attribute. This results in additional boilerplate
code, common to many drivers, that needs to be maintained. Furthermore,
the drivers are not consistent with regard to the error value they
return when reporting unsupported attributes.
This patchset allows PMU drivers to advertise their inability to exclude
based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This
allows the perf core to reject requests for exclusion events where there
is no support in the PMU.
This is a functional change, in particular:
- Some drivers will now additionally (but correctly) report unsupported
exclusion flags. It's typical for existing userspace tools such as
perf to handle such errors by retrying the system call without the
unsupported flags.
- Drivers that do not support any exclusion that previously reported
-EPERM or -EOPNOTSUPP will now report -EINVAL - this is consistent
with the majority and results in userspace perf retrying without
exclusion.
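
The retry behaviour described in the two points above can be sketched in
userspace terms. This is only an illustration and not the perf tool's
actual fallback code: struct sketch_attr, sketch_open_fn,
sketch_open_with_fallback() and sketch_no_exclude_pmu_open() are all
hypothetical names, and the opener callback stands in for the
perf_event_open system call so that the logic can be exercised without a
real PMU.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the exclusion bits of struct perf_event_attr. */
struct sketch_attr {
	unsigned int exclude_user:1;
	unsigned int exclude_kernel:1;
	unsigned int exclude_hv:1;
	unsigned int exclude_idle:1;
	unsigned int exclude_host:1;
	unsigned int exclude_guest:1;
};

/* Hypothetical opener: returns an fd (>= 0) or a negative errno value. */
typedef int (*sketch_open_fn)(const struct sketch_attr *attr);

static int sketch_has_exclude(const struct sketch_attr *attr)
{
	return attr->exclude_user || attr->exclude_kernel ||
	       attr->exclude_hv || attr->exclude_idle ||
	       attr->exclude_host || attr->exclude_guest;
}

/*
 * Try to open the event as requested. If that fails with -EINVAL and
 * exclusion flags were set, clear them and retry once. Any other error
 * (for example -EPERM) is returned unchanged.
 */
static int sketch_open_with_fallback(struct sketch_attr *attr,
				     sketch_open_fn open_fn)
{
	int fd;

	assert(attr != NULL && open_fn != NULL);

	fd = open_fn(attr);
	if (fd != -EINVAL || !sketch_has_exclude(attr))
		return fd;

	attr->exclude_user = 0;
	attr->exclude_kernel = 0;
	attr->exclude_hv = 0;
	attr->exclude_idle = 0;
	attr->exclude_host = 0;
	attr->exclude_guest = 0;

	return open_fn(attr);
}

/* Fake PMU that rejects any exclusion flag, as the perf core now would. */
static int sketch_no_exclude_pmu_open(const struct sketch_attr *attr)
{
	return sketch_has_exclude(attr) ? -EINVAL : 3;
}
```

With a uniform -EINVAL, a caller following this pattern ends up with a
working event that counts in all contexts, whereas an -EPERM return
would not trigger the retry.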
All drivers touched by this patchset have been compile tested.
Changes from v3:
- Added PERF_PMU_CAP_NO_EXCLUDE to Cavium TX2 PMU driver
Changes from v2:
- Invert logic from CAP_EXCLUDE to CAP_NO_EXCLUDE
Changes from v1:
- Changed approach from explicitly rejecting events in unsupporting PMU
drivers to explicitly advertising a capability in PMU drivers that
do support exclusion events
- Added additional information to tools/perf/design.txt
- Rename event_has_exclude_flags to event_has_any_exclude_flag and
update commit log to reflect it's a function
Andrew Murray (13):
perf/doc: update design.txt for exclude_{host|guest} flags
perf/core: add function to test for event exclusion flags
perf/core: add PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs
alpha: perf/core: use PERF_PMU_CAP_NO_EXCLUDE
arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE
arm: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude
incapable PMUs
drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude
incapable PMUs
powerpc: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable
PMUs
x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
perf/core: remove unused perf_flags
drivers/perf: use PERF_PMU_CAP_NO_EXCLUDE for Cavium TX2 PMU
arch/alpha/kernel/perf_event.c | 7 +------
arch/arm/mach-imx/mmdc.c | 9 ++-------
arch/arm/mm/cache-l2x0-pmu.c | 9 +--------
arch/powerpc/perf/hv-24x7.c | 10 +---------
arch/powerpc/perf/hv-gpci.c | 10 +---------
arch/powerpc/perf/imc-pmu.c | 19 +------------------
arch/x86/events/amd/ibs.c | 13 +------------
arch/x86/events/amd/iommu.c | 6 +-----
arch/x86/events/amd/power.c | 10 ++--------
arch/x86/events/amd/uncore.c | 7 ++-----
arch/x86/events/intel/cstate.c | 12 +++---------
arch/x86/events/intel/rapl.c | 9 ++-------
arch/x86/events/intel/uncore.c | 9 +--------
arch/x86/events/intel/uncore_snb.c | 9 ++-------
arch/x86/events/msr.c | 10 ++--------
drivers/perf/arm-cci.c | 10 +---------
drivers/perf/arm-ccn.c | 6 ++----
drivers/perf/arm_dsu_pmu.c | 9 ++-------
drivers/perf/arm_pmu.c | 15 +++++----------
drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
drivers/perf/hisilicon/hisi_uncore_hha_pmu.c | 1 +
drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 1 +
drivers/perf/hisilicon/hisi_uncore_pmu.c | 9 ---------
drivers/perf/qcom_l2_pmu.c | 9 +--------
drivers/perf/qcom_l3_pmu.c | 8 +-------
drivers/perf/thunderx2_pmu.c | 10 +---------
drivers/perf/xgene_pmu.c | 6 +-----
include/linux/perf_event.h | 10 ++++++++++
include/uapi/linux/perf_event.h | 2 --
kernel/events/core.c | 9 +++++++++
tools/include/uapi/linux/perf_event.h | 2 --
tools/perf/design.txt | 4 ++++
32 files changed, 63 insertions(+), 198 deletions(-)
--
2.7.4
As the Alpha PMU doesn't support context exclusion let's advertise
the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.
This change means that __hw_perf_event_init will now also
indicate that it doesn't support exclude_host and exclude_guest and
will now implicitly return -EINVAL instead of -EPERM. This is likely
more desirable as -EPERM will result in a kernel.perf_event_paranoid
related warning from the perf userspace utility.
Signed-off-by: Andrew Murray <[email protected]>
---
arch/alpha/kernel/perf_event.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c
index 5613aa37..4341ccf 100644
--- a/arch/alpha/kernel/perf_event.c
+++ b/arch/alpha/kernel/perf_event.c
@@ -630,12 +630,6 @@ static int __hw_perf_event_init(struct perf_event *event)
return ev;
}
- /* The EV67 does not support mode exclusion */
- if (attr->exclude_kernel || attr->exclude_user
- || attr->exclude_hv || attr->exclude_idle) {
- return -EPERM;
- }
-
/*
* We place the event type in event_base here and leave calculation
* of the codes to programme the PMU for alpha_pmu_enable() because
@@ -771,6 +765,7 @@ static struct pmu pmu = {
.start = alpha_pmu_start,
.stop = alpha_pmu_stop,
.read = alpha_pmu_read,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
--
2.7.4
Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the event's attribute flags. This approach is error-prone and
often inconsistent.
Let's instead allow PMU drivers to advertise their inability to exclude
based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This
allows the perf core to reject requests for exclusion events where
there is no support in the PMU.
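
As a rough userspace model of the intended behaviour (not the kernel
code itself, which is in the diff below), the check can be pictured as
follows. All sketch_* names are hypothetical, the destroyed field stands
in for calling event->destroy(), and only the 0x80 value is taken from
this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the value this patch gives PERF_PMU_CAP_NO_EXCLUDE. */
#define SKETCH_CAP_NO_EXCLUDE 0x80

/* Hypothetical, heavily reduced stand-in for struct perf_event. */
struct sketch_event {
	bool exclude_user, exclude_kernel, exclude_hv;
	bool exclude_idle, exclude_host, exclude_guest;
	bool destroyed;		/* set where the core would call ->destroy() */
};

/* Hypothetical, heavily reduced stand-in for struct pmu. */
struct sketch_pmu {
	unsigned int capabilities;
	int (*event_init)(struct sketch_event *event);
};

static bool sketch_has_any_exclude_flag(const struct sketch_event *event)
{
	return event->exclude_idle || event->exclude_user ||
	       event->exclude_kernel || event->exclude_hv ||
	       event->exclude_guest || event->exclude_host;
}

/*
 * Let the driver initialise the event first. Only if that succeeds, and
 * the PMU has advertised that it cannot exclude, reject events that
 * carry any exclusion flag.
 */
static int sketch_try_init_event(const struct sketch_pmu *pmu,
				 struct sketch_event *event)
{
	int ret;

	assert(pmu != NULL && event != NULL);

	ret = pmu->event_init(event);
	if (!ret &&
	    (pmu->capabilities & SKETCH_CAP_NO_EXCLUDE) &&
	    sketch_has_any_exclude_flag(event)) {
		event->destroyed = true;
		ret = -EINVAL;
	}

	return ret;
}

/* Driver init that accepts everything, like a driver with its checks removed. */
static int sketch_event_init_ok(struct sketch_event *event)
{
	(void)event;
	return 0;
}
```

The model shows why the driver-side checks become redundant: a driver
that sets the capability never has to look at the exclusion flags
itself.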
Signed-off-by: Andrew Murray <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 9 +++++++++
2 files changed, 10 insertions(+)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 54a78d2..cec02dc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -244,6 +244,7 @@ struct perf_event;
#define PERF_PMU_CAP_EXCLUSIVE 0x10
#define PERF_PMU_CAP_ITRACE 0x20
#define PERF_PMU_CAP_HETEROGENEOUS_CPUS 0x40
+#define PERF_PMU_CAP_NO_EXCLUDE 0x80
/**
* struct pmu - generic performance monitoring unit
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3cd13a3..fbe59b7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9772,6 +9772,15 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
if (ctx)
perf_event_ctx_unlock(event->group_leader, ctx);
+ if (!ret) {
+ if (pmu->capabilities & PERF_PMU_CAP_NO_EXCLUDE &&
+ event_has_any_exclude_flag(event)) {
+ if (event->destroy)
+ event->destroy(event);
+ ret = -EINVAL;
+ }
+ }
+
if (ret)
module_put(pmu->module);
--
2.7.4
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.
Signed-off-by: Andrew Murray <[email protected]>
Acked-by: Shawn Guo <[email protected]>
Acked-by: Will Deacon <[email protected]>
---
arch/arm/mach-imx/mmdc.c | 9 ++-------
arch/arm/mm/cache-l2x0-pmu.c | 9 +--------
2 files changed, 3 insertions(+), 15 deletions(-)
diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index e49e068..fce4b42 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -294,13 +294,7 @@ static int mmdc_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
- event->attr.sample_period)
+ if (event->attr.sample_period)
return -EINVAL;
if (cfg < 0 || cfg >= MMDC_NUM_COUNTERS)
@@ -456,6 +450,7 @@ static int mmdc_pmu_init(struct mmdc_pmu *pmu_mmdc,
.start = mmdc_pmu_event_start,
.stop = mmdc_pmu_event_stop,
.read = mmdc_pmu_event_update,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
},
.mmdc_base = mmdc_base,
.dev = dev,
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index afe5b4c..99bcd07 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -314,14 +314,6 @@ static int l2x0_pmu_event_init(struct perf_event *event)
event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
- return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
@@ -544,6 +536,7 @@ static __init int l2x0_pmu_init(void)
.del = l2x0_pmu_event_del,
.event_init = l2x0_pmu_event_init,
.attr_groups = l2x0_pmu_attr_groups,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
l2x0_pmu_reset();
--
2.7.4
Update design.txt to reflect the presence of the exclude_host
and exclude_guest perf flags.
Signed-off-by: Andrew Murray <[email protected]>
---
tools/perf/design.txt | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index a28dca2..0453ba2 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
way to request that counting of events be restricted to times when the
CPU is in user, kernel and/or hypervisor mode.
+Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
+to request counting of events restricted to guest and host contexts when
+using Linux as the hypervisor.
+
The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap
operations, these can be used to relate userspace IP addresses to actual
code, even after the mapping (or even the whole process) is gone,
--
2.7.4
The ARM PMU driver can be used to represent a variety of ARM based
PMUs. Some of these PMUs do not provide support for context
exclusion. Where this is the case, we advertise the
PERF_PMU_CAP_NO_EXCLUDE capability to ensure that perf prevents us
from handling events where any exclusion flags are set.
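
The conditional part can be pictured with a small userspace model. It
assumes, as the diff below does, that a missing set_event_filter
callback is what marks a PMU as unable to exclude. The sketch_* names
are hypothetical and the structure is reduced to the two fields that
matter here.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the value given to PERF_PMU_CAP_NO_EXCLUDE in this series. */
#define SKETCH_CAP_NO_EXCLUDE 0x80

/* Hypothetical, reduced stand-in for struct arm_pmu. */
struct sketch_arm_pmu {
	unsigned int capabilities;
	/* NULL when the underlying PMU has no mode filtering hardware. */
	int (*set_event_filter)(void *hwc, const void *attr);
};

/*
 * At registration time, advertise the capability only for PMUs that
 * cannot filter. PMUs with a filter callback keep handling the
 * exclusion flags themselves.
 */
static void sketch_armpmu_register(struct sketch_arm_pmu *pmu)
{
	assert(pmu != NULL);

	if (!pmu->set_event_filter)
		pmu->capabilities |= SKETCH_CAP_NO_EXCLUDE;
}

/* Example filter callback for a PMU that does support exclusion. */
static int sketch_filter_ok(void *hwc, const void *attr)
{
	(void)hwc;
	(void)attr;
	return 0;
}
```

Deciding at registration time rather than per event keeps the
per-event path free of the capability question for PMUs that have no
filter at all.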
Signed-off-by: Andrew Murray <[email protected]>
Acked-by: Will Deacon <[email protected]>
---
drivers/perf/arm_pmu.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index d0b7dd8..eec75b9 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
}
static int
-event_requires_mode_exclusion(struct perf_event_attr *attr)
-{
- return attr->exclude_idle || attr->exclude_user ||
- attr->exclude_kernel || attr->exclude_hv;
-}
-
-static int
__hw_perf_event_init(struct perf_event *event)
{
struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
@@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event)
/*
* Check whether we need to exclude the counter from certain modes.
*/
- if ((!armpmu->set_event_filter ||
- armpmu->set_event_filter(hwc, &event->attr)) &&
- event_requires_mode_exclusion(&event->attr)) {
+ if (armpmu->set_event_filter &&
+ armpmu->set_event_filter(hwc, &event->attr)) {
pr_debug("ARM performance counters do not support "
"mode exclusion\n");
return -EOPNOTSUPP;
@@ -867,6 +859,9 @@ int armpmu_register(struct arm_pmu *pmu)
if (ret)
return ret;
+ if (!pmu->set_event_filter)
+ pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
+
ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
if (ret)
goto out_destroy;
--
2.7.4
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.
Signed-off-by: Andrew Murray <[email protected]>
---
arch/x86/events/amd/ibs.c | 13 +------------
arch/x86/events/amd/power.c | 10 ++--------
arch/x86/events/intel/cstate.c | 12 +++---------
arch/x86/events/intel/rapl.c | 9 ++-------
arch/x86/events/intel/uncore_snb.c | 9 ++-------
arch/x86/events/msr.c | 10 ++--------
6 files changed, 12 insertions(+), 51 deletions(-)
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index d50bb4d..62f317c 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -253,15 +253,6 @@ static int perf_ibs_precise_event(struct perf_event *event, u64 *config)
return -EOPNOTSUPP;
}
-static const struct perf_event_attr ibs_notsupp = {
- .exclude_user = 1,
- .exclude_kernel = 1,
- .exclude_hv = 1,
- .exclude_idle = 1,
- .exclude_host = 1,
- .exclude_guest = 1,
-};
-
static int perf_ibs_init(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
@@ -282,9 +273,6 @@ static int perf_ibs_init(struct perf_event *event)
if (event->pmu != &perf_ibs->pmu)
return -ENOENT;
- if (perf_flags(&event->attr) & perf_flags(&ibs_notsupp))
- return -EINVAL;
-
if (config & ~perf_ibs->config_mask)
return -EINVAL;
@@ -537,6 +525,7 @@ static struct perf_ibs perf_ibs_fetch = {
.start = perf_ibs_start,
.stop = perf_ibs_stop,
.read = perf_ibs_read,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr = MSR_AMD64_IBSFETCHCTL,
.config_mask = IBS_FETCH_CONFIG_MASK,
diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
index 2aefacf..c5ff084 100644
--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -136,14 +136,7 @@ static int pmu_event_init(struct perf_event *event)
return -ENOENT;
/* Unsupported modes and filters. */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
- /* no sampling */
- event->attr.sample_period)
+ if (event->attr.sample_period)
return -EINVAL;
if (cfg != AMD_POWER_EVENTSEL_PKG)
@@ -226,6 +219,7 @@ static struct pmu pmu_class = {
.start = pmu_event_start,
.stop = pmu_event_stop,
.read = pmu_event_read,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
static int power_cpu_exit(unsigned int cpu)
diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index d2e7807..94a4b7f 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -280,13 +280,7 @@ static int cstate_pmu_event_init(struct perf_event *event)
return -ENOENT;
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
- event->attr.sample_period) /* no sampling */
+ if (event->attr.sample_period) /* no sampling */
return -EINVAL;
if (event->cpu < 0)
@@ -437,7 +431,7 @@ static struct pmu cstate_core_pmu = {
.start = cstate_pmu_event_start,
.stop = cstate_pmu_event_stop,
.read = cstate_pmu_event_update,
- .capabilities = PERF_PMU_CAP_NO_INTERRUPT,
+ .capabilities = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
.module = THIS_MODULE,
};
@@ -451,7 +445,7 @@ static struct pmu cstate_pkg_pmu = {
.start = cstate_pmu_event_start,
.stop = cstate_pmu_event_stop,
.read = cstate_pmu_event_update,
- .capabilities = PERF_PMU_CAP_NO_INTERRUPT,
+ .capabilities = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
.module = THIS_MODULE,
};
diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 91039ff..94dc564 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -397,13 +397,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
- event->attr.sample_period) /* no sampling */
+ if (event->attr.sample_period) /* no sampling */
return -EINVAL;
/* must be done before validate_group */
@@ -699,6 +693,7 @@ static int __init init_rapl_pmus(void)
rapl_pmus->pmu.stop = rapl_pmu_event_stop;
rapl_pmus->pmu.read = rapl_pmu_event_read;
rapl_pmus->pmu.module = THIS_MODULE;
+ rapl_pmus->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
return 0;
}
diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index 2593b0d..b12517f 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -397,13 +397,7 @@ static int snb_uncore_imc_event_init(struct perf_event *event)
return -EINVAL;
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
- event->attr.sample_period) /* no sampling */
+ if (event->attr.sample_period) /* no sampling */
return -EINVAL;
/*
@@ -497,6 +491,7 @@ static struct pmu snb_uncore_imc_pmu = {
.start = uncore_pmu_event_start,
.stop = uncore_pmu_event_stop,
.read = uncore_pmu_event_read,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
static struct intel_uncore_ops snb_uncore_imc_ops = {
diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index 1b9f85a..a878e62 100644
--- a/arch/x86/events/msr.c
+++ b/arch/x86/events/msr.c
@@ -160,13 +160,7 @@ static int msr_event_init(struct perf_event *event)
return -ENOENT;
/* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
- event->attr.sample_period) /* no sampling */
+ if (event->attr.sample_period) /* no sampling */
return -EINVAL;
if (cfg >= PERF_MSR_EVENT_MAX)
@@ -256,7 +250,7 @@ static struct pmu pmu_msr = {
.start = msr_event_start,
.stop = msr_event_stop,
.read = msr_event_update,
- .capabilities = PERF_PMU_CAP_NO_INTERRUPT,
+ .capabilities = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
};
static int __init msr_init(void)
--
2.7.4
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.
Signed-off-by: Andrew Murray <[email protected]>
Acked-by: Will Deacon <[email protected]>
---
drivers/perf/arm-cci.c | 10 +---------
drivers/perf/arm-ccn.c | 6 ++----
drivers/perf/arm_dsu_pmu.c | 9 ++-------
drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
drivers/perf/hisilicon/hisi_uncore_hha_pmu.c | 1 +
drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 1 +
drivers/perf/hisilicon/hisi_uncore_pmu.c | 9 ---------
7 files changed, 8 insertions(+), 29 deletions(-)
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 1bfeb16..bfd03e0 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1327,15 +1327,6 @@ static int cci_pmu_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EOPNOTSUPP;
- /* We have no filtering of any kind */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
- return -EINVAL;
-
/*
* Following the example set by other "uncore" PMUs, we accept any CPU
* and rewrite its affinity dynamically rather than having perf core
@@ -1433,6 +1424,7 @@ static int cci_pmu_init(struct cci_pmu *cci_pmu, struct platform_device *pdev)
.stop = cci_pmu_stop,
.read = pmu_read,
.attr_groups = pmu_attr_groups,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
cci_pmu->plat_device = pdev;
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 7dd850e..2ae7602 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
- if (has_branch_stack(event) || event->attr.exclude_user ||
- event->attr.exclude_kernel || event->attr.exclude_hv ||
- event->attr.exclude_idle || event->attr.exclude_host ||
- event->attr.exclude_guest) {
+ if (has_branch_stack(event)) {
dev_dbg(ccn->dev, "Can't exclude execution levels!\n");
return -EINVAL;
}
@@ -1290,6 +1287,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn)
.read = arm_ccn_pmu_event_read,
.pmu_enable = arm_ccn_pmu_enable,
.pmu_disable = arm_ccn_pmu_disable,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
/* No overflow interrupt? Have to use a timer instead. */
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index 660cb8a..5851de5 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -562,13 +562,7 @@ static int dsu_pmu_event_init(struct perf_event *event)
return -EINVAL;
}
- if (has_branch_stack(event) ||
- event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest) {
+ if (has_branch_stack(event)) {
dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
return -EINVAL;
}
@@ -735,6 +729,7 @@ static int dsu_pmu_device_probe(struct platform_device *pdev)
.read = dsu_pmu_read,
.attr_groups = dsu_pmu_attr_groups,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
rc = perf_pmu_register(&dsu_pmu->pmu, name, -1);
diff --git a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
index 69372e2..0eba947 100644
--- a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
@@ -396,6 +396,7 @@ static int hisi_ddrc_pmu_probe(struct platform_device *pdev)
.stop = hisi_uncore_pmu_stop,
.read = hisi_uncore_pmu_read,
.attr_groups = hisi_ddrc_pmu_attr_groups,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
ret = perf_pmu_register(&ddrc_pmu->pmu, name, -1);
diff --git a/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c b/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
index 443906e..2553a84 100644
--- a/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
@@ -407,6 +407,7 @@ static int hisi_hha_pmu_probe(struct platform_device *pdev)
.stop = hisi_uncore_pmu_stop,
.read = hisi_uncore_pmu_read,
.attr_groups = hisi_hha_pmu_attr_groups,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
ret = perf_pmu_register(&hha_pmu->pmu, name, -1);
diff --git a/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c b/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
index 0bde5d9..cf1cc34 100644
--- a/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
@@ -397,6 +397,7 @@ static int hisi_l3c_pmu_probe(struct platform_device *pdev)
.stop = hisi_uncore_pmu_stop,
.read = hisi_uncore_pmu_read,
.attr_groups = hisi_l3c_pmu_attr_groups,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
ret = perf_pmu_register(&l3c_pmu->pmu, name, -1);
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index 9efd241..f028cbc 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -142,15 +142,6 @@ int hisi_uncore_pmu_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EOPNOTSUPP;
- /* counters do not have these bits */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_host ||
- event->attr.exclude_guest ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle)
- return -EINVAL;
-
/*
* The uncore counters not specific to any CPU, so cannot
* support per-task
--
2.7.4
For x86 PMUs that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.
This change means that amd/iommu and amd/uncore will now also
indicate that they do not support exclude_{hv|idle} and intel/uncore
that it does not support exclude_{guest|host}.
Signed-off-by: Andrew Murray <[email protected]>
---
arch/x86/events/amd/iommu.c | 6 +-----
arch/x86/events/amd/uncore.c | 7 ++-----
arch/x86/events/intel/uncore.c | 9 +--------
3 files changed, 4 insertions(+), 18 deletions(-)
diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 3210fee..7635c23 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -223,11 +223,6 @@ static int perf_iommu_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
- /* IOMMU counters do not have usr/os/guest/host bits */
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_host || event->attr.exclude_guest)
- return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
@@ -414,6 +409,7 @@ static const struct pmu iommu_pmu __initconst = {
.read = perf_iommu_read,
.task_ctx_nr = perf_invalid_context,
.attr_groups = amd_iommu_attr_groups,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
static __init int init_one_iommu(unsigned int idx)
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 398df6e..79cfd3b 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -201,11 +201,6 @@ static int amd_uncore_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
- /* NB and Last level cache counters do not have usr/os/guest/host bits */
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_host || event->attr.exclude_guest)
- return -EINVAL;
-
/* and we do not enable counter overflow interrupts */
hwc->config = event->attr.config & AMD64_RAW_EVENT_MASK_NB;
hwc->idx = -1;
@@ -307,6 +302,7 @@ static struct pmu amd_nb_pmu = {
.start = amd_uncore_start,
.stop = amd_uncore_stop,
.read = amd_uncore_read,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
static struct pmu amd_llc_pmu = {
@@ -317,6 +313,7 @@ static struct pmu amd_llc_pmu = {
.start = amd_uncore_start,
.stop = amd_uncore_stop,
.read = amd_uncore_read,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
static struct amd_uncore *amd_uncore_alloc(unsigned int cpu)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 27a4614..d516161 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -695,14 +695,6 @@ static int uncore_pmu_event_init(struct perf_event *event)
if (pmu->func_id < 0)
return -ENOENT;
- /*
- * Uncore PMU does measure at all privilege level all the time.
- * So it doesn't make sense to specify any exclude bits.
- */
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_hv || event->attr.exclude_idle)
- return -EINVAL;
-
/* Sampling not supported yet */
if (hwc->sample_period)
return -EINVAL;
@@ -800,6 +792,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
.stop = uncore_pmu_event_stop,
.read = uncore_pmu_event_read,
.module = THIS_MODULE,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
} else {
pmu->pmu = *pmu->type->pmu;
--
2.7.4
Now that perf_flags is no longer used, remove it.
Signed-off-by: Andrew Murray <[email protected]>
---
include/uapi/linux/perf_event.h | 2 --
tools/include/uapi/linux/perf_event.h | 2 --
2 files changed, 4 deletions(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9de8780..ea19b5d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32 ids[0];
};
-#define perf_flags(attr) (*(&(attr)->read_format + 1))
-
/*
* Ioctls that can be done on a perf event fd:
*/
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 9de8780..ea19b5d 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32 ids[0];
};
-#define perf_flags(attr) (*(&(attr)->read_format + 1))
-
/*
* Ioctls that can be done on a perf event fd:
*/
--
2.7.4
The Cavium ThunderX2 UNCORE PMU driver doesn't support any event
filtering. Let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability to
simplify the code.
Signed-off-by: Andrew Murray <[email protected]>
---
drivers/perf/thunderx2_pmu.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
index c9a1701..43d76c8 100644
--- a/drivers/perf/thunderx2_pmu.c
+++ b/drivers/perf/thunderx2_pmu.c
@@ -424,15 +424,6 @@ static int tx2_uncore_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
- /* We have no filtering of any kind */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
- return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
@@ -572,6 +563,7 @@ static int tx2_uncore_pmu_register(
.start = tx2_uncore_event_start,
.stop = tx2_uncore_event_stop,
.read = tx2_uncore_event_read,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
tx2_pmu->pmu.name = devm_kasprintf(dev, GFP_KERNEL,
--
2.7.4
Add a function that tests if any of the perf event exclusion flags
are set on a given event.
Signed-off-by: Andrew Murray <[email protected]>
---
include/linux/perf_event.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1d5c551..54a78d2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
extern void
perf_log_lost_samples(struct perf_event *event, u64 lost);
+static inline bool event_has_any_exclude_flag(struct perf_event *event)
+{
+ struct perf_event_attr *attr = &event->attr;
+
+ return attr->exclude_idle || attr->exclude_user ||
+ attr->exclude_kernel || attr->exclude_hv ||
+ attr->exclude_guest || attr->exclude_host;
+}
+
static inline bool is_sampling_event(struct perf_event *event)
{
return event->attr.sample_period != 0;
--
2.7.4
For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.
This change means that qcom_{l2|l3}_pmu will now also indicate that
they do not support exclude_{host|guest} and that xgene_pmu also does
not support exclude_idle and exclude_hv.
Note that for qcom_l2_pmu we now implicitly return -EINVAL instead
of -EOPNOTSUPP. This change will result in the perf userspace
utility retrying the perf_event_open system call with fallback
event attributes that do not fail.
Signed-off-by: Andrew Murray <[email protected]>
Acked-by: Will Deacon <[email protected]>
---
drivers/perf/qcom_l2_pmu.c | 9 +--------
drivers/perf/qcom_l3_pmu.c | 8 +-------
drivers/perf/xgene_pmu.c | 6 +-----
3 files changed, 3 insertions(+), 20 deletions(-)
diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 842135c..091b4d7 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -509,14 +509,6 @@ static int l2_cache_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
- /* We cannot filter accurately so we just don't allow it. */
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_hv || event->attr.exclude_idle) {
- dev_dbg_ratelimited(&l2cache_pmu->pdev->dev,
- "Can't exclude execution levels\n");
- return -EOPNOTSUPP;
- }
-
if (((L2_EVT_GROUP(event->attr.config) > L2_EVT_GROUP_MAX) ||
((event->attr.config & ~L2_EVT_MASK) != 0)) &&
(event->attr.config != L2CYCLE_CTR_RAW_CODE)) {
@@ -982,6 +974,7 @@ static int l2_cache_pmu_probe(struct platform_device *pdev)
.stop = l2_cache_event_stop,
.read = l2_cache_event_read,
.attr_groups = l2_cache_pmu_attr_grps,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
l2cache_pmu->num_counters = get_num_counters();
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 2dc63d6..5d70646 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -495,13 +495,6 @@ static int qcom_l3_cache__event_init(struct perf_event *event)
return -ENOENT;
/*
- * There are no per-counter mode filters in the PMU.
- */
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_hv || event->attr.exclude_idle)
- return -EINVAL;
-
- /*
* Sampling not supported since these events are not core-attributable.
*/
if (hwc->sample_period)
@@ -777,6 +770,7 @@ static int qcom_l3_cache_pmu_probe(struct platform_device *pdev)
.read = qcom_l3_cache__event_read,
.attr_groups = qcom_l3_cache_pmu_attr_grps,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
memrc = platform_get_resource(pdev, IORESOURCE_MEM, 0);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0dc9ff0..d4ec048 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -917,11 +917,6 @@ static int xgene_perf_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
- /* SOC counters do not have usr/os/guest/host bits */
- if (event->attr.exclude_user || event->attr.exclude_kernel ||
- event->attr.exclude_host || event->attr.exclude_guest)
- return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
/*
@@ -1136,6 +1131,7 @@ static int xgene_init_perf(struct xgene_pmu_dev *pmu_dev, char *name)
.start = xgene_perf_start,
.stop = xgene_perf_stop,
.read = xgene_perf_read,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
/* Hardware counter init */
--
2.7.4
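The commit message above relies on userspace retrying the system call
with fallback attributes once it sees -EINVAL. A rough sketch of that
retry pattern follows. It is not the perf tool's actual fallback code:
struct sketch_attr, open_fn, open_with_fallback() and
fake_no_exclude_open() are hypothetical names, and the open callback
returns a negative errno directly instead of setting errno as the real
perf_event_open() wrapper would.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the exclusion bits of perf_event_attr. */
struct sketch_attr {
	unsigned int exclude_user   : 1;
	unsigned int exclude_kernel : 1;
	unsigned int exclude_hv     : 1;
	unsigned int exclude_idle   : 1;
	unsigned int exclude_host   : 1;
	unsigned int exclude_guest  : 1;
};

/* Returns a file descriptor (>= 0) or a negative errno value. */
typedef int (*open_fn)(const struct sketch_attr *attr);

static bool sketch_any_exclude(const struct sketch_attr *attr)
{
	return attr->exclude_user || attr->exclude_kernel ||
	       attr->exclude_hv || attr->exclude_idle ||
	       attr->exclude_host || attr->exclude_guest;
}

/* Try to open the event; on -EINVAL with exclusion flags set, clear
 * the flags and try exactly once more. */
static int open_with_fallback(struct sketch_attr *attr, open_fn do_open)
{
	int fd = do_open(attr);

	if (fd != -EINVAL || !sketch_any_exclude(attr))
		return fd;

	attr->exclude_user = attr->exclude_kernel = 0;
	attr->exclude_hv = attr->exclude_idle = 0;
	attr->exclude_host = attr->exclude_guest = 0;

	return do_open(attr);
}

/* Models a PMU that advertises PERF_PMU_CAP_NO_EXCLUDE: any exclusion
 * flag is rejected, otherwise a dummy descriptor is returned. */
static int fake_no_exclude_open(const struct sketch_attr *attr)
{
	return sketch_any_exclude(attr) ? -EINVAL : 3;
}
```

The point of the sketch is that such a retry only triggers on -EINVAL,
which is why moving qcom_l2_pmu from -EOPNOTSUPP to -EINVAL changes what
userspace ends up doing.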
For PowerPC PMUs that do not support context exclusion let's
advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that
perf will prevent us from handling events where any exclusion flags
are set. Let's also remove the now unnecessary check for exclusion
flags.
Signed-off-by: Andrew Murray <[email protected]>
Reviewed-by: Madhavan Srinivasan <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
---
arch/powerpc/perf/hv-24x7.c | 10 +---------
arch/powerpc/perf/hv-gpci.c | 10 +---------
arch/powerpc/perf/imc-pmu.c | 19 +------------------
3 files changed, 3 insertions(+), 36 deletions(-)
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 72238ee..d2b8e60 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1306,15 +1306,6 @@ static int h_24x7_event_init(struct perf_event *event)
return -EINVAL;
}
- /* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
- return -EINVAL;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
@@ -1577,6 +1568,7 @@ static struct pmu h_24x7_pmu = {
.start_txn = h_24x7_event_start_txn,
.commit_txn = h_24x7_event_commit_txn,
.cancel_txn = h_24x7_event_cancel_txn,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
static int hv_24x7_init(void)
diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index 43fabb3..735e77b 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -232,15 +232,6 @@ static int h_gpci_event_init(struct perf_event *event)
return -EINVAL;
}
- /* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
- return -EINVAL;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
@@ -285,6 +276,7 @@ static struct pmu h_gpci_pmu = {
.start = h_gpci_event_start,
.stop = h_gpci_event_stop,
.read = h_gpci_event_update,
+ .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
static int hv_gpci_init(void)
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f292a3f..b1c37cc 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -473,15 +473,6 @@ static int nest_imc_event_init(struct perf_event *event)
if (event->hw.sample_period)
return -EINVAL;
- /* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
- return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
@@ -748,15 +739,6 @@ static int core_imc_event_init(struct perf_event *event)
if (event->hw.sample_period)
return -EINVAL;
- /* unsupported modes and filters */
- if (event->attr.exclude_user ||
- event->attr.exclude_kernel ||
- event->attr.exclude_hv ||
- event->attr.exclude_idle ||
- event->attr.exclude_host ||
- event->attr.exclude_guest)
- return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
@@ -1069,6 +1051,7 @@ static int update_pmu_ops(struct imc_pmu *pmu)
pmu->pmu.stop = imc_event_stop;
pmu->pmu.read = imc_event_update;
pmu->pmu.attr_groups = pmu->attr_groups;
+ pmu->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group;
switch (pmu->domain) {
--
2.7.4
On Mon, Jan 07, 2019 at 04:27:22PM +0000, Andrew Murray wrote:
> @@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event)
> /*
> * Check whether we need to exclude the counter from certain modes.
> */
> + if (armpmu->set_event_filter &&
> + armpmu->set_event_filter(hwc, &event->attr)) {
> pr_debug("ARM performance counters do not support "
> "mode exclusion\n");
> return -EOPNOTSUPP;
This then requires all set_event_filter() implementations to check all
the various exclude options; also, set_event_filter() failing then
returns with -EOPNOTSUPP instead of the -EINVAL the CAP_NO_EXCLUDE
generates, which is again inconsistent.
If I look at (the very first git-grep found me)
armv7pmu_set_event_filter(), then I find it returning -EPERM (again
inconsistent but irrelevant because the actual value is not preserved)
for exclude_idle.
But it doesn't seem to check exclude_host at all for example.
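To make the concern concrete, here is a sketch of what a filter callback
that accounts for every exclude option could look like. It is not taken
from any real driver: struct filter_attr, the SKETCH_* bit values and
sketch_set_event_filter() are hypothetical, and the split between
supported and unsupported flags is assumed purely for illustration.

```c
#include <errno.h>

/* Hypothetical stand-in for the exclusion bits of perf_event_attr. */
struct filter_attr {
	unsigned int exclude_user   : 1;
	unsigned int exclude_kernel : 1;
	unsigned int exclude_hv     : 1;
	unsigned int exclude_idle   : 1;
	unsigned int exclude_host   : 1;
	unsigned int exclude_guest  : 1;
};

/* Made-up filter bits for an imaginary counter control register. */
#define SKETCH_EXCLUDE_USER	(1UL << 0)
#define SKETCH_EXCLUDE_KERNEL	(1UL << 1)
#define SKETCH_EXCLUDE_HV	(1UL << 2)

/* Every exclude option is either rejected up front or translated into a
 * filter bit, and every rejection uses the same error value. */
static int sketch_set_event_filter(unsigned long *config_base,
				   const struct filter_attr *attr)
{
	unsigned long bits = 0;

	/* Assumed: this imaginary hardware cannot filter these contexts. */
	if (attr->exclude_idle || attr->exclude_host || attr->exclude_guest)
		return -EINVAL;

	if (attr->exclude_user)
		bits |= SKETCH_EXCLUDE_USER;
	if (attr->exclude_kernel)
		bits |= SKETCH_EXCLUDE_KERNEL;
	if (attr->exclude_hv)
		bits |= SKETCH_EXCLUDE_HV;

	*config_base = bits;
	return 0;
}
```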
> @@ -867,6 +859,9 @@ int armpmu_register(struct arm_pmu *pmu)
> if (ret)
> return ret;
>
> + if (!pmu->set_event_filter)
> + pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
> +
> ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
> if (ret)
> goto out_destroy;
> --
> 2.7.4
>
On Mon, Jan 07, 2019 at 04:27:27PM +0000, Andrew Murray wrote:
> For drivers that do not support context exclusion let's advertise the
> PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
> prevent us from handling events where any exclusion flags are set.
> Let's also remove the now unnecessary check for exclusion flags.
>
> Signed-off-by: Andrew Murray <[email protected]>
> ---
> arch/x86/events/amd/ibs.c | 13 +------------
> arch/x86/events/amd/power.c | 10 ++--------
> arch/x86/events/intel/cstate.c | 12 +++---------
> arch/x86/events/intel/rapl.c | 9 ++-------
> arch/x86/events/intel/uncore_snb.c | 9 ++-------
> arch/x86/events/msr.c | 10 ++--------
> 6 files changed, 12 insertions(+), 51 deletions(-)
You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but
then you also don't check if it handles all the various exclude options
correctly/consistently.
Now; I must admit that that is a bit of a maze, but I think we can at
least add exclude_idle and exclude_hv fails in there, nothing uses those
afaict.
On the various exclude options; they are as follows (IIUC):
- exclude_guest: we're a HV/host-kernel and we don't want the counter
to run when we run a guest context.
- exclude_host: we're a HV/host-kernel and we don't want the counter
to run when we run in host context.
- exclude_hv: we're a guest and don't want the counter to run in HV
context.
Now, KVM always implies exclude_hv afaict (for guests), I'm not sure
what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf
works on Xen) -- nor quite sure who to ask, Boris, Jeurgen?
On Mon, Jan 07, 2019 at 04:27:28PM +0000, Andrew Murray wrote:
This patch has the exact same subject as the previous one.. that seems
sub-optimal.
On Tue, Jan 08, 2019 at 11:28:02AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:22PM +0000, Andrew Murray wrote:
> > @@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event)
> > /*
> > * Check whether we need to exclude the counter from certain modes.
> > */
> > + if (armpmu->set_event_filter &&
> > + armpmu->set_event_filter(hwc, &event->attr)) {
> > pr_debug("ARM performance counters do not support "
> > "mode exclusion\n");
> > return -EOPNOTSUPP;
>
> This then requires all set_event_filter() implementations to check all
> the various exclude options;
Yes but this isn't a new requirement, this hunk uses the absence of
set_event_filter to blanket indicate that no exclusion flags are supported.
> also, set_event_filter() failing then
> returns with -EOPNOTSUPP instead of the -EINVAL the CAP_NO_EXCLUDE
> generates, which is again inconsistent.
Yes, it's not ideal - but a step in the right direction. I wanted to limit
user visible changes as much as possible, where I've identified them I've
noted it in the commit log.
>
> If I look at (the very first git-grep found me)
> armv7pmu_set_event_filter(), then I find it returning -EPERM (again
> inconsistent but irrelevant because the actual value is not preserved)
> for exclude_idle.
>
> But it doesn't seem to check exclude_host at all for example.
Yes I found lots of examples like this across the tree whilst doing this
work. However I decided to initially start with simply removing duplicated
code as a result of adding this flag and attempting to preserve existing
functionality. I thought that if I add missing checks then the patchset
will get much bigger and be harder to merge. I would like to do this though
as another non-cross-arch series.
Can we limit this patch series to the minimal changes required to fully
use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems
in subsequent patch sets?
Thanks,
Andrew Murray
>
> > @@ -867,6 +859,9 @@ int armpmu_register(struct arm_pmu *pmu)
> > if (ret)
> > return ret;
> >
> > + if (!pmu->set_event_filter)
> > + pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
> > +
> > ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
> > if (ret)
> > goto out_destroy;
> > --
> > 2.7.4
> >
On Tue, Jan 08, 2019 at 11:49:40AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:28PM +0000, Andrew Murray wrote:
>
> This patch has the exact same subject as the previous one.. that seems
> sub-optimal.
Ah yes, I'll update that in subsequent revisions. (The reason for two patches
was to separate functional vs non-functional changes).
Andrew Murray
On Tue, Jan 08, 2019 at 01:07:41PM +0000, Andrew Murray wrote:
> Yes I found lots of examples like this across the tree whilst doing this
> work. However I decided to initially start with simply removing duplicated
> code as a result of adding this flag and attempting to preserve existing
> functionality. I thought that if I add missing checks then the patchset
> will get much bigger and be harder to merge. I would like to do this though
> as another non-cross-arch series.
>
> Can we limit this patch series to the minimal changes required to fully
> use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems
> in subsequent patch sets?
Ok, but it would've been nice to see that mentioned somewhere.
On Tue, Jan 08, 2019 at 11:48:41AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:27PM +0000, Andrew Murray wrote:
> > For drivers that do not support context exclusion let's advertise the
> > PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
> > prevent us from handling events where any exclusion flags are set.
> > Let's also remove the now unnecessary check for exclusion flags.
> >
> > Signed-off-by: Andrew Murray <[email protected]>
> > ---
> > arch/x86/events/amd/ibs.c | 13 +------------
> > arch/x86/events/amd/power.c | 10 ++--------
> > arch/x86/events/intel/cstate.c | 12 +++---------
> > arch/x86/events/intel/rapl.c | 9 ++-------
> > arch/x86/events/intel/uncore_snb.c | 9 ++-------
> > arch/x86/events/msr.c | 10 ++--------
> > 6 files changed, 12 insertions(+), 51 deletions(-)
>
> You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but
> then you also don't check if it handles all the various exclude options
> correctly/consistently.
>
> Now; I must admit that that is a bit of a maze, but I think we can at
> least add exclude_idle and exclude_hv fails in there, nothing uses those
> afaict.
Yes it took me some time to make sense of it.
As per my comments in the other patch, I think you're suggesting that I
add additional checks to x86. I think they are needed but I'd prefer to
make functional changes in a separate series, I'm happy to do this.
>
> On the various exclude options; they are as follows (IIUC):
>
> - exclude_guest: we're a HV/host-kernel and we don't want the counter
> to run when we run a guest context.
>
> - exclude_host: we're a HV/host-kernel and we don't want the counter
> to run when we run in host context.
>
> - exclude_hv: we're a guest and don't want the counter to run in HV
> context.
>
> Now, KVM always implies exclude_hv afaict (for guests),
It certainly does for ARM.
> I'm not sure
> what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf
> works on Xen) -- nor quite sure who to ask, Boris, Jeurgen?
Thanks,
Andrew Murray
>
On Tue, Jan 08, 2019 at 02:10:31PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 08, 2019 at 01:07:41PM +0000, Andrew Murray wrote:
>
> > Yes I found lots of examples like this across the tree whilst doing this
> > work. However I decided to initially start with simply removing duplicated
> > code as a result of adding this flag and attempting to preserve existing
> > functionality. I thought that if I add missing checks then the patchset
> > will get much bigger and be harder to merge. I would like to do this though
> > as another non-cross-arch series.
> >
> > Can we limit this patch series to the minimal changes required to fully
> > use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems
> > in subsequent patch sets?
>
> Ok, but it would've been nice to see that mentioned somewhere.
I'll update the cover letter on any next revision. I'll try to be clearer next
time with my intentions.
Andrew Murray
On Tue, Jan 08, 2019 at 01:13:57PM +0000, Andrew Murray wrote:
> On Tue, Jan 08, 2019 at 02:10:31PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 08, 2019 at 01:07:41PM +0000, Andrew Murray wrote:
> >
> > > Yes I found lots of examples like this across the tree whilst doing this
> > > work. However I decided to initially start with simply removing duplicated
> > > code as a result of adding this flag and attempting to preserve existing
> > > functionality. I thought that if I add missing checks then the patchset
> > > will get much bigger and be harder to merge. I would like to do this though
> > > as another non-cross-arch series.
> > >
> > > Can we limit this patch series to the minimal changes required to fully
> > > use PERF_PMU_CAP_NO_EXCLUDE and then attempt to fix these existing problems
> > > in subsequent patch sets?
> >
> > Ok, but it would've been nice to see that mentioned somewhere.
>
> I'll update the cover letter on any next revision. I'll try to be clearer next
> time with my intentions.
Could you maybe include it in the relevant patches too; like for example
the ARM one where we rely on set_event_filter() to DTRT.
So with the changelogs and subjects fixed I can take these patches and
then you can get on with cleaning up the individual drivers.
On 1/8/19 5:48 AM, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 04:27:27PM +0000, Andrew Murray wrote:
>> For drivers that do not support context exclusion let's advertise the
>> PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
>> prevent us from handling events where any exclusion flags are set.
>> Let's also remove the now unnecessary check for exclusion flags.
>>
>> Signed-off-by: Andrew Murray <[email protected]>
>> ---
>> arch/x86/events/amd/ibs.c | 13 +------------
>> arch/x86/events/amd/power.c | 10 ++--------
>> arch/x86/events/intel/cstate.c | 12 +++---------
>> arch/x86/events/intel/rapl.c | 9 ++-------
>> arch/x86/events/intel/uncore_snb.c | 9 ++-------
>> arch/x86/events/msr.c | 10 ++--------
>> 6 files changed, 12 insertions(+), 51 deletions(-)
> You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but
> then you also don't check if it handles all the various exclude options
> correctly/consistently.
>
> Now; I must admit that that is a bit of a maze, but I think we can at
> least add exclude_idle and exclude_hv fails in there, nothing uses those
> afaict.
>
> On the various exclude options; they are as follows (IIUC):
>
> - exclude_guest: we're a HV/host-kernel and we don't want the counter
> to run when we run a guest context.
>
> - exclude_host: we're a HV/host-kernel and we don't want the counter
> to run when we run in host context.
>
> - exclude_hv: we're a guest and don't want the counter to run in HV
> context.
>
> Now, KVM always implies exclude_hv afaict (for guests), I'm not sure
> what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf
> works on Xen) -- nor quite sure who to ask, Boris, Jeurgen?
perf does work inside guests.
VPMU is managed by Xen and it presents to the guest only samples
that are associated with the guest. So from that perspective exclude_hv
doesn't seem to be needed.
There is a VPMU mode that allows profiling whole system (host and
guests) from dom0, and this is where exclude_hv might be useful. But this
mode, ahem, needs some work.
-boris
On Tue, Jan 08, 2019 at 11:36:33AM -0500, Boris Ostrovsky wrote:
> On 1/8/19 5:48 AM, Peter Zijlstra wrote:
> > On the various exclude options; they are as follows (IIUC):
> >
> > - exclude_guest: we're a HV/host-kernel and we don't want the counter
> > to run when we run a guest context.
> >
> > - exclude_host: we're a HV/host-kernel and we don't want the counter
> > to run when we run in host context.
> >
> > - exclude_hv: we're a guest and don't want the counter to run in HV
> > context.
> >
> > Now, KVM always implies exclude_hv afaict (for guests), I'm not sure
> > what, if anything Xen does on x86 (IIRC Brendan Gregg once said perf
> > works on Xen) -- nor quite sure who to ask, Boris, Jeurgen?
>
> perf does work inside guests.
>
> VPMU is managed by Xen and it presents to the guest only samples
> that are associated with the guest. So from that perspective exclude_hv
> doesn't seem to be needed.
>
> There is a VPMU mode that allows profiling whole system (host and
> guests) from dom0, and this is where exclude_hv might be useful. But this
> mode, ahem, needs some work.
Thanks Boris!
On Mon, Jan 07, 2019 at 04:27:30PM +0000, Andrew Murray wrote:
> The Cavium ThunderX2 UNCORE PMU driver doesn't support any event
> filtering. Let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability to
> simplify the code.
>
> Signed-off-by: Andrew Murray <[email protected]>
> ---
> drivers/perf/thunderx2_pmu.c | 10 +---------
> 1 file changed, 1 insertion(+), 9 deletions(-)
Acked-by: Will Deacon <[email protected]>
Thanks for fixing this up.
Will
Peter Zijlstra <[email protected]> writes:
> On Mon, Jan 07, 2019 at 04:27:27PM +0000, Andrew Murray wrote:
>> For drivers that do not support context exclusion let's advertise the
>> PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
>> prevent us from handling events where any exclusion flags are set.
>> Let's also remove the now unnecessary check for exclusion flags.
>>
>> Signed-off-by: Andrew Murray <[email protected]>
>> ---
>> arch/x86/events/amd/ibs.c | 13 +------------
>> arch/x86/events/amd/power.c | 10 ++--------
>> arch/x86/events/intel/cstate.c | 12 +++---------
>> arch/x86/events/intel/rapl.c | 9 ++-------
>> arch/x86/events/intel/uncore_snb.c | 9 ++-------
>> arch/x86/events/msr.c | 10 ++--------
>> 6 files changed, 12 insertions(+), 51 deletions(-)
>
> You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but
> then you also don't check if it handles all the various exclude options
> correctly/consistently.
>
> Now; I must admit that that is a bit of a maze, but I think we can at
> least add exclude_idle and exclude_hv fails in there, nothing uses those
> afaict.
>
> On the various exclude options; they are as follows (IIUC):
>
> - exclude_guest: we're a HV/host-kernel and we don't want the counter
> to run when we run a guest context.
>
> - exclude_host: we're a HV/host-kernel and we don't want the counter
> to run when we run in host context.
>
> - exclude_hv: we're a guest and don't want the counter to run in HV
> context.
>
> Now, KVM always implies exclude_hv afaict (for guests)
On Power it mostly does.
There's some host code that can run in real mode (MMU off) and therefore
doesn't do a full context switch out of the guest (including the PMU),
so that's host code that is running while the guest PMCs are still
counting.
cheers