2014-02-27 21:31:37

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 00/11] powerpc: Add support for Power Hypervisor supplied performance counters

These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
performance counters: gpci ("get performance counter info") and 24x7.

The counters supplied by these interfaces are continually counting and never
need to be (and cannot be) disabled or enabled. They additionally do not
generate any interrupts. This makes them in some regards similar to software
counters, and as a result their implimentation shares some common code (which
an initial patch exposes) with the sw counters.

These 2 PMUs end up providing access to some cpu, core, and chip level counters
not exposed via other interfaces, and additionally allow monitoring the
performance of other lpars (guests) on the same host system. Because it
provides access to core and chip level counters, this pair of PMUs could be
thought of as powerpc's counterpart to x86's uncore events.

GPCI is an interface that already exists on some power6 and power7 machines
(depending on the fw version), but is rather in-flexible and code intensive to
add additional counters to. The 24x7 interfaces currently are designed to
co-exist with the gpci interface while replacing most of gpci's functionality
on newer systems. Right now, the 24x7 code I've submitted uses the gpci calls
to check if it has permission to access certain classes of counters.

--

Since v2:
- "sysfs: create bin_attributes under the requested group" is now in
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git driver-core-next
with commit-id: aabaf4c2050d21d39fe11eec889c508e84d6a328

- Split hv-24x7.h catalog definition into hv-24x7-catalog.h
- Remove unused 24x7 and gpci interface structures and enums (Michael Ellerman)
- Update docs to point to an external source for the full catalog docs
- Extend some of the patch changelogs (Peter Z)
- Remove hrtimer usage and just extern the event_idx helper (now renamed) (Peter Z)
- s/PMU_RANGE_ATTR/PMU_FORMAT_RANGE/ (and similar RESERVED rename) (Michael
Ellerman)
- hv_24x7: small clarifications in read_offset_data()'s comment
- hv_gpci: remove h_gpci_event_read() and h_gpci_event_del(), call _stop and
_update() directly (Michael Ellerman)
- Kconfig relocation, dependency changes, and rewording (Scott Wood and
Michael Ellerman)

Since v1:
- add a few attributes to hv_gpci and hv_24x7 that expose some info about the interfaces
- so the attributes show up in the right place, fix bin_attr creation in sysfs groups.
- move hv_gpci.h and hv_24x7.h interface headers into arch/powerpc/perf
- fix bit ordering in hv_gpci.h
- split out hv_perf_caps_get() and use it to probe for the interface before registering
- ensure proper alignment of hypervisor args
- add a few missing counter requests to hv_gpci.h
- s/CIR_xxx/CIR_XXX/ in hv_gpci.h
- s/modules_init/device_initcall/
- Don't set event->cpu, use the user provided one
- remove the union of gpci events, just give the user 1024 bytes to play with
- clarify some comments (the list of fw versions is now labeled)
- provide and event_24x7_request() that wraps single_24x7_request()
- probably some other small fixes I'm forgetting.


Cody P Schafer (11):
sysfs: create bin_attributes under the requested group
perf: add PMU_FORMAT_RANGE() helper for use by sw-like pmus
perf: provide a common perf_event_nop_0() for use with .event_idx
powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
powerpc/perf: add hv_gpci interface header
powerpc/perf: add 24x7 interface headers
powerpc/perf: add a shared interface to get gpci version and
capabilities
powerpc/perf: add support for the hv gpci (get performance counter
info) interface
powerpc/perf: add support for the hv 24x7 interface
powerpc/perf: add kconfig option for hypervisor provided counters
powerpc/perf/hv_{gpci,24x7}: add documentation of device attributes

.../testing/sysfs-bus-event_source-devices-hv_24x7 | 23 +
.../testing/sysfs-bus-event_source-devices-hv_gpci | 43 ++
arch/powerpc/include/asm/hvcall.h | 5 +
arch/powerpc/perf/Makefile | 2 +
arch/powerpc/perf/hv-24x7-catalog.h | 33 ++
arch/powerpc/perf/hv-24x7.c | 492 +++++++++++++++++++++
arch/powerpc/perf/hv-24x7.h | 109 +++++
arch/powerpc/perf/hv-common.c | 39 ++
arch/powerpc/perf/hv-common.h | 17 +
arch/powerpc/perf/hv-gpci.c | 277 ++++++++++++
arch/powerpc/perf/hv-gpci.h | 73 +++
arch/powerpc/platforms/pseries/Kconfig | 12 +
fs/sysfs/group.c | 7 +-
include/linux/perf_event.h | 18 +
kernel/events/core.c | 10 +-
15 files changed, 1153 insertions(+), 7 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
create mode 100644 arch/powerpc/perf/hv-24x7-catalog.h
create mode 100644 arch/powerpc/perf/hv-24x7.c
create mode 100644 arch/powerpc/perf/hv-24x7.h
create mode 100644 arch/powerpc/perf/hv-common.c
create mode 100644 arch/powerpc/perf/hv-common.h
create mode 100644 arch/powerpc/perf/hv-gpci.c
create mode 100644 arch/powerpc/perf/hv-gpci.h

--
1.9.0


2014-02-27 21:31:42

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 03/11] perf: provide a common perf_event_nop_0() for use with .event_idx

Rather an having every pmu that needs a function that just returns 0 for
.event_idx define their own copy, reuse the one in kernel/events/core.c.

Rename from perf_swevent_event_idx() because we're no longer using it
for just software events. Naming is based on the perf_pmu_nop_*()
functions.

Signed-off-by: Cody P Schafer <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 10 +++++-----
2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3da5081..24a7b45 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -560,6 +560,7 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
extern u64 perf_event_read_value(struct perf_event *event,
u64 *enabled, u64 *running);

+extern int perf_event_nop_0(struct perf_event *event);

struct perf_sample_data {
u64 type;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 56003c6..2938a77 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5816,7 +5816,7 @@ static int perf_swevent_init(struct perf_event *event)
return 0;
}

-static int perf_swevent_event_idx(struct perf_event *event)
+int perf_event_nop_0(struct perf_event *event)
{
return 0;
}
@@ -5831,7 +5831,7 @@ static struct pmu perf_swevent = {
.stop = perf_swevent_stop,
.read = perf_swevent_read,

- .event_idx = perf_swevent_event_idx,
+ .event_idx = perf_event_nop_0,
};

#ifdef CONFIG_EVENT_TRACING
@@ -5950,7 +5950,7 @@ static struct pmu perf_tracepoint = {
.stop = perf_swevent_stop,
.read = perf_swevent_read,

- .event_idx = perf_swevent_event_idx,
+ .event_idx = perf_event_nop_0,
};

static inline void perf_tp_register(void)
@@ -6177,7 +6177,7 @@ static struct pmu perf_cpu_clock = {
.stop = cpu_clock_event_stop,
.read = cpu_clock_event_read,

- .event_idx = perf_swevent_event_idx,
+ .event_idx = perf_event_nop_0,
};

/*
@@ -6257,7 +6257,7 @@ static struct pmu perf_task_clock = {
.stop = task_clock_event_stop,
.read = task_clock_event_read,

- .event_idx = perf_swevent_event_idx,
+ .event_idx = perf_event_nop_0,
};

static void perf_pmu_nop_void(struct pmu *pmu)
--
1.9.0

2014-02-27 21:31:40

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 01/11] sysfs: create bin_attributes under the requested group

bin_attributes created/updated in create_files() (such as those listed
via (struct device).attribute_groups) were not placed under the
specified group, and instead appeared in the base kobj directory.

Fix this by making bin_attributes use creating code similar to normal
attributes.

A quick grep shows that no one is using bin_attrs in a named attribute
group yet, so we can do this without breaking anything in usespace.

Note that I do not add is_visible() support to
bin_attributes, though that could be done as well.

Signed-off-by: Cody P Schafer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---

No need to merge, already in driver-core-next as
aabaf4c2050d21d39fe11eec889c508e84d6a328, included for
reference/testing/verification only.

---

fs/sysfs/group.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index 6b57938..aa04068 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -70,8 +70,11 @@ static int create_files(struct kernfs_node *parent, struct kobject *kobj,
if (grp->bin_attrs) {
for (bin_attr = grp->bin_attrs; *bin_attr; bin_attr++) {
if (update)
- sysfs_remove_bin_file(kobj, *bin_attr);
- error = sysfs_create_bin_file(kobj, *bin_attr);
+ kernfs_remove_by_name(parent,
+ (*bin_attr)->attr.name);
+ error = sysfs_add_file_mode_ns(parent,
+ &(*bin_attr)->attr, true,
+ (*bin_attr)->attr.mode, NULL);
if (error)
break;
}
--
1.9.0

2014-02-27 21:32:03

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 04/11] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)

Signed-off-by: Cody P Schafer <[email protected]>
---
arch/powerpc/include/asm/hvcall.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index d8b600b..5dbbb29 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -274,6 +274,11 @@
/* Platform specific hcalls, used by KVM */
#define H_RTAS 0xf000

+/* "Platform specific hcalls", provided by PHYP */
+#define H_GET_24X7_CATALOG_PAGE 0xF078
+#define H_GET_24X7_DATA 0xF07C
+#define H_GET_PERF_COUNTER_INFO 0xF080
+
#ifndef __ASSEMBLY__

/**
--
1.9.0

2014-02-27 21:32:13

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 05/11] powerpc/perf: add hv_gpci interface header

"H_GetPerformanceCounterInfo" (refered to as hv_gpci or just gpci from
here on) is an interface to retrieve specific performance counters and
other data from the hypervisor. All outputs have a fixed format. This
header only describes the portions of the interface that we plan on
using in linux at this time.

Signed-off-by: Cody P Schafer <[email protected]>
---
arch/powerpc/perf/hv-gpci.h | 73 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)
create mode 100644 arch/powerpc/perf/hv-gpci.h

diff --git a/arch/powerpc/perf/hv-gpci.h b/arch/powerpc/perf/hv-gpci.h
new file mode 100644
index 0000000..b25f460
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.h
@@ -0,0 +1,73 @@
+#ifndef LINUX_POWERPC_PERF_HV_GPCI_H_
+#define LINUX_POWERPC_PERF_HV_GPCI_H_
+
+#include <linux/types.h>
+
+/* From the document "H_GetPerformanceCounterInfo Interface" v1.07 */
+
+/* H_GET_PERF_COUNTER_INFO argument */
+struct hv_get_perf_counter_info_params {
+ __be32 counter_request; /* I */
+ __be32 starting_index; /* IO */
+ __be16 secondary_index; /* IO */
+ __be16 returned_values; /* O */
+ __be32 detail_rc; /* O, only needed when called via *_norets() */
+
+ /*
+ * O, size each of counter_value element in bytes, only set for version
+ * >= 0x3
+ */
+ __be16 cv_element_size;
+
+ /* I, 0 (zero) for versions < 0x3 */
+ __u8 counter_info_version_in;
+
+ /* O, 0 (zero) if version < 0x3. Must be set to 0 when making hcall */
+ __u8 counter_info_version_out;
+ __u8 reserved[0xC];
+ __u8 counter_value[];
+} __packed;
+
+/*
+ * counter info version => fw version/reference (spec version)
+ *
+ * 8 => power8 (1.07)
+ * [7 is skipped by spec 1.07]
+ * 6 => TLBIE (1.07)
+ * 5 => v7r7m0.phyp (1.05)
+ * [4 skipped]
+ * 3 => v7r6m0.phyp (?)
+ * [1,2 skipped]
+ * 0 => v7r{2,3,4}m0.phyp (?)
+ */
+#define COUNTER_INFO_VERSION_CURRENT 0x8
+
+/*
+ * These determine the counter_value[] layout and the meaning of starting_index
+ * and secondary_index.
+ *
+ * Unless otherwise noted, @secondary_index is unused and ignored.
+ */
+enum counter_info_requests {
+
+ /* GENERAL */
+
+ /* @starting_index: must be -1 (to refer to the current partition)
+ */
+ CIR_SYSTEM_PERFORMANCE_CAPABILITIES = 0X40,
+};
+
+struct cv_system_performance_capabilities {
+ /* If != 0, allowed to collect data from other partitions */
+ __u8 perf_collect_privileged;
+
+ /* These following are only valid if counter_info_version >= 0x3 */
+#define CV_CM_GA (1 << 7)
+#define CV_CM_EXPANDED (1 << 6)
+#define CV_CM_LAB (1 << 5)
+ /* remaining bits are reserved */
+ __u8 capability_mask;
+ __u8 reserved[0xE];
+} __packed;
+
+#endif
--
1.9.0

2014-02-27 21:32:21

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 06/11] powerpc/perf: add 24x7 interface headers

24x7 (also called hv_24x7 or H_24X7) is an interface to obtain
performance counters from the hypervisor. These counters do not have a
fixed format/possition and are instead documented in a "24x7 Catalog",
which is provided by the hypervisor (that interface is also documented
paritialy in the included hv-24x7-catalog.h and fully in at
https://raw.githubusercontent.com/jmesmon/catalog-24x7/master/hv-24x7-catalog.h ).

The 24x7 data access is simply a copy operation into a 4 dimentional
array of 64bit counters (from hypervisor to kernel memory). There is no
interupt triggered on overflow, these are completely disjoint from the
typical power pmu.

This method of obtaining performance counters from the hypervisor is
intended to paritialy replace the gpci interface.

Signed-off-by: Cody P Schafer <[email protected]>
---
arch/powerpc/perf/hv-24x7-catalog.h | 33 +++++++++++
arch/powerpc/perf/hv-24x7.h | 109 ++++++++++++++++++++++++++++++++++++
2 files changed, 142 insertions(+)
create mode 100644 arch/powerpc/perf/hv-24x7-catalog.h
create mode 100644 arch/powerpc/perf/hv-24x7.h

diff --git a/arch/powerpc/perf/hv-24x7-catalog.h b/arch/powerpc/perf/hv-24x7-catalog.h
new file mode 100644
index 0000000..21b19dd
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7-catalog.h
@@ -0,0 +1,33 @@
+#ifndef LINUX_POWERPC_PERF_HV_24X7_CATALOG_H_
+#define LINUX_POWERPC_PERF_HV_24X7_CATALOG_H_
+
+#include <linux/types.h>
+
+/* From document "24x7 Event and Group Catalog Formats Proposal" v0.15 */
+
+struct hv_24x7_catalog_page_0 {
+#define HV_24X7_CATALOG_MAGIC 0x32347837 /* "24x7" in ASCII */
+ __be32 magic;
+ __be32 length; /* In 4096 byte pages */
+ __be64 version; /* XXX: arbitrary? what's the meaning/useage/purpose? */
+ __u8 build_time_stamp[16]; /* "YYYYMMDDHHMMSS\0\0" */
+ __u8 reserved2[32];
+ __be16 schema_data_offs; /* in 4096 byte pages */
+ __be16 schema_data_len; /* in 4096 byte pages */
+ __be16 schema_entry_count;
+ __u8 reserved3[2];
+ __be16 event_data_offs;
+ __be16 event_data_len;
+ __be16 event_entry_count;
+ __u8 reserved4[2];
+ __be16 group_data_offs; /* in 4096 byte pages */
+ __be16 group_data_len; /* in 4096 byte pages */
+ __be16 group_entry_count;
+ __u8 reserved5[2];
+ __be16 formula_data_offs; /* in 4096 byte pages */
+ __be16 formula_data_len; /* in 4096 byte pages */
+ __be16 formula_entry_count;
+ __u8 reserved6[2];
+} __packed;
+
+#endif
diff --git a/arch/powerpc/perf/hv-24x7.h b/arch/powerpc/perf/hv-24x7.h
new file mode 100644
index 0000000..720ebce
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7.h
@@ -0,0 +1,109 @@
+#ifndef LINUX_POWERPC_PERF_HV_24X7_H_
+#define LINUX_POWERPC_PERF_HV_24X7_H_
+
+#include <linux/types.h>
+
+struct hv_24x7_request {
+ /* PHYSICAL domains require enabling via phyp/hmc. */
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP 0x01
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CORE 0x02
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CORE 0x03
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CHIP 0x04
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_NODE 0x05
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_REMOTE_NODE 0x06
+ __u8 performance_domain;
+ __u8 reserved[0x1];
+
+ /* bytes to read starting at @data_offset. must be a multiple of 8 */
+ __be16 data_size;
+
+ /*
+ * byte offset within the perf domain to read from. must be 8 byte
+ * aligned
+ */
+ __be32 data_offset;
+
+ /*
+ * only valid for VIRTUAL_PROCESSOR domains, ignored for others.
+ * -1 means "current partition only"
+ * Enabling via phyp/hmc required for non-"-1" values. 0 forbidden
+ * unless requestor is 0.
+ */
+ __be16 starting_lpar_ix;
+
+ /*
+ * Ignored when @starting_lpar_ix == -1
+ * Ignored when @performance_domain is not VIRTUAL_PROCESSOR_*
+ * -1 means "infinite" or all
+ */
+ __be16 max_num_lpars;
+
+ /* chip, core, or virtual processor based on @performance_domain */
+ __be16 starting_ix;
+ __be16 max_ix;
+} __packed;
+
+struct hv_24x7_request_buffer {
+ /* 0 - ? */
+ /* 1 - ? */
+#define HV_24X7_IF_VERSION_CURRENT 0x01
+ __u8 interface_version;
+ __u8 num_requests;
+ __u8 reserved[0xE];
+ struct hv_24x7_request requests[];
+} __packed;
+
+struct hv_24x7_result_element {
+ __be16 lpar_ix;
+
+ /*
+ * represents the core, chip, or virtual processor based on the
+ * request's @performance_domain
+ */
+ __be16 domain_ix;
+
+ /* -1 if @performance_domain does not refer to a virtual processor */
+ __be32 lpar_cfg_instance_id;
+
+ /* size = @result_element_data_size of cointaining result. */
+ __u8 element_data[];
+} __packed;
+
+struct hv_24x7_result {
+ __u8 result_ix;
+
+ /*
+ * 0 = not all result elements fit into the buffer, additional requests
+ * required
+ * 1 = all result elements were returned
+ */
+ __u8 results_complete;
+ __be16 num_elements_returned;
+
+ /* This is a copy of @data_size from the coresponding hv_24x7_request */
+ __be16 result_element_data_size;
+ __u8 reserved[0x2];
+
+ /* WARNING: only valid for first result element due to variable sizes
+ * of result elements */
+ /* struct hv_24x7_result_element[@num_elements_returned] */
+ struct hv_24x7_result_element elements[];
+} __packed;
+
+struct hv_24x7_data_result_buffer {
+ /* See versioning for request buffer */
+ __u8 interface_version;
+
+ __u8 num_results;
+ __u8 reserved[0x1];
+ __u8 failing_request_ix;
+ __be32 detailed_rc;
+ __be64 cec_cfg_instance_id;
+ __be64 catalog_version_num;
+ __u8 reserved2[0x8];
+ /* WARNING: only valid for the first result due to variable sizes of
+ * results */
+ struct hv_24x7_result results[]; /* [@num_results] */
+} __packed;
+
+#endif
--
1.9.0

2014-02-27 21:32:32

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 07/11] powerpc/perf: add a shared interface to get gpci version and capabilities

This exposes a simple way to grab the firmware provided
collect_priveliged, ga, expanded, and lab capability bits. All of these
bits come in from the same gpci request, so we've exposed all of them.

Only the collect_priveliged bit is really used by the hv-gpci/hv-24x7
code, the other bits are simply exposed in sysfs to inform the user.

Signed-off-by: Cody P Schafer <[email protected]>
---
arch/powerpc/perf/hv-common.c | 39 +++++++++++++++++++++++++++++++++++++++
arch/powerpc/perf/hv-common.h | 17 +++++++++++++++++
2 files changed, 56 insertions(+)
create mode 100644 arch/powerpc/perf/hv-common.c
create mode 100644 arch/powerpc/perf/hv-common.h

diff --git a/arch/powerpc/perf/hv-common.c b/arch/powerpc/perf/hv-common.c
new file mode 100644
index 0000000..47e02b3
--- /dev/null
+++ b/arch/powerpc/perf/hv-common.c
@@ -0,0 +1,39 @@
+#include <asm/io.h>
+#include <asm/hvcall.h>
+
+#include "hv-gpci.h"
+#include "hv-common.h"
+
+unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
+{
+ unsigned long r;
+ struct p {
+ struct hv_get_perf_counter_info_params params;
+ struct cv_system_performance_capabilities caps;
+ } __packed __aligned(sizeof(uint64_t));
+
+ struct p arg = {
+ .params = {
+ .counter_request = cpu_to_be32(
+ CIR_SYSTEM_PERFORMANCE_CAPABILITIES),
+ .starting_index = cpu_to_be32(-1),
+ .counter_info_version_in = 0,
+ }
+ };
+
+ r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+ virt_to_phys(&arg), sizeof(arg));
+
+ if (r)
+ return r;
+
+ pr_devel("capability_mask: 0x%x\n", arg.caps.capability_mask);
+
+ caps->version = arg.params.counter_info_version_out;
+ caps->collect_privileged = !!arg.caps.perf_collect_privileged;
+ caps->ga = !!(arg.caps.capability_mask & CV_CM_GA);
+ caps->expanded = !!(arg.caps.capability_mask & CV_CM_EXPANDED);
+ caps->lab = !!(arg.caps.capability_mask & CV_CM_LAB);
+
+ return r;
+}
diff --git a/arch/powerpc/perf/hv-common.h b/arch/powerpc/perf/hv-common.h
new file mode 100644
index 0000000..7e615bd
--- /dev/null
+++ b/arch/powerpc/perf/hv-common.h
@@ -0,0 +1,17 @@
+#ifndef LINUX_POWERPC_PERF_HV_COMMON_H_
+#define LINUX_POWERPC_PERF_HV_COMMON_H_
+
+#include <linux/types.h>
+
+struct hv_perf_caps {
+ u16 version;
+ u16 collect_privileged:1,
+ ga:1,
+ expanded:1,
+ lab:1,
+ unused:12;
+};
+
+unsigned long hv_perf_caps_get(struct hv_perf_caps *caps);
+
+#endif
--
1.9.0

2014-02-27 21:32:43

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 08/11] powerpc/perf: add support for the hv gpci (get performance counter info) interface

This provides a basic link between perf and hv_gpci. Notably, it does
not yet support transactions and does not list any events (they can
still be manually composed).

Example usage via perf tool:

perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,secondary_index=0,starting_index=0xffffffff,request=0x10/' -r 0 -C 0 -x ' ' sleep 0.1

Signed-off-by: Cody P Schafer <[email protected]>
---
arch/powerpc/perf/hv-gpci.c | 277 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 277 insertions(+)
create mode 100644 arch/powerpc/perf/hv-gpci.c

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
new file mode 100644
index 0000000..2f64732
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -0,0 +1,277 @@
+/*
+ * Hypervisor supplied "gpci" ("get performance counter info") performance
+ * counter support
+ *
+ * Author: Cody P Schafer <[email protected]>
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#define pr_fmt(fmt) "hv-gpci: " fmt
+
+#include <linux/init.h>
+#include <linux/perf_event.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/io.h>
+
+#include "hv-gpci.h"
+#include "hv-common.h"
+
+PMU_FORMAT_RANGE(request, config, 0, 31); /* u32 */
+PMU_FORMAT_RANGE(starting_index, config, 32, 63); /* u32 */
+PMU_FORMAT_RANGE(secondary_index, config1, 0, 15); /* u16 */
+PMU_FORMAT_RANGE(counter_info_version, config1, 16, 23); /* u8 */
+PMU_FORMAT_RANGE(length, config1, 24, 31); /* u8, bytes of data (1-8) */
+PMU_FORMAT_RANGE(offset, config1, 32, 63); /* u32, byte offset */
+
+static struct attribute *format_attrs[] = {
+ &format_attr_request.attr,
+ &format_attr_starting_index.attr,
+ &format_attr_secondary_index.attr,
+ &format_attr_counter_info_version.attr,
+
+ &format_attr_offset.attr,
+ &format_attr_length.attr,
+ NULL,
+};
+
+static struct attribute_group format_group = {
+ .name = "format",
+ .attrs = format_attrs,
+};
+
+#define HV_CAPS_ATTR(_name, _format) \
+static ssize_t _name##_show(struct device *dev, \
+ struct device_attribute *attr, \
+ char *page) \
+{ \
+ struct hv_perf_caps caps; \
+ unsigned long hret = hv_perf_caps_get(&caps); \
+ if (hret) \
+ return -EIO; \
+ \
+ return sprintf(page, _format, caps._name); \
+} \
+static struct device_attribute hv_caps_attr_##_name = __ATTR_RO(_name)
+
+static ssize_t kernel_version_show(struct device *dev,
+ struct device_attribute *attr,
+ char *page)
+{
+ return sprintf(page, "0x%x\n", COUNTER_INFO_VERSION_CURRENT);
+}
+
+DEVICE_ATTR_RO(kernel_version);
+HV_CAPS_ATTR(version, "0x%x\n");
+HV_CAPS_ATTR(ga, "%d\n");
+HV_CAPS_ATTR(expanded, "%d\n");
+HV_CAPS_ATTR(lab, "%d\n");
+HV_CAPS_ATTR(collect_privileged, "%d\n");
+
+static struct attribute *interface_attrs[] = {
+ &dev_attr_kernel_version.attr,
+ &hv_caps_attr_version.attr,
+ &hv_caps_attr_ga.attr,
+ &hv_caps_attr_expanded.attr,
+ &hv_caps_attr_lab.attr,
+ &hv_caps_attr_collect_privileged.attr,
+ NULL,
+};
+
+static struct attribute_group interface_group = {
+ .name = "interface",
+ .attrs = interface_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+ &format_group,
+ &interface_group,
+ NULL,
+};
+
+#define GPCI_MAX_DATA_BYTES \
+ (1024 - sizeof(struct hv_get_perf_counter_info_params))
+
+static unsigned long single_gpci_request(u32 req, u32 starting_index,
+ u16 secondary_index, u8 version_in, u32 offset, u8 length,
+ u64 *value)
+{
+ unsigned long ret;
+ size_t i;
+ u64 count;
+
+ struct {
+ struct hv_get_perf_counter_info_params params;
+ uint8_t bytes[GPCI_MAX_DATA_BYTES];
+ } __packed __aligned(sizeof(uint64_t)) arg = {
+ .params = {
+ .counter_request = cpu_to_be32(req),
+ .starting_index = cpu_to_be32(starting_index),
+ .secondary_index = cpu_to_be16(secondary_index),
+ .counter_info_version_in = version_in,
+ }
+ };
+
+ ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+ virt_to_phys(&arg), sizeof(arg));
+ if (ret) {
+ pr_devel("hcall failed: 0x%lx\n", ret);
+ return ret;
+ }
+
+ /*
+ * we verify offset and length are within the zeroed buffer at event
+ * init.
+ */
+ count = 0;
+ for (i = offset; i < offset + length; i++)
+ count |= arg.bytes[i] << (i - offset);
+
+ *value = count;
+ return ret;
+}
+
+static u64 h_gpci_get_value(struct perf_event *event)
+{
+ u64 count;
+ unsigned long ret = single_gpci_request(event_get_request(event),
+ event_get_starting_index(event),
+ event_get_secondary_index(event),
+ event_get_counter_info_version(event),
+ event_get_offset(event),
+ event_get_length(event),
+ &count);
+ if (ret)
+ return 0;
+ return count;
+}
+
+static void h_gpci_event_update(struct perf_event *event)
+{
+ s64 prev;
+ u64 now = h_gpci_get_value(event);
+ prev = local64_xchg(&event->hw.prev_count, now);
+ local64_add(now - prev, &event->count);
+}
+
+static void h_gpci_event_start(struct perf_event *event, int flags)
+{
+ local64_set(&event->hw.prev_count, h_gpci_get_value(event));
+}
+
+static void h_gpci_event_stop(struct perf_event *event, int flags)
+{
+ h_gpci_event_update(event);
+}
+
+static int h_gpci_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ h_gpci_event_start(event, flags);
+
+ return 0;
+}
+
+static int h_gpci_event_init(struct perf_event *event)
+{
+ u64 count;
+ u8 length;
+
+ /* Not our event */
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* config2 is unused */
+ if (event->attr.config2) {
+ pr_devel("config2 set when reserved\n");
+ return -EINVAL;
+ }
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest ||
+ is_sampling_event(event)) /* no sampling */
+ return -EINVAL;
+
+ /* no branch sampling */
+ if (has_branch_stack(event))
+ return -EOPNOTSUPP;
+
+ length = event_get_length(event);
+ if (length < 1 || length > 8) {
+ pr_devel("length invalid\n");
+ return -EINVAL;
+ }
+
+ /* last byte within the buffer? */
+ if ((event_get_offset(event) + length) > GPCI_MAX_DATA_BYTES) {
+ pr_devel("request outside of buffer: %zu > %zu\n",
+ (size_t)event_get_offset(event) + length,
+ GPCI_MAX_DATA_BYTES);
+ return -EINVAL;
+ }
+
+ /* check if the request works... */
+ if (single_gpci_request(event_get_request(event),
+ event_get_starting_index(event),
+ event_get_secondary_index(event),
+ event_get_counter_info_version(event),
+ event_get_offset(event),
+ length,
+ &count)) {
+ pr_devel("gpci hcall failed\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static struct pmu h_gpci_pmu = {
+ .task_ctx_nr = perf_invalid_context,
+
+ .name = "hv_gpci",
+ .attr_groups = attr_groups,
+ .event_init = h_gpci_event_init,
+ .add = h_gpci_event_add,
+ .del = h_gpci_event_stop,
+ .start = h_gpci_event_start,
+ .stop = h_gpci_event_stop,
+ .read = h_gpci_event_update,
+
+ .event_idx = perf_event_nop_0,
+};
+
+static int hv_gpci_init(void)
+{
+ int r;
+ unsigned long hret;
+ struct hv_perf_caps caps;
+
+ if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+ pr_info("not a virtualized system, not enabling\n");
+ return -ENODEV;
+ }
+
+ hret = hv_perf_caps_get(&caps);
+ if (hret) {
+ pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n",
+ hret);
+ return -ENODEV;
+ }
+
+ r = perf_pmu_register(&h_gpci_pmu, h_gpci_pmu.name, -1);
+ if (r)
+ return r;
+
+ return 0;
+}
+
+device_initcall(hv_gpci_init);
--
1.9.0

2014-02-27 21:32:50

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 09/11] powerpc/perf: add support for the hv 24x7 interface

This provides a basic interface between hv_24x7 and perf. Similar to
the one provided for gpci, it lacks transaction support and does not
list any events.

Example usage via perf tool:

perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0xffffffff/' -r 0 -C 0 -x ' ' sleep 0.1

Signed-off-by: Cody P Schafer <[email protected]>
---
arch/powerpc/perf/hv-24x7.c | 492 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 492 insertions(+)
create mode 100644 arch/powerpc/perf/hv-24x7.c

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
new file mode 100644
index 0000000..c1847a3
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -0,0 +1,492 @@
+/*
+ * Hypervisor supplied "24x7" performance counter support
+ *
+ * Author: Cody P Schafer <[email protected]>
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "hv-24x7: " fmt
+
+#include <linux/perf_event.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/io.h>
+
+#include "hv-24x7.h"
+#include "hv-24x7-catalog.h"
+#include "hv-common.h"
+
+/*
+ * TODO: Merging events:
+ * - Think of the hcall as an interface to a 4d array of counters:
+ * - x = domains
+ * - y = indexes in the domain (core, chip, vcpu, node, etc)
+ * - z = offset into the counter space
+ * - w = lpars (guest vms, "logical partitions")
+ * - A single request is: x,y,y_last,z,z_last,w,w_last
+ * - this means we can retrieve a rectangle of counters in y,z for a single x.
+ *
+ * - Things to consider (ignoring w):
+ * - input cost_per_request = 16
+ * - output cost_per_result(ys,zs) = 8 + 8 * ys + ys * zs
+ * - limited number of requests per hcall (must fit into 4K bytes)
+ * - 4k = 16 [buffer header] - 16 [request size] * request_count
+ * - 255 requests per hcall
+ * - sometimes it will be more efficient to read extra data and discard
+ */
+
+PMU_FORMAT_RANGE(domain, config, 0, 3); /* u3 0-6, one of HV_24X7_PERF_DOMAIN */
+PMU_FORMAT_RANGE(starting_index, config, 16, 31); /* u16 */
+PMU_FORMAT_RANGE(offset, config, 32, 63); /* u32, see "data_offset" */
+PMU_FORMAT_RANGE(lpar, config1, 0, 15); /* u16 */
+
+PMU_FORMAT_RANGE_RESERVED(reserved1, config, 4, 15);
+PMU_FORMAT_RANGE_RESERVED(reserved2, config1, 16, 63);
+PMU_FORMAT_RANGE_RESERVED(reserved3, config2, 0, 63);
+
+static struct attribute *format_attrs[] = {
+ &format_attr_domain.attr,
+ &format_attr_offset.attr,
+ &format_attr_starting_index.attr,
+ &format_attr_lpar.attr,
+ NULL,
+};
+
+static struct attribute_group format_group = {
+ .name = "format",
+ .attrs = format_attrs,
+};
+
+/*
+ * read_offset_data - copy data from one buffer to another while treating the
+ * source buffer as a small view on the total avaliable
+ * source data.
+ *
+ * @dest: buffer to copy into
+ * @dest_len: length of @dest in bytes
+ * @requested_offset: the offset within the source data we want. Must be > 0
+ * @src: buffer to copy data from
+ * @src_len: length of @src in bytes
+ * @source_offset: the offset in the sorce data that (src,src_len) refers to.
+ * Must be > 0
+ *
+ * returns the number of bytes copied.
+ *
+ * The following ascii art shows the various buffer possitioning we need to
+ * handle, assigns some arbitrary varibles to points on the buffer, and then
+ * shows how we fiddle with those values to get things we care about (copy
+ * start in src and copy len)
+ *
+ * s = @src buffer
+ * d = @dest buffer
+ * '.' areas in d are written to.
+ *
+ * u
+ * x w v z
+ * d |.........|
+ * s |----------------------|
+ *
+ * u
+ * x w z v
+ * d |........------|
+ * s |------------------|
+ *
+ * x w u,z,v
+ * d |........|
+ * s |------------------|
+ *
+ * x,w u,v,z
+ * d |..................|
+ * s |------------------|
+ *
+ * x u
+ * w v z
+ * d |........|
+ * s |------------------|
+ *
+ * x z w v
+ * d |------|
+ * s |------|
+ *
+ * x = source_offset
+ * w = requested_offset
+ * z = source_offset + src_len
+ * v = requested_offset + dest_len
+ *
+ * w_offset_in_s = w - x = requested_offset - source_offset
+ * z_offset_in_s = z - x = src_len
+ * v_offset_in_s = v - x = request_offset + dest_len - src_len
+ */
+static ssize_t read_offset_data(void *dest, size_t dest_len,
+ loff_t requested_offset, void *src,
+ size_t src_len, loff_t source_offset)
+{
+ size_t w_offset_in_s = requested_offset - source_offset;
+ size_t z_offset_in_s = src_len;
+ size_t v_offset_in_s = requested_offset + dest_len - src_len;
+ size_t u_offset_in_s = min(z_offset_in_s, v_offset_in_s);
+ size_t copy_len = u_offset_in_s - w_offset_in_s;
+
+ if (requested_offset < 0 || source_offset < 0)
+ return -EINVAL;
+
+ if (z_offset_in_s <= w_offset_in_s)
+ return 0;
+
+ memcpy(dest, src + w_offset_in_s, copy_len);
+ return copy_len;
+}
+
+static unsigned long h_get_24x7_catalog_page(char page[static 4096],
+ u32 version, u32 index)
+{
+ WARN_ON(!IS_ALIGNED((unsigned long)page, 4096));
+ return plpar_hcall_norets(H_GET_24X7_CATALOG_PAGE,
+ virt_to_phys(page),
+ version,
+ index);
+}
+
+static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
+ struct bin_attribute *bin_attr, char *buf,
+ loff_t offset, size_t count)
+{
+ unsigned long hret;
+ ssize_t ret = 0;
+ size_t catalog_len = 0, catalog_page_len = 0, page_count = 0;
+ loff_t page_offset = 0;
+ uint32_t catalog_version_num = 0;
+ void *page = kmalloc(4096, GFP_USER);
+ struct hv_24x7_catalog_page_0 *page_0 = page;
+ if (!page)
+ return -ENOMEM;
+
+
+ hret = h_get_24x7_catalog_page(page, 0, 0);
+ if (hret) {
+ ret = -EIO;
+ goto e_free;
+ }
+
+ catalog_version_num = be32_to_cpu(page_0->version);
+ catalog_page_len = be32_to_cpu(page_0->length);
+ catalog_len = catalog_page_len * 4096;
+
+ page_offset = offset / 4096;
+ page_count = count / 4096;
+
+ if (page_offset >= catalog_page_len)
+ goto e_free;
+
+ if (page_offset != 0) {
+ hret = h_get_24x7_catalog_page(page, catalog_version_num,
+ page_offset);
+ if (hret) {
+ ret = -EIO;
+ goto e_free;
+ }
+ }
+
+ ret = read_offset_data(buf, count, offset,
+ page, 4096, page_offset * 4096);
+e_free:
+ if (hret)
+ pr_err("h_get_24x7_catalog_page(ver=%d, page=%lld) failed: rc=%ld\n",
+ catalog_version_num, page_offset, hret);
+ kfree(page);
+
+ pr_devel("catalog_read: offset=%lld(%lld) count=%zu(%zu) catalog_len=%zu(%zu) => %zd\n",
+ offset, page_offset, count, page_count, catalog_len,
+ catalog_page_len, ret);
+
+ return ret;
+}
+
+#define PAGE_0_ATTR(_name, _fmt, _expr) \
+static ssize_t _name##_show(struct device *dev, \
+ struct device_attribute *dev_attr, \
+ char *buf) \
+{ \
+ unsigned long hret; \
+ ssize_t ret = 0; \
+ void *page = kmalloc(4096, GFP_USER); \
+ struct hv_24x7_catalog_page_0 *page_0 = page; \
+ if (!page) \
+ return -ENOMEM; \
+ hret = h_get_24x7_catalog_page(page, 0, 0); \
+ if (hret) { \
+ ret = -EIO; \
+ goto e_free; \
+ } \
+ ret = sprintf(buf, _fmt, _expr); \
+e_free: \
+ kfree(page); \
+ return ret; \
+} \
+static DEVICE_ATTR_RO(_name)
+
+PAGE_0_ATTR(catalog_version, "%lld\n",
+ (unsigned long long)be32_to_cpu(page_0->version));
+PAGE_0_ATTR(catalog_len, "%lld\n",
+ (unsigned long long)be32_to_cpu(page_0->length) * 4096);
+static BIN_ATTR_RO(catalog, 0/* real length varies */);
+
+static struct bin_attribute *if_bin_attrs[] = {
+ &bin_attr_catalog,
+ NULL,
+};
+
+static struct attribute *if_attrs[] = {
+ &dev_attr_catalog_len.attr,
+ &dev_attr_catalog_version.attr,
+ NULL,
+};
+
+static struct attribute_group if_group = {
+ .name = "interface",
+ .bin_attrs = if_bin_attrs,
+ .attrs = if_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+ &format_group,
+ &if_group,
+ NULL,
+};
+
+static bool is_physical_domain(int domain)
+{
+ return domain == HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP ||
+ domain == HV_24X7_PERF_DOMAIN_PHYSICAL_CORE;
+}
+
+static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
+ u16 lpar, u64 *res,
+ bool success_expected)
+{
+ unsigned long ret;
+
+ /*
+ * request_buffer and result_buffer are not required to be 4k aligned,
+ * but are not allowed to cross any 4k boundary. Aligning them to 4k is
+ * the simplest way to ensure that.
+ */
+ struct reqb {
+ struct hv_24x7_request_buffer buf;
+ struct hv_24x7_request req;
+ } __packed __aligned(4096) request_buffer = {
+ .buf = {
+ .interface_version = HV_24X7_IF_VERSION_CURRENT,
+ .num_requests = 1,
+ },
+ .req = {
+ .performance_domain = domain,
+ .data_size = cpu_to_be16(8),
+ .data_offset = cpu_to_be32(offset),
+ .starting_lpar_ix = cpu_to_be16(lpar),
+ .max_num_lpars = cpu_to_be16(1),
+ .starting_ix = cpu_to_be16(ix),
+ .max_ix = cpu_to_be16(1),
+ }
+ };
+
+ struct resb {
+ struct hv_24x7_data_result_buffer buf;
+ struct hv_24x7_result res;
+ struct hv_24x7_result_element elem;
+ __be64 result;
+ } __packed __aligned(4096) result_buffer = {};
+
+ ret = plpar_hcall_norets(H_GET_24X7_DATA,
+ virt_to_phys(&request_buffer), sizeof(request_buffer),
+ virt_to_phys(&result_buffer), sizeof(result_buffer));
+
+ if (ret) {
+ if (success_expected)
+ pr_err_ratelimited("hcall failed: %d %#x %#x %d => 0x%lx (%ld) detail=0x%x failing ix=%x\n",
+ domain, offset, ix, lpar,
+ ret, ret,
+ result_buffer.buf.detailed_rc,
+ result_buffer.buf.failing_request_ix);
+ return ret;
+ }
+
+ *res = be64_to_cpu(result_buffer.result);
+ return ret;
+}
+
+static unsigned long event_24x7_request(struct perf_event *event, u64 *res,
+ bool success_expected)
+{
+ return single_24x7_request(event_get_domain(event),
+ event_get_offset(event),
+ event_get_starting_index(event),
+ event_get_lpar(event),
+ res,
+ success_expected);
+}
+
+static int h_24x7_event_init(struct perf_event *event)
+{
+ struct hv_perf_caps caps;
+ unsigned domain;
+ unsigned long hret;
+ u64 ct;
+
+ /* Not our event */
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* Unused areas must be 0 */
+ if (event_get_reserved1(event) ||
+ event_get_reserved2(event) ||
+ event_get_reserved3(event)) {
+ pr_devel("reserved set when forbidden 0x%llx(0x%llx) 0x%llx(0x%llx) 0x%llx(0x%llx)\n",
+ event->attr.config,
+ event_get_reserved1(event),
+ event->attr.config1,
+ event_get_reserved2(event),
+ event->attr.config2,
+ event_get_reserved3(event));
+ return -EINVAL;
+ }
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest ||
+ is_sampling_event(event)) /* no sampling */
+ return -EINVAL;
+
+ /* no branch sampling */
+ if (has_branch_stack(event))
+ return -EOPNOTSUPP;
+
+ /* offset must be 8 byte aligned */
+ if (event_get_offset(event) % 8) {
+ pr_devel("bad alignment\n");
+ return -EINVAL;
+ }
+
+ /* Domains above 6 are invalid */
+ domain = event_get_domain(event);
+ if (domain > 6) {
+ pr_devel("invalid domain %d\n", domain);
+ return -EINVAL;
+ }
+
+ hret = hv_perf_caps_get(&caps);
+ if (hret) {
+ pr_devel("could not get capabilities: rc=%ld\n", hret);
+ return -EIO;
+ }
+
+ /* PHYSICAL domains & other lpars require extra capabilities */
+ if (!caps.collect_privileged && (is_physical_domain(domain) ||
+ (event_get_lpar(event) != event_get_lpar_max()))) {
+ pr_devel("hv permisions disallow: is_physical_domain:%d, lpar=0x%llx\n",
+ is_physical_domain(domain),
+ event_get_lpar(event));
+ return -EACCES;
+ }
+
+ /* see if the event complains */
+ if (event_24x7_request(event, &ct, false)) {
+ pr_devel("test hcall failed\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static u64 h_24x7_get_value(struct perf_event *event)
+{
+ unsigned long ret;
+ u64 ct;
+ ret = event_24x7_request(event, &ct, true);
+ if (ret)
+ /* We checked this in event init, shouldn't fail here... */
+ return 0;
+
+ return ct;
+}
+
+static void h_24x7_event_update(struct perf_event *event)
+{
+ s64 prev;
+ u64 now;
+ now = h_24x7_get_value(event);
+ prev = local64_xchg(&event->hw.prev_count, now);
+ local64_add(now - prev, &event->count);
+}
+
+static void h_24x7_event_start(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_RELOAD)
+ local64_set(&event->hw.prev_count, h_24x7_get_value(event));
+}
+
+static void h_24x7_event_stop(struct perf_event *event, int flags)
+{
+ h_24x7_event_update(event);
+}
+
+static int h_24x7_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ h_24x7_event_start(event, flags);
+
+ return 0;
+}
+
+static struct pmu h_24x7_pmu = {
+ .task_ctx_nr = perf_invalid_context,
+
+ .name = "hv_24x7",
+ .attr_groups = attr_groups,
+ .event_init = h_24x7_event_init,
+ .add = h_24x7_event_add,
+ .del = h_24x7_event_stop,
+ .start = h_24x7_event_start,
+ .stop = h_24x7_event_stop,
+ .read = h_24x7_event_update,
+
+ .event_idx = perf_event_nop_0,
+};
+
+static int hv_24x7_init(void)
+{
+ int r;
+ unsigned long hret;
+ struct hv_perf_caps caps;
+
+ if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+ pr_info("not a virtualized system, not enabling\n");
+ return -ENODEV;
+ }
+
+ hret = hv_perf_caps_get(&caps);
+ if (hret) {
+ pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n",
+ hret);
+ return -ENODEV;
+ }
+
+ r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
+ if (r)
+ return r;
+
+ return 0;
+}
+
+device_initcall(hv_24x7_init);
--
1.9.0

2014-02-27 21:32:57

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 10/11] powerpc/perf: add kconfig option for hypervisor provided counters

Signed-off-by: Cody P Schafer <[email protected]>
---
arch/powerpc/perf/Makefile | 2 ++
arch/powerpc/platforms/pseries/Kconfig | 12 ++++++++++++
2 files changed, 14 insertions(+)

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 60d71ee..f9c083a 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o
obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o

+obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
+
obj-$(CONFIG_PPC64) += $(obj64-y)
obj-$(CONFIG_PPC32) += $(obj32-y)
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 80b1d57..2cb8b77 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -111,6 +111,18 @@ config CMM
will be reused for other LPARs. The interface allows firmware to
balance memory across many LPARs.

+config HV_PERF_CTRS
+ bool "Hypervisor supplied PMU events (24x7 & GPCI)"
+ default y
+ depends on PERF_EVENTS && PPC_PSERIES
+ help
+ Enable access to hypervisor supplied counters in perf. Currently,
+ this enables code that uses the hcall GetPerfCounterInfo and 24x7
+ interfaces to retrieve counters. GPCI exists on Power 6 and later
+ systems. 24x7 is available on Power 8 systems.
+
+ If unsure, select Y.
+
config DTL
bool "Dispatch Trace Log"
depends on PPC_SPLPAR && DEBUG_FS
--
1.9.0

2014-02-27 21:33:07

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 11/11] powerpc/perf/hv_{gpci,24x7}: add documentation of device attributes

gpci and 24x7 expose some device specific attributes. Add some
documentation for them.

Signed-off-by: Cody P Schafer <[email protected]>
---
.../testing/sysfs-bus-event_source-devices-hv_24x7 | 23 ++++++++++++
.../testing/sysfs-bus-event_source-devices-hv_gpci | 43 ++++++++++++++++++++++
2 files changed, 66 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
new file mode 100644
index 0000000..e78ee79
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
@@ -0,0 +1,23 @@
+What: /sys/bus/event_source/devices/hv_24x7/interface/catalog
+Date: February 2014
+Contact: Cody P Schafer <[email protected]>
+Description:
+ Provides access to the binary "24x7 catalog" provided by the
+ hypervisor on POWER7 and 8 systems. This catalog lists events
+ avaliable from the powerpc "hv_24x7" pmu. Its format is
+ documented here:
+ https://raw.githubusercontent.com/jmesmon/catalog-24x7/master/hv-24x7-catalog.h
+
+What: /sys/bus/event_source/devices/hv_24x7/interface/catalog_length
+Date: February 2014
+Contact: Cody P Schafer <[email protected]>
+Description:
+ A number equal to the length in bytes of the catalog. This is
+ also extractable from the provided binary "catalog" sysfs entry.
+
+What: /sys/bus/event_source/devices/hv_24x7/interface/catalog_version
+Date: February 2014
+Contact: Cody P Schafer <[email protected]>
+Description:
+ Exposes the "version" field of the 24x7 catalog. This is also
+ extractable from the provided binary "catalog" sysfs entry.
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
new file mode 100644
index 0000000..3fa58c2
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
@@ -0,0 +1,43 @@
+What: /sys/bus/event_source/devices/hv_gpci/interface/collect_privileged
+Date: February 2014
+Contact: Cody P Schafer <[email protected]>
+Description:
+ '0' if the hypervisor is configured to forbid access to event
+ counters being accumulated by other guests and to physical
+ domain event counters.
+ '1' if that access is allowed.
+
+What: /sys/bus/event_source/devices/hv_gpci/interface/ga
+Date: February 2014
+Contact: Cody P Schafer <[email protected]>
+Description:
+ 0 or 1. Indicates whether we have access to "GA" events (listed
+ in arch/powerpc/perf/hv-gpci.h).
+
+What: /sys/bus/event_source/devices/hv_gpci/interface/expanded
+Date: February 2014
+Contact: Cody P Schafer <[email protected]>
+Description:
+ 0 or 1. Indicates whether we have access to "EXPANDED" events (listed
+ in arch/powerpc/perf/hv-gpci.h).
+
+What: /sys/bus/event_source/devices/hv_gpci/interface/lab
+Date: February 2014
+Contact: Cody P Schafer <[email protected]>
+Description:
+ 0 or 1. Indicates whether we have access to "LAB" events (listed
+ in arch/powerpc/perf/hv-gpci.h).
+
+What: /sys/bus/event_source/devices/hv_gpci/interface/version
+Date: February 2014
+Contact: Cody P Schafer <[email protected]>
+Description:
+ A number indicating the version of the gpci interface that the
+ hypervisor reports supporting.
+
+What: /sys/bus/event_source/devices/hv_gpci/interface/kernel_version
+Date: February 2014
+Contact: Cody P Schafer <[email protected]>
+Description:
+ A number indicating the latest version of the gpci interface
+ that the kernel is aware of.
--
1.9.0

2014-02-27 21:31:35

by Cody P Schafer

[permalink] [raw]
Subject: [PATCH v3 02/11] perf: add PMU_FORMAT_RANGE() helper for use by sw-like pmus

Add PMU_FORMAT_RANGE() and PMU_FORMAT_RANGE_RESERVED() (for reserved
areas) which generate functions to extract the relevent bits from
event->attr.config{,1,2} for use by sw-like pmus where the
'config{,1,2}' values don't map directly to hardware registers.

Signed-off-by: Cody P Schafer <[email protected]>
---
include/linux/perf_event.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e56b07f..3da5081 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -871,4 +871,21 @@ _name##_show(struct device *dev, \
\
static struct device_attribute format_attr_##_name = __ATTR_RO(_name)

+#define PMU_FORMAT_RANGE(name, attr_var, bit_start, bit_end) \
+PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end); \
+PMU_FORMAT_RANGE_RESERVED(name, attr_var, bit_start, bit_end)
+
+#define PMU_FORMAT_RANGE_RESERVED(name, attr_var, bit_start, bit_end) \
+static u64 event_get_##name##_max(void) \
+{ \
+ int bits = (bit_end) - (bit_start) + 1; \
+ return ((0x1ULL << (bits - 1ULL)) - 1ULL) | \
+ (0xFULL << (bits - 4ULL)); \
+} \
+static u64 event_get_##name(struct perf_event *event) \
+{ \
+ return (event->attr.attr_var >> (bit_start)) & \
+ event_get_##name##_max(); \
+}
+
#endif /* _LINUX_PERF_EVENT_H */
--
1.9.0