2013-09-10 15:30:13

by Boris Ostrovsky

Subject: [PATCH v1 0/5] xen/PMU: PMU support for Xen PV guests

This is the Linux side of Xen PMU support for PV guests, including dom0. Only
kernel changes are here; the toolstack patch will be provided separately.

Here is the description from the hypervisor patch submission, which applies to
this series as well:

This version has the following limitations:
* For accurate profiling of dom0/Xen, dom0 VCPUs should be pinned.
* Hypervisor code is profiled only on processors that have dom0 VCPUs running
on them.
* No backtrace support.
* Will fail to load under XSM: we ran out of bits in the permissions vector;
this needs to be fixed separately.


A few notes that may help reviewing:

* A shared data structure (xenpmu_data_t) between each PV VCPU and hypervisor
CPU is used for passing register values as well as PMU state at the time of the
PMU interrupt.

* PMU interrupts are taken by the hypervisor at NMI level for both HVM and PV.

* The guest's interrupt handler does not read/write PMU MSRs directly. Instead, it
accesses xenpmu_data_t and flushes it to HW before returning.

* PMU mode is controlled at runtime via /sys/hypervisor/pmu/{pmu_mode,pmu_flags}
in addition to the 'vpmu' boot option (which is preserved for backward compatibility).
The following modes are provided:
* disable: VPMU is off
* enable: VPMU is on. Guests can profile themselves, dom0 profiles itself and Xen
* priv_enable: dom0-only profiling. dom0 collects samples for everyone; sampling
in guests is suspended.

* The /proc/xen/xensyms file exports the hypervisor's symbols to dom0 (similar to
/proc/kallsyms)

* The VPMU infrastructure is now used for both HVM and PV and has therefore been
moved up from the hvm subtree
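
As a quick illustration of the runtime control interface described above, a
small user-space helper can switch the mode by writing one of the mode strings
to the sysfs attribute and read the current mode back. This is a sketch, not
part of the series; the sysfs path comes from patch 2, and the helper names
(set_pmu_mode, get_pmu_mode) are invented here.

```c
#include <stdio.h>
#include <string.h>

/* Write a PMU mode string ("disable", "enable" or "priv_enable") to the
 * pmu_mode sysfs attribute.  Returns 0 on success, -1 on failure. */
static int set_pmu_mode(const char *attr_path, const char *mode)
{
	FILE *f = fopen(attr_path, "w");

	if (!f)
		return -1;
	if (fputs(mode, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) ? -1 : 0;
}

/* Read the current mode back; strip the trailing newline, if any. */
static int get_pmu_mode(const char *attr_path, char *buf, size_t len)
{
	FILE *f = fopen(attr_path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	buf[strcspn(buf, "\n")] = '\0';
	return fclose(f) ? -1 : 0;
}
```

On a dom0 with these patches applied, set_pmu_mode("/sys/hypervisor/pmu/pmu_mode",
"enable") would turn the VPMU on; any writable path can be substituted when
experimenting.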



Boris Ostrovsky (5):
xen: xensyms support
xen/PMU: Sysfs interface for setting Xen PMU mode
xen/PMU: Initialization code for Xen PMU
xen/PMU: Add support for PMU registers on PV guests
xen/PMU: Cache MSR accesses during interrupt handling

arch/x86/include/asm/xen/hypercall.h | 6 +
arch/x86/xen/Makefile | 2 +-
arch/x86/xen/enlighten.c | 27 ++-
arch/x86/xen/pmu.c | 422 +++++++++++++++++++++++++++++++++++
arch/x86/xen/pmu.h | 15 ++
arch/x86/xen/smp.c | 31 ++-
arch/x86/xen/xen-head.S | 5 +-
drivers/xen/Kconfig | 5 +
drivers/xen/sys-hypervisor.c | 118 ++++++++++
drivers/xen/xenfs/Makefile | 1 +
drivers/xen/xenfs/super.c | 3 +
drivers/xen/xenfs/xenfs.h | 1 +
drivers/xen/xenfs/xensyms.c | 170 ++++++++++++++
include/xen/interface/platform.h | 21 ++
include/xen/interface/xen.h | 2 +
include/xen/interface/xenpmu.h | 127 +++++++++++
16 files changed, 947 insertions(+), 9 deletions(-)
create mode 100644 arch/x86/xen/pmu.c
create mode 100644 arch/x86/xen/pmu.h
create mode 100644 drivers/xen/xenfs/xensyms.c
create mode 100644 include/xen/interface/xenpmu.h

--
1.8.1.4


2013-09-10 15:30:15

by Boris Ostrovsky

Subject: [PATCH v1 2/5] xen/PMU: Sysfs interface for setting Xen PMU mode

Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall.

Signed-off-by: Boris Ostrovsky <[email protected]>
---
arch/x86/include/asm/xen/hypercall.h | 6 ++
arch/x86/xen/xen-head.S | 5 +-
drivers/xen/sys-hypervisor.c | 118 +++++++++++++++++++++++++++++++++++
include/xen/interface/xen.h | 1 +
include/xen/interface/xenpmu.h | 44 +++++++++++++
5 files changed, 173 insertions(+), 1 deletion(-)
create mode 100644 include/xen/interface/xenpmu.h

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index e709884..f022cef 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -465,6 +465,12 @@ HYPERVISOR_tmem_op(
return _hypercall1(int, tmem_op, op);
}

+static inline int
+HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
+{
+ return _hypercall2(int, xenpmu_op, op, arg);
+}
+
static inline void
MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set)
{
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 7faed58..f2415e6 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -73,8 +73,11 @@ NEXT_HYPERCALL(sysctl)
NEXT_HYPERCALL(domctl)
NEXT_HYPERCALL(kexec_op)
NEXT_HYPERCALL(tmem_op) /* 38 */
+ENTRY(xenclient_rsvd)
+ .skip 32
+NEXT_HYPERCALL(xenpmu_op) /* 40 */
ENTRY(xen_hypercall_rsvr)
- .skip 320
+ .skip 256
NEXT_HYPERCALL(mca) /* 48 */
NEXT_HYPERCALL(arch_1)
NEXT_HYPERCALL(arch_2)
diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 96453f8..2218154 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -20,6 +20,7 @@
#include <xen/xenbus.h>
#include <xen/interface/xen.h>
#include <xen/interface/version.h>
+#include <xen/interface/xenpmu.h>

#define HYPERVISOR_ATTR_RO(_name) \
static struct hyp_sysfs_attr _name##_attr = __ATTR_RO(_name)
@@ -368,6 +369,115 @@ static void xen_properties_destroy(void)
sysfs_remove_group(hypervisor_kobj, &xen_properties_group);
}

+struct pmu_mode {
+ const char *name;
+ uint32_t mode;
+};
+struct pmu_mode pmu_modes[] = {
+ {"enable", VPMU_ON},
+ {"priv_enable", VPMU_PRIV},
+ {"disable", 0}
+};
+static ssize_t pmu_mode_store(struct hyp_sysfs_attr *attr,
+ const char *buffer, size_t len)
+{
+ int ret;
+ struct xenpmu_params xp;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(pmu_modes); i++) {
+ if (!strncmp(buffer, pmu_modes[i].name,
+ strlen(pmu_modes[i].name))) {
+ xp.control = pmu_modes[i].mode;
+ break;
+ }
+ }
+
+ if (i == ARRAY_SIZE(pmu_modes))
+ return -EINVAL;
+
+ ret = HYPERVISOR_xenpmu_op(XENPMU_mode_set, &xp);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static ssize_t pmu_mode_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int ret;
+ struct xenpmu_params xp;
+ int i;
+ uint32_t mode;
+
+ ret = HYPERVISOR_xenpmu_op(XENPMU_mode_get, &xp);
+ if (ret)
+ return ret;
+
+ mode = (uint32_t)xp.control & VPMU_MODE_MASK;
+ for (i = 0; i < ARRAY_SIZE(pmu_modes); i++) {
+ if (mode == pmu_modes[i].mode)
+ return sprintf(buffer, "%s\n", pmu_modes[i].name);
+ }
+
+ return -EINVAL;
+}
+HYPERVISOR_ATTR_RW(pmu_mode);
+
+
+static ssize_t pmu_flags_store(struct hyp_sysfs_attr *attr,
+ const char *buffer, size_t len)
+{
+ int ret;
+ uint32_t flags;
+ struct xenpmu_params xp;
+
+ ret = kstrtou32(buffer, 0, &flags);
+ if (ret)
+ return ret;
+
+ xp.control = flags;
+ ret = HYPERVISOR_xenpmu_op(XENPMU_flags_set, &xp);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static ssize_t pmu_flags_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int ret;
+ struct xenpmu_params xp;
+
+ ret = HYPERVISOR_xenpmu_op(XENPMU_flags_get, &xp);
+ if (ret)
+ return ret;
+
+ return sprintf(buffer, "0x%x\n", (uint32_t)xp.control);
+}
+HYPERVISOR_ATTR_RW(pmu_flags);
+
+static struct attribute *xen_pmu_attrs[] = {
+ &pmu_mode_attr.attr,
+ &pmu_flags_attr.attr,
+ NULL
+};
+
+static const struct attribute_group xen_pmu_group = {
+ .name = "pmu",
+ .attrs = xen_pmu_attrs,
+};
+
+static int __init xen_pmu_init(void)
+{
+ return sysfs_create_group(hypervisor_kobj, &xen_pmu_group);
+}
+
+static void xen_pmu_destroy(void)
+{
+ sysfs_remove_group(hypervisor_kobj, &xen_pmu_group);
+}
+
static int __init hyper_sysfs_init(void)
{
int ret;
@@ -390,9 +500,16 @@ static int __init hyper_sysfs_init(void)
ret = xen_properties_init();
if (ret)
goto prop_out;
+ if (xen_initial_domain()) {
+ ret = xen_pmu_init();
+ if (ret)
+ goto pmu_out;
+ }

goto out;

+pmu_out:
+ xen_properties_destroy();
prop_out:
xen_sysfs_uuid_destroy();
uuid_out:
@@ -407,6 +524,7 @@ out:

static void __exit hyper_sysfs_exit(void)
{
+ xen_pmu_destroy();
xen_properties_destroy();
xen_compilation_destroy();
xen_sysfs_uuid_destroy();
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index 53ec416..c29d427 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -58,6 +58,7 @@
#define __HYPERVISOR_physdev_op 33
#define __HYPERVISOR_hvm_op 34
#define __HYPERVISOR_tmem_op 38
+#define __HYPERVISOR_xenpmu_op 40

/* Architecture-specific hypercall definitions. */
#define __HYPERVISOR_arch_0 48
diff --git a/include/xen/interface/xenpmu.h b/include/xen/interface/xenpmu.h
new file mode 100644
index 0000000..63c42b1
--- /dev/null
+++ b/include/xen/interface/xenpmu.h
@@ -0,0 +1,44 @@
+#ifndef __XEN_PUBLIC_XENPMU_H__
+#define __XEN_PUBLIC_XENPMU_H__
+
+#include <asm/msr.h>
+
+#include "xen.h"
+
+#define XENPMU_VER_MAJ 0
+#define XENPMU_VER_MIN 0
+
+/* HYPERVISOR_xenpmu_op commands */
+#define XENPMU_mode_get 0
+#define XENPMU_mode_set 1
+#define XENPMU_flags_get 2
+#define XENPMU_flags_set 3
+
+/* Parameter structure for HYPERVISOR_xenpmu_op call */
+struct xenpmu_params {
+ union {
+ struct version {
+ uint8_t maj;
+ uint8_t min;
+ } version;
+ uint64_t pad;
+ };
+ uint64_t control;
+};
+
+/* VPMU modes */
+#define VPMU_MODE_MASK 0xff
+#define VPMU_OFF 0
+/* guests can profile themselves, (dom0 profiles itself and Xen) */
+#define VPMU_ON (1<<0)
+/*
+ * Only dom0 has access to VPMU and it profiles everyone: itself,
+ * the hypervisor and the guests.
+ */
+#define VPMU_PRIV (1<<1)
+
+/* VPMU flags */
+#define VPMU_FLAGS_MASK ((uint32_t)(~VPMU_MODE_MASK))
+#define VPMU_INTEL_BTS (1<<8) /* Ignored on AMD */
+
+#endif /* __XEN_PUBLIC_XENPMU_H__ */
--
1.8.1.4

2013-09-10 15:30:17

by Boris Ostrovsky

Subject: [PATCH v1 5/5] xen/PMU: Cache MSR accesses during interrupt handling

Avoid trapping to the hypervisor on each MSR access during interrupt handling.
Instead, use the cached MSR values that Xen provides in the shared xenpmu_data
structure. When handling is completed, flush the registers to the hypervisor,
which will load them into HW.
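
The flag-gated caching this patch introduces can be modeled in a few lines of
plain C. This is an illustrative user-space model only: PMU_CACHED matches the
flag added in patch 4, but the structure and function names here are invented.

```c
#include <stdint.h>

#define PMU_CACHED 1

/* Toy model of the shared page: while PMU_CACHED is set, MSR reads during
 * interrupt handling are served from this structure instead of trapping
 * to the hypervisor. */
struct pmu_model {
	uint32_t pmu_flags;
	uint64_t cached_counter;	/* stands in for one cached PMU MSR */
};

/* Returns 1 and fills *val if the read was satisfied from the cache,
 * 0 if the caller should fall back to a real (trapping) MSR read. */
static int model_msr_read(const struct pmu_model *d, uint64_t *val)
{
	if (!(d->pmu_flags & PMU_CACHED))
		return 0;
	*val = d->cached_counter;
	return 1;
}
```

The interrupt handler below sets the flag before calling x86_pmu.handle_irq(),
clears it afterwards, and then issues XENPMU_flush so the hypervisor writes
the cached values back to hardware.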

Signed-off-by: Boris Ostrovsky <[email protected]>
---
arch/x86/xen/pmu.c | 15 ++++++++++++++-
include/xen/interface/xenpmu.h | 1 +
2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index d8b059b..bdd7c4a 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -345,14 +345,27 @@ static void xen_convert_regs(struct cpu_user_regs *xen_regs,

irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id)
{
- int ret = IRQ_NONE;
+ int err, ret = IRQ_NONE;
struct pt_regs regs;
struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
smp_processor_id());

+ /*
+ * While handling the interrupt MSR accesses will be cached
+ * in PMU context
+ */
+ xenpmu_data->pmu_flags |= PMU_CACHED;
xen_convert_regs(&xenpmu_data->regs, &regs);
if (x86_pmu.handle_irq(&regs))
ret = IRQ_HANDLED;
+ xenpmu_data->pmu_flags &= ~PMU_CACHED;
+
+ /* Write out cached context to HW */
+ err = HYPERVISOR_xenpmu_op(XENPMU_flush, NULL);
+ if (err) {
+ WARN(1, "%s failed hypercall, err: %d\n", __func__, err);
+ return IRQ_NONE;
+ }

return ret;
}
diff --git a/include/xen/interface/xenpmu.h b/include/xen/interface/xenpmu.h
index 16fe1ab..ec0e802 100644
--- a/include/xen/interface/xenpmu.h
+++ b/include/xen/interface/xenpmu.h
@@ -16,6 +16,7 @@
#define XENPMU_init 4
#define XENPMU_finish 5
#define XENPMU_lvtpc_set 6
+#define XENPMU_flush 7 /* Write cached MSR values to HW */

/* Parameter structure for HYPERVISOR_xenpmu_op call */
struct xenpmu_params {
--
1.8.1.4

2013-09-10 15:31:00

by Boris Ostrovsky

Subject: [PATCH v1 3/5] xen/PMU: Initialization code for Xen PMU

Map the shared data structure that will hold CPU registers, VPMU context, and
the VCPU/PCPU IDs of the VCPU interrupted by the PMU interrupt. The hypervisor
fills in this information in its handler and passes it to the guest for
further processing.

Set up PMU VIRQ.

Signed-off-by: Boris Ostrovsky <[email protected]>
---
arch/x86/xen/Makefile | 2 +-
arch/x86/xen/pmu.c | 122 +++++++++++++++++++++++++++++++++++++++++
arch/x86/xen/pmu.h | 12 ++++
arch/x86/xen/smp.c | 31 ++++++++++-
include/xen/interface/xen.h | 1 +
include/xen/interface/xenpmu.h | 77 ++++++++++++++++++++++++++
6 files changed, 243 insertions(+), 2 deletions(-)
create mode 100644 arch/x86/xen/pmu.c
create mode 100644 arch/x86/xen/pmu.h

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 96ab2c0..b187df5 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -13,7 +13,7 @@ CFLAGS_mmu.o := $(nostackp)
obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
time.o xen-asm.o xen-asm_$(BITS).o \
grant-table.o suspend.o platform-pci-unplug.o \
- p2m.o
+ p2m.o pmu.o

obj-$(CONFIG_EVENT_TRACING) += trace.o

diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
new file mode 100644
index 0000000..da061d4
--- /dev/null
+++ b/arch/x86/xen/pmu.c
@@ -0,0 +1,122 @@
+#include <linux/types.h>
+#include <linux/interrupt.h>
+
+#include <asm/xen/hypercall.h>
+#include <xen/page.h>
+#include <xen/interface/xen.h>
+#include <xen/interface/vcpu.h>
+#include <xen/interface/xenpmu.h>
+
+#include "xen-ops.h"
+#include "pmu.h"
+
+/* x86_pmu.handle_irq definition */
+#include <../kernel/cpu/perf_event.h>
+
+
+/* Shared page between hypervisor and domain */
+DEFINE_PER_CPU(struct xenpmu_data *, xenpmu_shared);
+
+/* perf callbacks*/
+int xen_is_in_guest(void)
+{
+ struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
+ smp_processor_id());
+
+ if (!xen_initial_domain() ||
+ xenpmu_data->domain_id > DOMID_SELF || xenpmu_data->domain_id == 0)
+ return 0;
+
+ return 1;
+}
+
+static int xen_is_user_mode(void)
+{
+ struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
+ smp_processor_id());
+ return ((xenpmu_data->regs.cs & 3) == 3);
+}
+
+static unsigned long xen_get_guest_ip(void)
+{
+ struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
+ smp_processor_id());
+ return xenpmu_data->regs.eip;
+}
+
+static struct perf_guest_info_callbacks xen_guest_cbs = {
+ .is_in_guest = xen_is_in_guest,
+ .is_user_mode = xen_is_user_mode,
+ .get_guest_ip = xen_get_guest_ip,
+};
+
+/* Convert registers from Xen's format to Linux' */
+static void xen_convert_regs(struct cpu_user_regs *xen_regs,
+ struct pt_regs *regs)
+{
+ regs->ip = xen_regs->eip;
+ regs->cs = xen_regs->cs;
+}
+
+irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id)
+{
+ int ret = IRQ_NONE;
+ struct pt_regs regs;
+ struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
+ smp_processor_id());
+
+ xen_convert_regs(&xenpmu_data->regs, &regs);
+ if (x86_pmu.handle_irq(&regs))
+ ret = IRQ_HANDLED;
+
+ return ret;
+}
+
+int xen_pmu_init(int cpu)
+{
+ int ret = 0;
+ struct xenpmu_params xp;
+ unsigned long pfn;
+ struct xenpmu_data *xenpmu_data;
+
+ BUILD_BUG_ON(sizeof(struct xenpmu_data) > PAGE_SIZE);
+ xenpmu_data = vmalloc(PAGE_SIZE);
+ if (!xenpmu_data) {
+ ret = -ENOMEM;
+ goto fail;
+ }
+ pfn = vmalloc_to_pfn((char *)xenpmu_data);
+
+ xp.mfn = pfn_to_mfn(pfn);
+ xp.vcpu = cpu;
+ xp.version.maj = XENPMU_VER_MAJ;
+ xp.version.min = XENPMU_VER_MIN;
+ ret = HYPERVISOR_xenpmu_op(XENPMU_init, &xp);
+ if (ret)
+ goto fail;
+
+ per_cpu(xenpmu_shared, cpu) = xenpmu_data;
+
+ if (cpu == 0)
+ perf_register_guest_info_callbacks(&xen_guest_cbs);
+
+ return ret;
+
+fail:
+ vfree(xenpmu_data);
+ return ret;
+}
+
+void xen_pmu_finish(int cpu)
+{
+ struct xenpmu_params xp;
+
+ xp.vcpu = cpu;
+ xp.version.maj = XENPMU_VER_MAJ;
+ xp.version.min = XENPMU_VER_MIN;
+
+ (void)HYPERVISOR_xenpmu_op(XENPMU_finish, &xp);
+
+ vfree(per_cpu(xenpmu_shared, cpu));
+ per_cpu(xenpmu_shared, cpu) = NULL;
+}
diff --git a/arch/x86/xen/pmu.h b/arch/x86/xen/pmu.h
new file mode 100644
index 0000000..51de7d2
--- /dev/null
+++ b/arch/x86/xen/pmu.h
@@ -0,0 +1,12 @@
+#ifndef __XEN_PMU_H
+#define __XEN_PMU_H
+
+#include <xen/interface/xenpmu.h>
+
+irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id);
+int xen_pmu_init(int cpu);
+void xen_pmu_finish(int cpu);
+
+DECLARE_PER_CPU(struct xenpmu_data *, xenpmu_shared);
+
+#endif /* __XEN_PMU_H */
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index ca92754..17a88d1 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -26,6 +26,7 @@

#include <xen/interface/xen.h>
#include <xen/interface/vcpu.h>
+#include <xen/interface/xenpmu.h>

#include <asm/xen/interface.h>
#include <asm/xen/hypercall.h>
@@ -37,6 +38,7 @@
#include <xen/hvc-console.h>
#include "xen-ops.h"
#include "mmu.h"
+#include "pmu.h"

cpumask_var_t xen_cpu_initialized_map;

@@ -49,6 +51,7 @@ static DEFINE_PER_CPU(struct xen_common_irq, xen_callfunc_irq) = { .irq = -1 };
static DEFINE_PER_CPU(struct xen_common_irq, xen_callfuncsingle_irq) = { .irq = -1 };
static DEFINE_PER_CPU(struct xen_common_irq, xen_irq_work) = { .irq = -1 };
static DEFINE_PER_CPU(struct xen_common_irq, xen_debug_irq) = { .irq = -1 };
+static DEFINE_PER_CPU(struct xen_common_irq, xen_pmu_irq) = { .irq = -1 };

static irqreturn_t xen_call_function_interrupt(int irq, void *dev_id);
static irqreturn_t xen_call_function_single_interrupt(int irq, void *dev_id);
@@ -139,11 +142,18 @@ static void xen_smp_intr_free(unsigned int cpu)
kfree(per_cpu(xen_irq_work, cpu).name);
per_cpu(xen_irq_work, cpu).name = NULL;
}
+
+ if (per_cpu(xen_pmu_irq, cpu).irq >= 0) {
+ unbind_from_irqhandler(per_cpu(xen_pmu_irq, cpu).irq, NULL);
+ per_cpu(xen_pmu_irq, cpu).irq = -1;
+ kfree(per_cpu(xen_pmu_irq, cpu).name);
+ per_cpu(xen_pmu_irq, cpu).name = NULL;
+ }
};
static int xen_smp_intr_init(unsigned int cpu)
{
int rc;
- char *resched_name, *callfunc_name, *debug_name;
+ char *resched_name, *callfunc_name, *debug_name, *pmu_name;

resched_name = kasprintf(GFP_KERNEL, "resched%d", cpu);
rc = bind_ipi_to_irqhandler(XEN_RESCHEDULE_VECTOR,
@@ -209,6 +219,18 @@ static int xen_smp_intr_init(unsigned int cpu)
per_cpu(xen_irq_work, cpu).irq = rc;
per_cpu(xen_irq_work, cpu).name = callfunc_name;

+ if (per_cpu(xenpmu_shared, cpu)) {
+ pmu_name = kasprintf(GFP_KERNEL, "pmu%d", cpu);
+ rc = bind_virq_to_irqhandler(VIRQ_XENPMU, cpu,
+ xen_pmu_irq_handler,
+ IRQF_PERCPU|IRQF_NOBALANCING,
+ pmu_name, NULL);
+ if (rc < 0)
+ goto fail;
+ per_cpu(xen_pmu_irq, cpu).irq = rc;
+ per_cpu(xen_pmu_irq, cpu).name = pmu_name;
+ }
+
return 0;

fail:
@@ -307,6 +329,9 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
}
set_cpu_sibling_map(0);

+ if (xen_pmu_init(0))
+ pr_err("Could not initialize VPMU for VCPU 0\n");
+
if (xen_smp_intr_init(0))
BUG();

@@ -427,6 +452,9 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
/* Just in case we booted with a single CPU. */
alternatives_enable_smp();

+ if (xen_pmu_init(cpu))
+ pr_err("Could not initialize VPMU for VCPU %u\n", cpu);
+
rc = xen_smp_intr_init(cpu);
if (rc)
return rc;
@@ -468,6 +496,7 @@ static void xen_cpu_die(unsigned int cpu)
xen_smp_intr_free(cpu);
xen_uninit_lock_cpu(cpu);
xen_teardown_timer(cpu);
+ xen_pmu_finish(cpu);
}

static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index c29d427..74eb6f3 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -81,6 +81,7 @@
#define VIRQ_DOM_EXC 3 /* (DOM0) Exceptional event for some domain. */
#define VIRQ_DEBUGGER 6 /* (DOM0) A domain has paused for debugging. */
#define VIRQ_PCPU_STATE 9 /* (DOM0) PCPU state changed */
+#define VIRQ_XENPMU 13 /* PMC interrupt */

/* Architecture-specific VIRQ definitions. */
#define VIRQ_ARCH_0 16
diff --git a/include/xen/interface/xenpmu.h b/include/xen/interface/xenpmu.h
index 63c42b1..7af682d 100644
--- a/include/xen/interface/xenpmu.h
+++ b/include/xen/interface/xenpmu.h
@@ -13,6 +13,8 @@
#define XENPMU_mode_set 1
#define XENPMU_flags_get 2
#define XENPMU_flags_set 3
+#define XENPMU_init 4
+#define XENPMU_finish 5

/* Parameter structure for HYPERVISOR_xenpmu_op call */
struct xenpmu_params {
@@ -24,6 +26,8 @@ struct xenpmu_params {
uint64_t pad;
};
uint64_t control;
+ uint64_t mfn;
+ uint64_t vcpu;
};

/* VPMU modes */
@@ -41,4 +45,77 @@ struct xenpmu_params {
#define VPMU_FLAGS_MASK ((uint32_t)(~VPMU_MODE_MASK))
#define VPMU_INTEL_BTS (1<<8) /* Ignored on AMD */

+
+/* AMD PMU registers and structures */
+#define F10H_NUM_COUNTERS 4
+#define F15H_NUM_COUNTERS 6
+/* To accommodate more counters in the future (e.g. NB counters) */
+#define MAX_NUM_COUNTERS 16
+struct amd_vpmu_context {
+ uint64_t counters[MAX_NUM_COUNTERS];
+ uint64_t ctrls[MAX_NUM_COUNTERS];
+ uint8_t msr_bitmap_set;
+};
+
+
+/* Intel PMU registers and structures */
+static const uint32_t core2_fix_counters_msr[] = {
+ MSR_CORE_PERF_FIXED_CTR0,
+ MSR_CORE_PERF_FIXED_CTR1,
+ MSR_CORE_PERF_FIXED_CTR2
+};
+#define VPMU_CORE2_NUM_FIXED (sizeof(core2_fix_counters_msr) / sizeof(uint32_t))
+
+/* Core 2 Non-architectural Performance Control MSRs. */
+static const uint32_t core2_ctrls_msr[] = {
+ MSR_CORE_PERF_FIXED_CTR_CTRL,
+ MSR_IA32_PEBS_ENABLE,
+ MSR_IA32_DS_AREA
+};
+#define VPMU_CORE2_NUM_CTRLS (sizeof(core2_ctrls_msr) / sizeof(uint32_t))
+
+#define VPMU_CORE2_MAX_ARCH_PMCS 16
+struct core2_pmu_enable {
+ char ds_area_enable;
+ char fixed_ctr_enable[VPMU_CORE2_NUM_FIXED];
+ char arch_pmc_enable[VPMU_CORE2_MAX_ARCH_PMCS];
+};
+
+struct arch_msr_pair {
+ uint64_t counter;
+ uint64_t control;
+};
+struct core2_vpmu_context {
+ uint64_t global_ctrl;
+ uint64_t global_ovf_ctrl;
+ uint64_t global_status;
+ uint64_t global_ovf_status;
+ uint64_t fix_counters[VPMU_CORE2_NUM_FIXED];
+ uint64_t ctrls[VPMU_CORE2_NUM_CTRLS];
+ struct arch_msr_pair arch_msr_pair[VPMU_CORE2_MAX_ARCH_PMCS];
+ struct core2_pmu_enable *pmu_enable;
+};
+
+/* Shared between hypervisor and PV domain */
+struct xenpmu_data {
+ union {
+ struct cpu_user_regs regs;
+ uint8_t pad[256];
+ };
+ uint32_t domain_id;
+ uint32_t vcpu_id;
+ uint32_t pcpu_id;
+ uint32_t pmu_flags;
+ union {
+ struct amd_vpmu_context amd;
+ struct core2_vpmu_context intel;
+#define MAX(x, y) ((x) > (y) ? (x) : (y))
+#define MAX_CTXT_SZ MAX(sizeof(struct amd_vpmu_context),\
+ sizeof(struct core2_vpmu_context))
+#define PMU_PAD_SIZE (((MAX_CTXT_SZ + 64) & ~63) + 128)
+ uint8_t pad[PMU_PAD_SIZE]; /* a bit more than necessary */
+ } pmu;
+};
+
+
#endif /* __XEN_PUBLIC_XENPMU_H__ */
--
1.8.1.4

2013-09-10 15:32:28

by Boris Ostrovsky

Subject: [PATCH v1 4/5] xen/PMU: Add support for PMU registers on PV guests

PMU emulation code: MSR caching in PMU context and LVTPC APIC
handling. (Portions of this code are taken from Xen's VPMU
implementation)

Signed-off-by: Boris Ostrovsky <[email protected]>
---
arch/x86/xen/enlighten.c | 27 +++-
arch/x86/xen/pmu.c | 289 ++++++++++++++++++++++++++++++++++++++++-
arch/x86/xen/pmu.h | 3 +
include/xen/interface/xenpmu.h | 5 +
4 files changed, 317 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 193097e..2512bd3 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -82,6 +82,7 @@
#include "mmu.h"
#include "smp.h"
#include "multicalls.h"
+#include "pmu.h"

EXPORT_SYMBOL_GPL(hypercall_page);

@@ -960,6 +961,11 @@ static u32 xen_apic_read(u32 reg)

static void xen_apic_write(u32 reg, u32 val)
{
+ if (reg == APIC_LVTPC) {
+ (void)pmu_apic_update(reg);
+ return;
+ }
+
/* Warn to see if there's any stray references */
WARN_ON(1);
}
@@ -1064,11 +1070,20 @@ static inline void xen_write_cr8(unsigned long val)
BUG_ON(val);
}
#endif
-static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
+
+static u64 xen_read_msr_safe(unsigned int msr, int *err)
{
- int ret;
+ u64 val;

- ret = 0;
+ if (pmu_msr_read(msr, &val, err))
+ return val;
+
+ return native_read_msr_safe(msr, err);
+}
+
+static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
+{
+ int ret = 0;

switch (msr) {
#ifdef CONFIG_X86_64
@@ -1102,10 +1117,10 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
if (smp_processor_id() == 0)
xen_set_pat(((u64)high << 32) | low);
break;
+ }

- default:
+ if (!pmu_msr_write(msr, low, high, &ret))
ret = native_write_msr_safe(msr, low, high);
- }

return ret;
}
@@ -1239,7 +1254,7 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {

.wbinvd = native_wbinvd,

- .read_msr = native_read_msr_safe,
+ .read_msr = xen_read_msr_safe,
.write_msr = xen_write_msr_safe,

.read_tsc = native_read_tsc,
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index da061d4..d8b059b 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -17,6 +17,291 @@
/* Shared page between hypervisor and domain */
DEFINE_PER_CPU(struct xenpmu_data *, xenpmu_shared);

+/* PMU register caching */
+
+/* AMD PMU */
+static __read_mostly uint32_t amd_counters_base;
+static __read_mostly uint32_t amd_ctrls_base;
+static __read_mostly int amd_msr_step;
+static __read_mostly int k7_counters_mirrored;
+static __read_mostly int amd_num_counters;
+
+/* Intel PMU */
+#define MSR_TYPE_COUNTER 0
+#define MSR_TYPE_CTRL 1
+#define MSR_TYPE_GLOBAL 2
+#define MSR_TYPE_ARCH_COUNTER 3
+#define MSR_TYPE_ARCH_CTRL 4
+
+#define PMU_GENERAL_NR_SHIFT 8 /* Number of general pmu registers */
+#define PMU_GENERAL_NR_BITS 8 /* 8 bits 8..15 */
+#define PMU_GENERAL_NR_MASK (((1 << PMU_GENERAL_NR_BITS) - 1) \
+ << PMU_GENERAL_NR_SHIFT)
+
+static __read_mostly int intel_num_counters;
+
+
+static void xen_pmu_arch_init(void)
+{
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+
+ switch (boot_cpu_data.x86) {
+ case 0x15:
+ amd_num_counters = F15H_NUM_COUNTERS;
+ amd_counters_base = MSR_F15H_PERF_CTR;
+ amd_ctrls_base = MSR_F15H_PERF_CTL;
+ amd_msr_step = 2;
+ k7_counters_mirrored = 1;
+ break;
+ case 0x10:
+ case 0x12:
+ case 0x14:
+ case 0x16:
+ default:
+ amd_num_counters = F10H_NUM_COUNTERS;
+ amd_counters_base = MSR_K7_PERFCTR0;
+ amd_ctrls_base = MSR_K7_EVNTSEL0;
+ amd_msr_step = 1;
+ k7_counters_mirrored = 0;
+ break;
+ }
+ } else {
+ uint32_t eax = cpuid_eax(0xa);
+
+ intel_num_counters = (eax & PMU_GENERAL_NR_MASK) >>
+ PMU_GENERAL_NR_SHIFT;
+ }
+}
+
+static inline uint32_t get_fam15h_addr(u32 addr)
+{
+ switch (addr) {
+ case MSR_K7_PERFCTR0:
+ case MSR_K7_PERFCTR1:
+ case MSR_K7_PERFCTR2:
+ case MSR_K7_PERFCTR3:
+ return MSR_F15H_PERF_CTR + (addr - MSR_K7_PERFCTR0);
+ case MSR_K7_EVNTSEL0:
+ case MSR_K7_EVNTSEL1:
+ case MSR_K7_EVNTSEL2:
+ case MSR_K7_EVNTSEL3:
+ return MSR_F15H_PERF_CTL + (addr - MSR_K7_EVNTSEL0);
+ default:
+ break;
+ }
+
+ return addr;
+}
+
+static inline bool is_amd_pmu_msr(unsigned int msr)
+{
+ if ((msr < MSR_F15H_PERF_CTL ||
+ msr > MSR_F15H_PERF_CTR + amd_num_counters) &&
+ (msr < MSR_K7_EVNTSEL0 ||
+ msr > MSR_K7_PERFCTR0 + amd_num_counters))
+ return true;
+
+ return false;
+}
+
+static bool is_core2_pmu_msr(u32 msr_index, int *type, int *index)
+{
+ int i;
+
+ for (i = 0; i < VPMU_CORE2_NUM_FIXED; i++) {
+ if (core2_fix_counters_msr[i] == msr_index) {
+ *type = MSR_TYPE_COUNTER;
+ *index = i;
+ return true;
+ }
+ }
+
+ for (i = 0; i < VPMU_CORE2_NUM_CTRLS; i++) {
+ if (core2_ctrls_msr[i] == msr_index) {
+ *type = MSR_TYPE_CTRL;
+ *index = i;
+ return true;
+ }
+ }
+
+ if ((msr_index == MSR_CORE_PERF_GLOBAL_CTRL) ||
+ (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) ||
+ (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL)) {
+ *type = MSR_TYPE_GLOBAL;
+ return true;
+ }
+
+ if ((msr_index >= MSR_IA32_PERFCTR0) &&
+ (msr_index < (MSR_IA32_PERFCTR0 + intel_num_counters))) {
+ *type = MSR_TYPE_ARCH_COUNTER;
+ *index = msr_index - MSR_IA32_PERFCTR0;
+ return true;
+ }
+
+ if ((msr_index >= MSR_P6_EVNTSEL0) &&
+ (msr_index < (MSR_P6_EVNTSEL0 + intel_num_counters))) {
+ *type = MSR_TYPE_ARCH_CTRL;
+ *index = msr_index - MSR_P6_EVNTSEL0;
+ return true;
+ }
+
+ return false;
+}
+
+int xen_intel_pmu_rw(unsigned int msr, u64 *val, int type,
+ int index, bool is_read)
+{
+ uint64_t *reg = NULL;
+ struct core2_vpmu_context *ctxt;
+ struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
+ smp_processor_id());
+
+ if (!xenpmu_data)
+ return 1;
+
+ if (!(xenpmu_data->pmu_flags & PMU_CACHED)) /* No caching needed */
+ return 1;
+
+ ctxt = &xenpmu_data->pmu.intel;
+
+ switch (msr) {
+ case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+ reg = &ctxt->global_ovf_ctrl;
+ break;
+ case MSR_CORE_PERF_GLOBAL_STATUS:
+ reg = &ctxt->global_status;
+ break;
+ case MSR_CORE_PERF_GLOBAL_CTRL:
+ reg = &ctxt->global_ctrl;
+ break;
+ case MSR_CORE_PERF_FIXED_CTR_CTRL:
+ reg = &ctxt->ctrls[0];
+ break;
+ default:
+ switch (type) {
+ case MSR_TYPE_COUNTER:
+ reg = &ctxt->fix_counters[index];
+ break;
+ case MSR_TYPE_ARCH_COUNTER:
+ reg = &ctxt->arch_msr_pair[index].counter;
+ break;
+ case MSR_TYPE_ARCH_CTRL:
+ reg = &ctxt->arch_msr_pair[index].control;
+ break;
+ default:
+ return 1;
+ }
+ }
+
+ if (reg) {
+ if (is_read)
+ *val = *reg;
+ else {
+ *reg = *val;
+
+ if (msr == MSR_CORE_PERF_GLOBAL_OVF_CTRL)
+ ctxt->global_status &= (~(*val));
+ }
+ return 0;
+ }
+
+ return 1;
+}
+
+int xen_amd_pmu_rw(unsigned int msr, u64 *val, bool is_read)
+{
+ uint64_t *reg = NULL;
+ int i, off = 0;
+ struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
+ smp_processor_id());
+
+ if (!xenpmu_data)
+ return 1;
+
+ if (!(xenpmu_data->pmu_flags & PMU_CACHED)) /* No caching needed */
+ return 1;
+
+ if (k7_counters_mirrored &&
+ ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)))
+ msr = get_fam15h_addr(msr);
+
+ for (i = 0; i < amd_num_counters; i++) {
+ if (msr == amd_ctrls_base + off) {
+ reg = &xenpmu_data->pmu.amd.ctrls[i];
+ break;
+ } else if (msr == amd_counters_base + off) {
+ reg = &xenpmu_data->pmu.amd.counters[i];
+ break;
+ }
+ off += amd_msr_step;
+ }
+
+ if (reg) {
+ if (is_read)
+ *val = *reg;
+ else
+ *reg = *val;
+ return 0;
+ }
+ return 1;
+}
+
+int pmu_msr_read(unsigned int msr, uint64_t *val, int *err)
+{
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+ if (is_amd_pmu_msr(msr)) {
+ if (xen_amd_pmu_rw(msr, val, true))
+ *val = native_read_msr_safe(msr, err);
+ return 1;
+ }
+ } else {
+ int type, index;
+ if (is_core2_pmu_msr(msr, &type, &index)) {
+ if (xen_intel_pmu_rw(msr, val, type, index, true))
+ *val = native_read_msr_safe(msr, err);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+int pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err)
+{
+ uint64_t val = ((uint64_t)high << 32) | low;
+
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+ if (is_amd_pmu_msr(msr)) {
+ if (xen_amd_pmu_rw(msr, &val, false))
+ *err = native_write_msr_safe(msr, low, high);
+ return 1;
+ }
+ } else {
+ int type, index;
+
+ if (is_core2_pmu_msr(msr, &type, &index)) {
+ if (xen_intel_pmu_rw(msr, &val, type, index, false))
+ *err = native_write_msr_safe(msr, low, high);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+int pmu_apic_update(uint64_t reg)
+{
+ int ret;
+ struct xenpmu_params xp;
+
+ xp.lvtpc = reg;
+ xp.version.maj = XENPMU_VER_MAJ;
+ xp.version.min = XENPMU_VER_MIN;
+ ret = HYPERVISOR_xenpmu_op(XENPMU_lvtpc_set, &xp);
+
+ return ret;
+}
+
/* perf callbacks*/
int xen_is_in_guest(void)
{
@@ -97,8 +382,10 @@ int xen_pmu_init(int cpu)

per_cpu(xenpmu_shared, cpu) = xenpmu_data;

- if (cpu == 0)
+ if (cpu == 0) {
perf_register_guest_info_callbacks(&xen_guest_cbs);
+ xen_pmu_arch_init();
+ }

return ret;

diff --git a/arch/x86/xen/pmu.h b/arch/x86/xen/pmu.h
index 51de7d2..adc2b5e 100644
--- a/arch/x86/xen/pmu.h
+++ b/arch/x86/xen/pmu.h
@@ -6,6 +6,9 @@
irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id);
int xen_pmu_init(int cpu);
void xen_pmu_finish(int cpu);
+int pmu_msr_read(unsigned int msr, uint64_t *val, int *err);
+int pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err);
+int pmu_apic_update(uint64_t reg);

DECLARE_PER_CPU(struct xenpmu_data *, xenpmu_shared);

diff --git a/include/xen/interface/xenpmu.h b/include/xen/interface/xenpmu.h
index 7af682d..16fe1ab 100644
--- a/include/xen/interface/xenpmu.h
+++ b/include/xen/interface/xenpmu.h
@@ -15,6 +15,7 @@
#define XENPMU_flags_set 3
#define XENPMU_init 4
#define XENPMU_finish 5
+#define XENPMU_lvtpc_set 6

/* Parameter structure for HYPERVISOR_xenpmu_op call */
struct xenpmu_params {
@@ -28,6 +29,7 @@ struct xenpmu_params {
uint64_t control;
uint64_t mfn;
uint64_t vcpu;
+ uint64_t lvtpc;
};

/* VPMU modes */
@@ -96,6 +98,9 @@ struct core2_vpmu_context {
struct core2_pmu_enable *pmu_enable;
};

+/* PMU flags */
+#define PMU_CACHED 1
+
/* Shared between hypervisor and PV domain */
struct xenpmu_data {
union {
--
1.8.1.4

2013-09-10 15:30:11

by Boris Ostrovsky

Subject: [PATCH v1 1/5] xen: xensyms support

Export Xen symbols to dom0 via /proc/xen/xensyms (similar to /proc/kallsyms).
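
Since the commit message says the file is similar to /proc/kallsyms, a consumer
can presumably parse it with the kallsyms-style "<hex address> <type> <name>"
line layout. A sketch under that assumption; parse_sym is not part of the patch.

```c
#include <stdio.h>

/* Parse one kallsyms/xensyms-style line: "<hex address> <type> <name>".
 * Returns 1 on success, 0 on a malformed line. */
static int parse_sym(const char *line, unsigned long long *addr,
		     char *type, char *name, size_t name_len)
{
	char fmt[32];

	/* Build a format like "%llx %c %63s" using the caller's buffer size
	 * so the symbol name cannot overflow. */
	snprintf(fmt, sizeof(fmt), "%%llx %%c %%%zus", name_len - 1);
	return sscanf(line, fmt, addr, type, name) == 3;
}
```

For example, a line such as "ffff82d080200000 T start" would yield the address,
the 'T' type and the symbol name.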

Signed-off-by: Boris Ostrovsky <[email protected]>
---
drivers/xen/Kconfig | 5 ++
drivers/xen/xenfs/Makefile | 1 +
drivers/xen/xenfs/super.c | 3 +
drivers/xen/xenfs/xenfs.h | 1 +
drivers/xen/xenfs/xensyms.c | 170 +++++++++++++++++++++++++++++++++++++++
include/xen/interface/platform.h | 21 +++++
6 files changed, 201 insertions(+)
create mode 100644 drivers/xen/xenfs/xensyms.c

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 9e02d60..a21fce9 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -242,4 +242,9 @@ config XEN_MCE_LOG
config XEN_HAVE_PVMMU
bool

+config XEN_SYMS
+ bool "Xen symbols"
+ depends on XEN_DOM0 && XENFS && KALLSYMS
+ default y
+
endmenu
diff --git a/drivers/xen/xenfs/Makefile b/drivers/xen/xenfs/Makefile
index b019865..1a83010 100644
--- a/drivers/xen/xenfs/Makefile
+++ b/drivers/xen/xenfs/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_XENFS) += xenfs.o

xenfs-y = super.o
xenfs-$(CONFIG_XEN_DOM0) += xenstored.o
+xenfs-$(CONFIG_XEN_SYMS) += xensyms.o
diff --git a/drivers/xen/xenfs/super.c b/drivers/xen/xenfs/super.c
index 06092e0..8559a71 100644
--- a/drivers/xen/xenfs/super.c
+++ b/drivers/xen/xenfs/super.c
@@ -57,6 +57,9 @@ static int xenfs_fill_super(struct super_block *sb, void *data, int silent)
{ "privcmd", &xen_privcmd_fops, S_IRUSR|S_IWUSR },
{ "xsd_kva", &xsd_kva_file_ops, S_IRUSR|S_IWUSR},
{ "xsd_port", &xsd_port_file_ops, S_IRUSR|S_IWUSR},
+#ifdef CONFIG_XEN_SYMS
+ { "xensyms", &xensyms_ops, S_IRUSR},
+#endif
{""},
};

diff --git a/drivers/xen/xenfs/xenfs.h b/drivers/xen/xenfs/xenfs.h
index 6b80c77..2c5934e 100644
--- a/drivers/xen/xenfs/xenfs.h
+++ b/drivers/xen/xenfs/xenfs.h
@@ -3,5 +3,6 @@

extern const struct file_operations xsd_kva_file_ops;
extern const struct file_operations xsd_port_file_ops;
+extern const struct file_operations xensyms_ops;

#endif /* _XENFS_XENBUS_H */
diff --git a/drivers/xen/xenfs/xensyms.c b/drivers/xen/xenfs/xensyms.c
new file mode 100644
index 0000000..e3301f0
--- /dev/null
+++ b/drivers/xen/xenfs/xensyms.c
@@ -0,0 +1,170 @@
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/proc_fs.h>
+#include <linux/slab.h>
+#include <xen/interface/platform.h>
+#include <asm/xen/hypercall.h>
+#include <xen/xen-ops.h>
+#include "xenfs.h"
+
+
+struct xensyms_state {
+ struct xenpf_symdata symdata;
+ ssize_t off;
+};
+
+/* Grab next output page from the hypervisor */
+static int xensyms_next_page(struct xensyms_state *state)
+{
+ int ret;
+ struct xen_platform_op op;
+
+ op.cmd = XENPF_get_symbols;
+ op.u.symdata = state->symdata;
+
+ ret = HYPERVISOR_dom0_op(&op);
+ if (ret < 0)
+ return ret;
+ if (ret == 0)
+ /* End of symbols */
+ return 1;
+
+ state->symdata = op.u.symdata;
+ state->off = 0;
+
+ return 0;
+}
+
+static void *xensyms_start(struct seq_file *m, loff_t *pos)
+{
+ struct xensyms_state *state = (struct xensyms_state *)m->private;
+
+ if (state->off == UINT_MAX) {
+ if (xensyms_next_page(state))
+ return NULL;
+ }
+
+ return m->private;
+}
+
+static void *xensyms_next(struct seq_file *m, void *p, loff_t *pos)
+{
+ struct xensyms_state *state = (struct xensyms_state *)m->private;
+ unsigned int str_len, next_off;
+
+ str_len = strlen(&state->symdata.buf[state->off]) + 1;
+ next_off = state->off + str_len;
+
+ /* If there is no more data on this page then ask Xen for more */
+ if (next_off >= XENSYMS_SZ || state->symdata.buf[next_off] == 0) {
+ if (xensyms_next_page(state))
+ return NULL;
+ } else
+ state->off = next_off;
+
+ return p;
+}
+
+static int xensyms_show(struct seq_file *m, void *p)
+{
+ struct xensyms_state *state = (struct xensyms_state *)m->private;
+
+ seq_printf(m, "%s", &state->symdata.buf[state->off]);
+
+ return 0;
+}
+
+static void xensyms_stop(struct seq_file *m, void *p)
+{
+}
+
+static const struct seq_operations xensyms_seq_ops = {
+ .start = xensyms_start,
+ .next = xensyms_next,
+ .show = xensyms_show,
+ .stop = xensyms_stop,
+};
+
+static int xensyms_open(struct inode *inode, struct file *file)
+{
+ int ret;
+ struct xensyms_state *state;
+ struct seq_file *m;
+
+ ret = seq_open_private(file, &xensyms_seq_ops,
+ sizeof(struct xensyms_state));
+ if (ret)
+ return ret;
+
+ m = file->private_data;
+ state = (struct xensyms_state *)m->private;
+ state->symdata.buf = kmalloc(XENSYMS_SZ, GFP_KERNEL);
+ if (!state->symdata.buf) {
+ (void)seq_release_private(inode, file);
+ return -ENOMEM;
+ }
+
+ /* Force first page fetch in xensyms_start() */
+ state->off = UINT_MAX;
+
+ state->symdata.xen_offset = 0;
+ state->symdata.xen_symnum = 0;
+
+ return 0;
+}
+
+static int xensyms_release(struct inode *inode, struct file *file)
+{
+ struct seq_file *m = file->private_data;
+ struct xensyms_state *state = (struct xensyms_state *)m->private;
+
+ kfree(state->symdata.buf);
+ return seq_release_private(inode, file);
+}
+
+static loff_t xensyms_lseek(struct file *file, loff_t offset, int whence)
+{
+ loff_t ret;
+ struct seq_file *m = file->private_data;
+ struct xensyms_state *state = (struct xensyms_state *)m->private;
+
+ /*
+ * lseek is rather limited in what it can do here since the file
+ * is stored in compressed form. Typically it is only used to
+ * seek to the beginning of the file by 'more' and the like.
+ * Getting it to work correctly may be a TODO (it is broken for
+ * Linux kallsyms too).
+ */
+ if (whence != SEEK_SET)
+ return -EINVAL;
+
+ state->symdata.xen_offset = 0;
+ state->symdata.xen_symnum = 0;
+ state->off = UINT_MAX;
+
+ ret = seq_lseek(file, offset, whence);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * Advance pointers to next string (seq_read() will flush
+ * remainder of current string to user on next read).
+ * Note that we are guaranteed not to cross page boundary here
+ * since only whole strings are placed onto the page by Xen.
+ */
+ if (state->off != UINT_MAX && state->symdata.buf[state->off] != 0)
+ state->off += strlen(&state->symdata.buf[state->off]) + 1;
+
+ return ret;
+}
+
+
+const struct file_operations xensyms_ops = {
+ .open = xensyms_open,
+ .read = seq_read,
+ .llseek = xensyms_lseek,
+ .release = xensyms_release
+};
diff --git a/include/xen/interface/platform.h b/include/xen/interface/platform.h
index c57d5f6..ecaaa08a04 100644
--- a/include/xen/interface/platform.h
+++ b/include/xen/interface/platform.h
@@ -351,6 +351,26 @@ struct xenpf_core_parking {
};
DEFINE_GUEST_HANDLE_STRUCT(xenpf_core_parking);

+#define XENPF_get_symbols 61
+
+#define XENSYMS_SZ 4096
+struct xenpf_symdata {
+ /*
+ * offset into Xen's symbol data and symbol number from
+ * last call. Used only by Xen.
+ */
+ uint64_t xen_offset;
+ uint64_t xen_symnum;
+
+ /*
+ * Symbols data, formatted similar to /proc/kallsyms:
+ * <address> <type> <name>
+ */
+ GUEST_HANDLE(char) buf;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_symdata);
+
+
struct xen_platform_op {
uint32_t cmd;
uint32_t interface_version; /* XENPF_INTERFACE_VERSION */
@@ -371,6 +391,7 @@ struct xen_platform_op {
struct xenpf_cpu_hotadd cpu_add;
struct xenpf_mem_hotadd mem_add;
struct xenpf_core_parking core_parking;
+ struct xenpf_symdata symdata;
uint8_t pad[128];
} u;
};
--
1.8.1.4

2013-09-11 08:17:36

by Jan Beulich

Subject: Re: [PATCH v1 1/5] xen: xensyms support

>>> On 10.09.13 at 17:31, Boris Ostrovsky <[email protected]> wrote:
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -242,4 +242,9 @@ config XEN_MCE_LOG
> config XEN_HAVE_PVMMU
> bool
>
> +config XEN_SYMS
> + bool "Xen symbols"
> + depends on XEN_DOM0 && XENFS && KALLSYMS
> + default y

Looking through the rest of the patch I can't see the dependency
on KALLSYMS. Perhaps this should be used for the default setting
instead of as a dependency?

Jan
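
A hedged sketch of the alternative Jan suggests (the exact Kconfig wording here is an assumption, not part of the patch):

```
config XEN_SYMS
	bool "Xen symbols"
	depends on XEN_DOM0 && XENFS
	default y if KALLSYMS
```

This keeps the option selectable without KALLSYMS while defaulting it on when KALLSYMS is available.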

2013-09-11 09:33:53

by David Vrabel

Subject: Re: [Xen-devel] [PATCH v1 0/5] xen/PMU: PMU support for Xen PV guests

On 10/09/13 16:31, Boris Ostrovsky wrote:
> This is the Linux side of Xen PMU support for PV guests, including dom0. Only
> kernel changes are here, toolstack patch will be provided separately.
>
> Here is description from the hypervisor patch submission that applies to this
> series as well:
>
> This version has following limitations:
> * For accurate profiling of dom0/Xen dom0 VCPUs should be pinned.
> * Hypervisor code is only profiled on processors that have running dom0 VCPUs
> on them.
> * No backtrace support.

These are some pretty significant limitations. Is there a plan for how
to remove them?

David

2013-09-11 14:25:26

by Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH v1 0/5] xen/PMU: PMU support for Xen PV guests

On 09/11/2013 05:33 AM, David Vrabel wrote:
> On 10/09/13 16:31, Boris Ostrovsky wrote:
>> This is the Linux side of Xen PMU support for PV guests, including dom0. Only
>> kernel changes are here, toolstack patch will be provided separately.
>>
>> Here is description from the hypervisor patch submission that applies to this
>> series as well:
>>
>> This version has following limitations:
>> * For accurate profiling of dom0/Xen dom0 VCPUs should be pinned.
>> * Hypervisor code is only profiled on processors that have running dom0 VCPUs
>> on them.
>> * No backtrace support.
> These are some pretty significant limitations. Is there a plan for how
> to remove them?

I don't have a specific plan (other than to do it after this stage is
finished), but I do have a rough idea of what would be needed to address
these.

Hypervisor changes for all three should be pretty easy.

Linux-wise, for the first one (pinned VCPUs) we will probably need to make
a change in perf_sample_data. There is a reserved field there, so perhaps
we can use it to store the PCPU (and maybe the domain ID). Plus a way to
actually write the data, probably a hook or something.

Backtrace support should also not be too bad: we can pass Xen's IP stack
in the shared data area to dom0 and then again have a hook or something to
pass it to perf. (Note that backtracing is not supported for KVM either,
so both may benefit here.)

The second one is the most difficult: we need to be able somehow to access
MSRs (and the APIC, I think) on non-dom0 CPUs. Creating an interface for
such access is not a big deal, but integrating it into the perf
infrastructure will be a challenge. There are other alternatives, but they
have problems as well.


-boris

2013-09-23 13:18:01

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH v1 2/5] xen/PMU: Sysfs interface for setting Xen PMU mode

On Tue, Sep 10, 2013 at 11:31:47AM -0400, Boris Ostrovsky wrote:
> Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall.
>
> Signed-off-by: Boris Ostrovsky <[email protected]>
> ---
> arch/x86/include/asm/xen/hypercall.h | 6 ++
> arch/x86/xen/xen-head.S | 5 +-
> drivers/xen/sys-hypervisor.c | 118 +++++++++++++++++++++++++++++++++++
> include/xen/interface/xen.h | 1 +
> include/xen/interface/xenpmu.h | 44 +++++++++++++
> 5 files changed, 173 insertions(+), 1 deletion(-)
> create mode 100644 include/xen/interface/xenpmu.h
>
> diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
> index e709884..f022cef 100644
> --- a/arch/x86/include/asm/xen/hypercall.h
> +++ b/arch/x86/include/asm/xen/hypercall.h
> @@ -465,6 +465,12 @@ HYPERVISOR_tmem_op(
> return _hypercall1(int, tmem_op, op);
> }
>
> +static inline int
> +HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
> +{
> + return _hypercall2(int, xenpmu_op, op, arg);
> +}
> +
> static inline void
> MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set)
> {
> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
> index 7faed58..f2415e6 100644
> --- a/arch/x86/xen/xen-head.S
> +++ b/arch/x86/xen/xen-head.S
> @@ -73,8 +73,11 @@ NEXT_HYPERCALL(sysctl)
> NEXT_HYPERCALL(domctl)
> NEXT_HYPERCALL(kexec_op)
> NEXT_HYPERCALL(tmem_op) /* 38 */
> +ENTRY(xenclient_rsvd)
> + .skip 32
> +NEXT_HYPERCALL(xenpmu_op) /* 40 */
> ENTRY(xen_hypercall_rsvr)
> - .skip 320
> + .skip 256
> NEXT_HYPERCALL(mca) /* 48 */
> NEXT_HYPERCALL(arch_1)
> NEXT_HYPERCALL(arch_2)
> diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
> index 96453f8..2218154 100644
> --- a/drivers/xen/sys-hypervisor.c
> +++ b/drivers/xen/sys-hypervisor.c
> @@ -20,6 +20,7 @@
> #include <xen/xenbus.h>
> #include <xen/interface/xen.h>
> #include <xen/interface/version.h>
> +#include <xen/interface/xenpmu.h>
>
> #define HYPERVISOR_ATTR_RO(_name) \
> static struct hyp_sysfs_attr _name##_attr = __ATTR_RO(_name)
> @@ -368,6 +369,115 @@ static void xen_properties_destroy(void)
> sysfs_remove_group(hypervisor_kobj, &xen_properties_group);
> }
>
> +struct pmu_mode {
> + const char *name;
> + uint32_t mode;
> +};
> +struct pmu_mode pmu_modes[] = {
> + {"enable", VPMU_ON},
> + {"priv_enable", VPMU_PRIV},
> + {"disable", 0}
> +};
> +static ssize_t pmu_mode_store(struct hyp_sysfs_attr *attr,
> + const char *buffer, size_t len)
> +{
> + int ret;
> + struct xenpmu_params xp;
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(pmu_modes); i++) {
> + if (!strncmp(buffer, pmu_modes[i].name,
> + strlen(pmu_modes[i].name))) {
> + xp.control = pmu_modes[i].mode;
> + break;
> + }
> + }
> +
> + if (i == ARRAY_SIZE(pmu_modes))
> + return -EINVAL;
> +
> + ret = HYPERVISOR_xenpmu_op(XENPMU_mode_set, &xp);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t pmu_mode_show(struct hyp_sysfs_attr *attr, char *buffer)
> +{
> + int ret;
> + struct xenpmu_params xp;
> + int i;
> + uint32_t mode;
> +
> + ret = HYPERVISOR_xenpmu_op(XENPMU_mode_get, &xp);
> + if (ret)
> + return ret;
> +
> + mode = (uint32_t)xp.control & VPMU_MODE_MASK;
> + for (i = 0; i < ARRAY_SIZE(pmu_modes); i++) {
> + if (mode == pmu_modes[i].mode)
> + return sprintf(buffer, "%s\n", pmu_modes[i].name);
> + }
> +
> + return -EINVAL;
> +}
> +HYPERVISOR_ATTR_RW(pmu_mode);
> +
> +
> +static ssize_t pmu_flags_store(struct hyp_sysfs_attr *attr,
> + const char *buffer, size_t len)
> +{
> + int ret;
> + uint32_t flags;
> + struct xenpmu_params xp;
> +
> + ret = kstrtou32(buffer, 0, &flags);
> + if (ret)
> + return ret;
> +
> + xp.control = flags;
> + ret = HYPERVISOR_xenpmu_op(XENPMU_flags_set, &xp);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t pmu_flags_show(struct hyp_sysfs_attr *attr, char *buffer)
> +{
> + int ret;
> + struct xenpmu_params xp;
> +
> + ret = HYPERVISOR_xenpmu_op(XENPMU_flags_get, &xp);
> + if (ret)
> + return ret;
> +
> + return sprintf(buffer, "0x%x\n", (uint32_t)xp.control);
> +}
> +HYPERVISOR_ATTR_RW(pmu_flags);
> +
> +static struct attribute *xen_pmu_attrs[] = {
> + &pmu_mode_attr.attr,
> + &pmu_flags_attr.attr,
> + NULL
> +};
> +
> +static const struct attribute_group xen_pmu_group = {
> + .name = "pmu",
> + .attrs = xen_pmu_attrs,
> +};
> +
> +static int __init xen_pmu_init(void)
> +{
> + return sysfs_create_group(hypervisor_kobj, &xen_pmu_group);
> +}
> +
> +static void xen_pmu_destroy(void)
> +{
> + sysfs_remove_group(hypervisor_kobj, &xen_pmu_group);
> +}
> +
> static int __init hyper_sysfs_init(void)
> {
> int ret;
> @@ -390,9 +500,16 @@ static int __init hyper_sysfs_init(void)
> ret = xen_properties_init();
> if (ret)
> goto prop_out;
> + if (xen_initial_domain()) {
> + ret = xen_pmu_init();
> + if (ret)
> + goto pmu_out;
> + }
>
> goto out;
>
> +pmu_out:
> + xen_properties_destroy();
> prop_out:
> xen_sysfs_uuid_destroy();
> uuid_out:
> @@ -407,6 +524,7 @@ out:
>
> static void __exit hyper_sysfs_exit(void)
> {
> + xen_pmu_destroy();
> xen_properties_destroy();
> xen_compilation_destroy();
> xen_sysfs_uuid_destroy();
> diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
> index 53ec416..c29d427 100644
> --- a/include/xen/interface/xen.h
> +++ b/include/xen/interface/xen.h
> @@ -58,6 +58,7 @@
> #define __HYPERVISOR_physdev_op 33
> #define __HYPERVISOR_hvm_op 34
> #define __HYPERVISOR_tmem_op 38
> +#define __HYPERVISOR_xenpmu_op 40
>
> /* Architecture-specific hypercall definitions. */
> #define __HYPERVISOR_arch_0 48
> diff --git a/include/xen/interface/xenpmu.h b/include/xen/interface/xenpmu.h
> new file mode 100644
> index 0000000..63c42b1
> --- /dev/null
> +++ b/include/xen/interface/xenpmu.h
> @@ -0,0 +1,44 @@
> +#ifndef __XEN_PUBLIC_XENPMU_H__
> +#define __XEN_PUBLIC_XENPMU_H__
> +
> +#include <asm/msr.h>
> +
> +#include "xen.h"
> +
> +#define XENPMU_VER_MAJ 0
> +#define XENPMU_VER_MIN 0
> +
> +/* HYPERVISOR_xenpmu_op commands */
> +#define XENPMU_mode_get 0
> +#define XENPMU_mode_set 1
> +#define XENPMU_flags_get 2
> +#define XENPMU_flags_set 3
> +
> +/* Parameter structure for HYPERVISOR_xenpmu_op call */
> +struct xenpmu_params {
> + union {
> + struct version {
> + uint8_t maj;
> + uint8_t min;
> + } version;
> + uint64_t pad;
> + };
> + uint64_t control;
> +};
> +
> +/* VPMU modes */
> +#define VPMU_MODE_MASK 0xff
> +#define VPMU_OFF 0
> +/* guests can profile themselves, (dom0 profiles itself and Xen) */
> +#define VPMU_ON (1<<0)
> +/*
> + * Only dom0 has access to VPMU and it profiles everyone: itself,
> + * the hypervisor and the guests.
> + */
> +#define VPMU_PRIV (1<<1)
> +
> +/* VPMU flags */
> +#define VPMU_FLAGS_MASK ((uint32_t)(~VPMU_MODE_MASK))
> +#define VPMU_INTEL_BTS (1<<8) /* Ignored on AMD */
> +
> +#endif /* __XEN_PUBLIC_XENPMU_H__ */

Looks OK to me if there are no changes to the hypervisor ABI.
> --
> 1.8.1.4
>

2013-09-23 13:26:30

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH v1 3/5] xen/PMU: Initialization code for Xen PMU

On Tue, Sep 10, 2013 at 11:31:48AM -0400, Boris Ostrovsky wrote:
> Map shared data structure that will hold CPU registers, VPMU context,
> VCPU/PCPU IDs of the VCPU interrupted by PMU interrupt. Hypervisor
> fills this information in its handler and passes it to the guest for
> further processing.
>
> Set up PMU VIRQ.
>
> Signed-off-by: Boris Ostrovsky <[email protected]>
> ---
> arch/x86/xen/Makefile | 2 +-
> arch/x86/xen/pmu.c | 122 +++++++++++++++++++++++++++++++++++++++++
> arch/x86/xen/pmu.h | 12 ++++
> arch/x86/xen/smp.c | 31 ++++++++++-
> include/xen/interface/xen.h | 1 +
> include/xen/interface/xenpmu.h | 77 ++++++++++++++++++++++++++
> 6 files changed, 243 insertions(+), 2 deletions(-)
> create mode 100644 arch/x86/xen/pmu.c
> create mode 100644 arch/x86/xen/pmu.h
>
> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index 96ab2c0..b187df5 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -13,7 +13,7 @@ CFLAGS_mmu.o := $(nostackp)
> obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
> time.o xen-asm.o xen-asm_$(BITS).o \
> grant-table.o suspend.o platform-pci-unplug.o \
> - p2m.o
> + p2m.o pmu.o

Perhaps guard the build of this based on CONFIG_PERF_EVENTS?

That would of course mean you also have to create in xenpmu.h
static inline empty functions for xen_pmu_finish and xen_pmu_init in case
CONFIG_PERF_EVENTS is not set.


>
> obj-$(CONFIG_EVENT_TRACING) += trace.o
>
> diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
> new file mode 100644
> index 0000000..da061d4
> --- /dev/null
> +++ b/arch/x86/xen/pmu.c
> @@ -0,0 +1,122 @@
> +#include <linux/types.h>
> +#include <linux/interrupt.h>
> +
> +#include <asm/xen/hypercall.h>
> +#include <xen/page.h>
> +#include <xen/interface/xen.h>
> +#include <xen/interface/vcpu.h>
> +#include <xen/interface/xenpmu.h>
> +
> +#include "xen-ops.h"
> +#include "pmu.h"
> +
> +/* x86_pmu.handle_irq definition */
> +#include <../kernel/cpu/perf_event.h>
> +
> +
> +/* Shared page between hypervisor and domain */
> +DEFINE_PER_CPU(struct xenpmu_data *, xenpmu_shared);
> +
> +/* perf callbacks*/
> +int xen_is_in_guest(void)
> +{
> + struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
> + smp_processor_id());
> +
> + if (!xen_initial_domain() ||
> + xenpmu_data->domain_id > DOMID_SELF || xenpmu_data->domain_id == 0)
> + return 0;
> +
> + return 1;
> +}
> +
> +static int xen_is_user_mode(void)
> +{
> + struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
> + smp_processor_id());
> + return ((xenpmu_data->regs.cs & 3) == 3);
> +}
> +
> +static unsigned long xen_get_guest_ip(void)
> +{
> + struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
> + smp_processor_id());
> + return xenpmu_data->regs.eip;
> +}
> +
> +static struct perf_guest_info_callbacks xen_guest_cbs = {
> + .is_in_guest = xen_is_in_guest,
> + .is_user_mode = xen_is_user_mode,
> + .get_guest_ip = xen_get_guest_ip,
> +};
> +
> +/* Convert registers from Xen's format to Linux' */
> +static void xen_convert_regs(struct cpu_user_regs *xen_regs,
> + struct pt_regs *regs)
> +{
> + regs->ip = xen_regs->eip;
> + regs->cs = xen_regs->cs;
> +}
> +
> +irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id)
> +{
> + int ret = IRQ_NONE;
> + struct pt_regs regs;
> + struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
> + smp_processor_id());
> +
> + xen_convert_regs(&xenpmu_data->regs, &regs);
> + if (x86_pmu.handle_irq(&regs))
> + ret = IRQ_HANDLED;
> +
> + return ret;
> +}
> +
> +int xen_pmu_init(int cpu)
> +{
> + int ret = 0;
> + struct xenpmu_params xp;
> + unsigned long pfn;
> + struct xenpmu_data *xenpmu_data;
> +
> + BUILD_BUG_ON(sizeof(struct xenpmu_data) > PAGE_SIZE);
> + xenpmu_data = vmalloc(PAGE_SIZE);
> + if (!xenpmu_data) {
> + ret = -ENOMEM;
> + goto fail;
> + }
> + pfn = vmalloc_to_pfn((char *)xenpmu_data);
> +
> + xp.mfn = pfn_to_mfn(pfn);
> + xp.vcpu = cpu;
> + xp.version.maj = XENPMU_VER_MAJ;
> + xp.version.min = XENPMU_VER_MIN;
> + ret = HYPERVISOR_xenpmu_op(XENPMU_init, &xp);
> + if (ret)
> + goto fail;
> +
> + per_cpu(xenpmu_shared, cpu) = xenpmu_data;
> +
> + if (cpu == 0)
> + perf_register_guest_info_callbacks(&xen_guest_cbs);
> +
> + return ret;
> +
> +fail:
> + vfree(xenpmu_data);
> + return ret;
> +}
> +
> +void xen_pmu_finish(int cpu)
> +{
> + struct xenpmu_params xp;
> +
> + xp.vcpu = cpu;
> + xp.version.maj = XENPMU_VER_MAJ;
> + xp.version.min = XENPMU_VER_MIN;
> +
> + (void)HYPERVISOR_xenpmu_op(XENPMU_finish, &xp);
> +
> + vfree(per_cpu(xenpmu_shared, cpu));
> + per_cpu(xenpmu_shared, cpu) = NULL;

I think you are missing:

perf_unregister_guest_info_callbacks when this is the bootup CPU.

And you should probably move this around to be:

if (cpu == 0 && num_online_cpus() == 1)
perf_unregister_guest_info_callbacks(..)
per_cpu(xenpmu_shared, cpu) = NULL;
vfree(per_cpu(xenpmu_shared, cpu))

?
> +}
> diff --git a/arch/x86/xen/pmu.h b/arch/x86/xen/pmu.h
> new file mode 100644
> index 0000000..51de7d2
> --- /dev/null
> +++ b/arch/x86/xen/pmu.h
> @@ -0,0 +1,12 @@
> +#ifndef __XEN_PMU_H
> +#define __XEN_PMU_H
> +
> +#include <xen/interface/xenpmu.h>
> +
> +irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id);
> +int xen_pmu_init(int cpu);
> +void xen_pmu_finish(int cpu);
> +
> +DECLARE_PER_CPU(struct xenpmu_data *, xenpmu_shared);
> +
> +#endif /* __XEN_PMU_H */
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index ca92754..17a88d1 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -26,6 +26,7 @@
>
> #include <xen/interface/xen.h>
> #include <xen/interface/vcpu.h>
> +#include <xen/interface/xenpmu.h>
>
> #include <asm/xen/interface.h>
> #include <asm/xen/hypercall.h>
> @@ -37,6 +38,7 @@
> #include <xen/hvc-console.h>
> #include "xen-ops.h"
> #include "mmu.h"
> +#include "pmu.h"
>
> cpumask_var_t xen_cpu_initialized_map;
>
> @@ -49,6 +51,7 @@ static DEFINE_PER_CPU(struct xen_common_irq, xen_callfunc_irq) = { .irq = -1 };
> static DEFINE_PER_CPU(struct xen_common_irq, xen_callfuncsingle_irq) = { .irq = -1 };
> static DEFINE_PER_CPU(struct xen_common_irq, xen_irq_work) = { .irq = -1 };
> static DEFINE_PER_CPU(struct xen_common_irq, xen_debug_irq) = { .irq = -1 };
> +static DEFINE_PER_CPU(struct xen_common_irq, xen_pmu_irq) = { .irq = -1 };
>
> static irqreturn_t xen_call_function_interrupt(int irq, void *dev_id);
> static irqreturn_t xen_call_function_single_interrupt(int irq, void *dev_id);
> @@ -139,11 +142,18 @@ static void xen_smp_intr_free(unsigned int cpu)
> kfree(per_cpu(xen_irq_work, cpu).name);
> per_cpu(xen_irq_work, cpu).name = NULL;
> }
> +
> + if (per_cpu(xen_pmu_irq, cpu).irq >= 0) {
> + unbind_from_irqhandler(per_cpu(xen_pmu_irq, cpu).irq, NULL);
> + per_cpu(xen_pmu_irq, cpu).irq = -1;
> + kfree(per_cpu(xen_pmu_irq, cpu).name);
> + per_cpu(xen_pmu_irq, cpu).name = NULL;
> + }
> };
> static int xen_smp_intr_init(unsigned int cpu)
> {
> int rc;
> - char *resched_name, *callfunc_name, *debug_name;
> + char *resched_name, *callfunc_name, *debug_name, *pmu_name;
>
> resched_name = kasprintf(GFP_KERNEL, "resched%d", cpu);
> rc = bind_ipi_to_irqhandler(XEN_RESCHEDULE_VECTOR,
> @@ -209,6 +219,18 @@ static int xen_smp_intr_init(unsigned int cpu)
> per_cpu(xen_irq_work, cpu).irq = rc;
> per_cpu(xen_irq_work, cpu).name = callfunc_name;
>
> + if (per_cpu(xenpmu_shared, cpu)) {
> + pmu_name = kasprintf(GFP_KERNEL, "pmu%d", cpu);
> + rc = bind_virq_to_irqhandler(VIRQ_XENPMU, cpu,
> + xen_pmu_irq_handler,
> + IRQF_PERCPU|IRQF_NOBALANCING,
> + pmu_name, NULL);
> + if (rc < 0)
> + goto fail;
> + per_cpu(xen_pmu_irq, cpu).irq = rc;
> + per_cpu(xen_pmu_irq, cpu).name = pmu_name;
> + }
> +
> return 0;
>
> fail:
> @@ -307,6 +329,9 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
> }
> set_cpu_sibling_map(0);
>
> + if (xen_pmu_init(0))
> + pr_err("Could not initialize VPMU for VCPU 0\n");
> +
> if (xen_smp_intr_init(0))
> BUG();
>
> @@ -427,6 +452,9 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
> /* Just in case we booted with a single CPU. */
> alternatives_enable_smp();
>
> + if (xen_pmu_init(cpu))
> + pr_err("Could not initialize VPMU for VCPU %u\n", cpu);
> +
> rc = xen_smp_intr_init(cpu);
> if (rc)
> return rc;
> @@ -468,6 +496,7 @@ static void xen_cpu_die(unsigned int cpu)
> xen_smp_intr_free(cpu);
> xen_uninit_lock_cpu(cpu);
> xen_teardown_timer(cpu);
> + xen_pmu_finish(cpu);
> }
>
> static void xen_play_dead(void) /* used only with HOTPLUG_CPU */

That all looks OK

2013-09-23 13:29:04

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH v1 4/5] xen/PMU: Add support for PMU registers on PV guests

On Tue, Sep 10, 2013 at 11:31:49AM -0400, Boris Ostrovsky wrote:
> PMU emulation code: MSR caching in PMU context and LVTPC APIC
> handling. (Portions of this code are taken from Xen's VPMU
> implementation)
>
> Signed-off-by: Boris Ostrovsky <[email protected]>

Looks OK.
> ---
> arch/x86/xen/enlighten.c | 27 +++-
> arch/x86/xen/pmu.c | 289 ++++++++++++++++++++++++++++++++++++++++-
> arch/x86/xen/pmu.h | 3 +
> include/xen/interface/xenpmu.h | 5 +
> 4 files changed, 317 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 193097e..2512bd3 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -82,6 +82,7 @@
> #include "mmu.h"
> #include "smp.h"
> #include "multicalls.h"
> +#include "pmu.h"
>
> EXPORT_SYMBOL_GPL(hypercall_page);
>
> @@ -960,6 +961,11 @@ static u32 xen_apic_read(u32 reg)
>
> static void xen_apic_write(u32 reg, u32 val)
> {
> + if (reg == APIC_LVTPC) {
> + (void)pmu_apic_update(reg);
> + return;
> + }
> +
> /* Warn to see if there's any stray references */
> WARN_ON(1);
> }
> @@ -1064,11 +1070,20 @@ static inline void xen_write_cr8(unsigned long val)
> BUG_ON(val);
> }
> #endif
> -static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
> +
> +static u64 xen_read_msr_safe(unsigned int msr, int *err)
> {
> - int ret;
> + u64 val;
>
> - ret = 0;
> + if (pmu_msr_read(msr, &val, err))
> + return val;
> +
> + return native_read_msr_safe(msr, err);
> +}
> +
> +static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
> +{
> + int ret = 0;
>
> switch (msr) {
> #ifdef CONFIG_X86_64
> @@ -1102,10 +1117,10 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
> if (smp_processor_id() == 0)
> xen_set_pat(((u64)high << 32) | low);
> break;
> + }
>
> - default:
> + if (!pmu_msr_write(msr, low, high, &ret))
> ret = native_write_msr_safe(msr, low, high);
> - }
>
> return ret;
> }
> @@ -1239,7 +1254,7 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
>
> .wbinvd = native_wbinvd,
>
> - .read_msr = native_read_msr_safe,
> + .read_msr = xen_read_msr_safe,
> .write_msr = xen_write_msr_safe,
>
> .read_tsc = native_read_tsc,
> diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
> index da061d4..d8b059b 100644
> --- a/arch/x86/xen/pmu.c
> +++ b/arch/x86/xen/pmu.c
> @@ -17,6 +17,291 @@
> /* Shared page between hypervisor and domain */
> DEFINE_PER_CPU(struct xenpmu_data *, xenpmu_shared);
>
> +/* PMU register caching */
> +
> +/* AMD PMU */
> +static __read_mostly uint32_t amd_counters_base;
> +static __read_mostly uint32_t amd_ctrls_base;
> +static __read_mostly int amd_msr_step;
> +static __read_mostly int k7_counters_mirrored;
> +static __read_mostly int amd_num_counters;
> +
> +/* Intel PMU */
> +#define MSR_TYPE_COUNTER 0
> +#define MSR_TYPE_CTRL 1
> +#define MSR_TYPE_GLOBAL 2
> +#define MSR_TYPE_ARCH_COUNTER 3
> +#define MSR_TYPE_ARCH_CTRL 4
> +
> +#define PMU_GENERAL_NR_SHIFT 8 /* Number of general pmu registers */
> +#define PMU_GENERAL_NR_BITS 8 /* 8 bits 8..15 */
> +#define PMU_GENERAL_NR_MASK (((1 << PMU_GENERAL_NR_BITS) - 1) \
> + << PMU_GENERAL_NR_SHIFT)
> +
> +static __read_mostly int intel_num_counters;
> +
> +
> +static void xen_pmu_arch_init(void)
> +{
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> +
> + switch (boot_cpu_data.x86) {
> + case 0x15:
> + amd_num_counters = F15H_NUM_COUNTERS;
> + amd_counters_base = MSR_F15H_PERF_CTR;
> + amd_ctrls_base = MSR_F15H_PERF_CTL;
> + amd_msr_step = 2;
> + k7_counters_mirrored = 1;
> + break;
> + case 0x10:
> + case 0x12:
> + case 0x14:
> + case 0x16:
> + default:
> + amd_num_counters = F10H_NUM_COUNTERS;
> + amd_counters_base = MSR_K7_PERFCTR0;
> + amd_ctrls_base = MSR_K7_EVNTSEL0;
> + amd_msr_step = 1;
> + k7_counters_mirrored = 0;
> + break;
> + }
> + } else {
> + uint32_t eax = cpuid_eax(0xa);
> +
> + intel_num_counters = (eax & PMU_GENERAL_NR_MASK) >>
> + PMU_GENERAL_NR_SHIFT;
> + }
> +}
> +
> +static inline uint32_t get_fam15h_addr(u32 addr)
> +{
> + switch (addr) {
> + case MSR_K7_PERFCTR0:
> + case MSR_K7_PERFCTR1:
> + case MSR_K7_PERFCTR2:
> + case MSR_K7_PERFCTR3:
> + return MSR_F15H_PERF_CTR + (addr - MSR_K7_PERFCTR0);
> + case MSR_K7_EVNTSEL0:
> + case MSR_K7_EVNTSEL1:
> + case MSR_K7_EVNTSEL2:
> + case MSR_K7_EVNTSEL3:
> + return MSR_F15H_PERF_CTL + (addr - MSR_K7_EVNTSEL0);
> + default:
> + break;
> + }
> +
> + return addr;
> +}
> +
> +static inline bool is_amd_pmu_msr(unsigned int msr)
> +{
> + if ((msr < MSR_F15H_PERF_CTL ||
> + msr > MSR_F15H_PERF_CTR + amd_num_counters) &&
> + (msr < MSR_K7_EVNTSEL0 ||
> + msr > MSR_K7_PERFCTR0 + amd_num_counters))
> + return true;
> +
> + return false;
> +}
> +
> +static bool is_core2_pmu_msr(u32 msr_index, int *type, int *index)
> +{
> + int i;
> +
> + for (i = 0; i < VPMU_CORE2_NUM_FIXED; i++) {
> + if (core2_fix_counters_msr[i] == msr_index) {
> + *type = MSR_TYPE_COUNTER;
> + *index = i;
> + return true;
> + }
> + }
> +
> + for (i = 0; i < VPMU_CORE2_NUM_CTRLS; i++) {
> + if (core2_ctrls_msr[i] == msr_index) {
> + *type = MSR_TYPE_CTRL;
> + *index = i;
> + return true;
> + }
> + }
> +
> + if ((msr_index == MSR_CORE_PERF_GLOBAL_CTRL) ||
> + (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) ||
> + (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL)) {
> + *type = MSR_TYPE_GLOBAL;
> + return true;
> + }
> +
> + if ((msr_index >= MSR_IA32_PERFCTR0) &&
> + (msr_index < (MSR_IA32_PERFCTR0 + intel_num_counters))) {
> + *type = MSR_TYPE_ARCH_COUNTER;
> + *index = msr_index - MSR_IA32_PERFCTR0;
> + return true;
> + }
> +
> + if ((msr_index >= MSR_P6_EVNTSEL0) &&
> + (msr_index < (MSR_P6_EVNTSEL0 + intel_num_counters))) {
> + *type = MSR_TYPE_ARCH_CTRL;
> + *index = msr_index - MSR_P6_EVNTSEL0;
> + return true;
> + }
> +
> + return false;
> +}
> +
> +int xen_intel_pmu_rw(unsigned int msr, u64 *val, int type,
> + int index, bool is_read)
> +{
> + uint64_t *reg = NULL;
> + struct core2_vpmu_context *ctxt;
> + struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
> + smp_processor_id());
> +
> + if (!xenpmu_data)
> + return 1;
> +
> + if (!(xenpmu_data->pmu_flags & PMU_CACHED)) /* No caching needed */
> + return 1;
> +
> + ctxt = &xenpmu_data->pmu.intel;
> +
> + switch (msr) {
> + case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> + reg = &ctxt->global_ovf_ctrl;
> + break;
> + case MSR_CORE_PERF_GLOBAL_STATUS:
> + reg = &ctxt->global_status;
> + break;
> + case MSR_CORE_PERF_GLOBAL_CTRL:
> + reg = &ctxt->global_ctrl;
> + break;
> + case MSR_CORE_PERF_FIXED_CTR_CTRL:
> + reg = &ctxt->ctrls[0];
> + break;
> + default:
> + switch (type) {
> + case MSR_TYPE_COUNTER:
> + reg = &ctxt->fix_counters[index];
> + break;
> + case MSR_TYPE_ARCH_COUNTER:
> + reg = &ctxt->arch_msr_pair[index].counter;
> + break;
> + case MSR_TYPE_ARCH_CTRL:
> + reg = &ctxt->arch_msr_pair[index].control;
> + break;
> + default:
> + return 1;
> + }
> + }
> +
> + if (reg) {
> + if (is_read)
> + *val = *reg;
> + else {
> + *reg = *val;
> +
> + if (msr == MSR_CORE_PERF_GLOBAL_OVF_CTRL)
> + ctxt->global_status &= (~(*val));
> + }
> + return 0;
> + }
> +
> + return 1;
> +}
> +
> +int xen_amd_pmu_rw(unsigned int msr, u64 *val, bool is_read)
> +{
> + uint64_t *reg = NULL;
> + int i, off = 0;
> + struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
> + smp_processor_id());
> +
> + if (!xenpmu_data)
> + return 1;
> +
> + if (!(xenpmu_data->pmu_flags & PMU_CACHED)) /* No caching needed */
> + return 1;
> +
> + if (k7_counters_mirrored &&
> + ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)))
> + msr = get_fam15h_addr(msr);
> +
> + for (i = 0; i < amd_num_counters; i++) {
> + if (msr == amd_ctrls_base + off) {
> + reg = &xenpmu_data->pmu.amd.ctrls[i];
> + break;
> + } else if (msr == amd_counters_base + off) {
> + reg = &xenpmu_data->pmu.amd.counters[i];
> + break;
> + }
> + off += amd_msr_step;
> + }
> +
> + if (reg) {
> + if (is_read)
> + *val = *reg;
> + else
> + *reg = *val;
> + return 0;
> + }
> + return 1;
> +}
> +
> +int pmu_msr_read(unsigned int msr, uint64_t *val, int *err)
> +{
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> + if (is_amd_pmu_msr(msr)) {
> + if (xen_amd_pmu_rw(msr, val, true))
> + *val = native_read_msr_safe(msr, err);
> + return 1;
> + }
> + } else {
> + int type, index;
> + if (is_core2_pmu_msr(msr, &type, &index)) {
> + if (xen_intel_pmu_rw(msr, val, type, index, true))
> + *val = native_read_msr_safe(msr, err);
> + return 1;
> + }
> + }
> +
> + return 0;
> +}
> +
> +int pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err)
> +{
> + uint64_t val = ((uint64_t)high << 32) | low;
> +
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> + if (is_amd_pmu_msr(msr)) {
> + if (xen_amd_pmu_rw(msr, &val, false))
> + *err = native_write_msr_safe(msr, low, high);
> + return 1;
> + }
> + } else {
> + int type, index;
> +
> + if (is_core2_pmu_msr(msr, &type, &index)) {
> + if (xen_intel_pmu_rw(msr, &val, type, index, false))
> + *err = native_write_msr_safe(msr, low, high);
> + return 1;
> + }
> + }
> +
> + return 0;
> +}
> +
> +int pmu_apic_update(uint64_t reg)
> +{
> + int ret;
> + struct xenpmu_params xp;
> +
> + xp.lvtpc = reg;
> + xp.version.maj = XENPMU_VER_MAJ;
> + xp.version.min = XENPMU_VER_MIN;
> + ret = HYPERVISOR_xenpmu_op(XENPMU_lvtpc_set, &xp);
> +
> + return ret;
> +}
> +
> /* perf callbacks */
> int xen_is_in_guest(void)
> {
> @@ -97,8 +382,10 @@ int xen_pmu_init(int cpu)
>
> per_cpu(xenpmu_shared, cpu) = xenpmu_data;
>
> - if (cpu == 0)
> + if (cpu == 0) {
> perf_register_guest_info_callbacks(&xen_guest_cbs);
> + xen_pmu_arch_init();
> + }
>
> return ret;
>
> diff --git a/arch/x86/xen/pmu.h b/arch/x86/xen/pmu.h
> index 51de7d2..adc2b5e 100644
> --- a/arch/x86/xen/pmu.h
> +++ b/arch/x86/xen/pmu.h
> @@ -6,6 +6,9 @@
> irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id);
> int xen_pmu_init(int cpu);
> void xen_pmu_finish(int cpu);
> +int pmu_msr_read(unsigned int msr, uint64_t *val, int *err);
> +int pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err);
> +int pmu_apic_update(uint64_t reg);
>
> DECLARE_PER_CPU(struct xenpmu_data *, xenpmu_shared);
>
> diff --git a/include/xen/interface/xenpmu.h b/include/xen/interface/xenpmu.h
> index 7af682d..16fe1ab 100644
> --- a/include/xen/interface/xenpmu.h
> +++ b/include/xen/interface/xenpmu.h
> @@ -15,6 +15,7 @@
> #define XENPMU_flags_set 3
> #define XENPMU_init 4
> #define XENPMU_finish 5
> +#define XENPMU_lvtpc_set 6
>
> /* Parameter structure for HYPERVISOR_xenpmu_op call */
> struct xenpmu_params {
> @@ -28,6 +29,7 @@ struct xenpmu_params {
> uint64_t control;
> uint64_t mfn;
> uint64_t vcpu;
> + uint64_t lvtpc;
> };
>
> /* VPMU modes */
> @@ -96,6 +98,9 @@ struct core2_vpmu_context {
> struct core2_pmu_enable *pmu_enable;
> };
>
> +/* PMU flags */
> +#define PMU_CACHED 1
> +
> /* Shared between hypervisor and PV domain */
> struct xenpmu_data {
> union {
> --
> 1.8.1.4
>

2013-09-23 13:29:17

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH v1 5/5] xen/PMU: Cache MSR accesses during interrupt handling

On Tue, Sep 10, 2013 at 11:31:50AM -0400, Boris Ostrovsky wrote:
> Avoid trapping to the hypervisor on each MSR access during interrupt
> handling. Instead, use the cached MSR values that Xen provides in the
> shared xenpmu_data. When handling is completed, flush the registers to
> the hypervisor, which will load them into HW.

Looks OK.
>
> Signed-off-by: Boris Ostrovsky <[email protected]>
> ---
> arch/x86/xen/pmu.c | 15 ++++++++++++++-
> include/xen/interface/xenpmu.h | 1 +
> 2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
> index d8b059b..bdd7c4a 100644
> --- a/arch/x86/xen/pmu.c
> +++ b/arch/x86/xen/pmu.c
> @@ -345,14 +345,27 @@ static void xen_convert_regs(struct cpu_user_regs *xen_regs,
>
> irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id)
> {
> - int ret = IRQ_NONE;
> + int err, ret = IRQ_NONE;
> struct pt_regs regs;
> struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
> smp_processor_id());
>
> + /*
> + * While handling the interrupt, MSR accesses will be cached
> + * in the PMU context.
> + */
> + xenpmu_data->pmu_flags |= PMU_CACHED;
> xen_convert_regs(&xenpmu_data->regs, &regs);
> if (x86_pmu.handle_irq(&regs))
> ret = IRQ_HANDLED;
> + xenpmu_data->pmu_flags &= ~PMU_CACHED;
> +
> + /* Write out cached context to HW */
> + err = HYPERVISOR_xenpmu_op(XENPMU_flush, NULL);
> + if (err) {
> + WARN(1, "%s failed hypercall, err: %d\n", __func__, err);
> + return IRQ_NONE;
> + }
>
> return ret;
> }
> diff --git a/include/xen/interface/xenpmu.h b/include/xen/interface/xenpmu.h
> index 16fe1ab..ec0e802 100644
> --- a/include/xen/interface/xenpmu.h
> +++ b/include/xen/interface/xenpmu.h
> @@ -16,6 +16,7 @@
> #define XENPMU_init 4
> #define XENPMU_finish 5
> #define XENPMU_lvtpc_set 6
> +#define XENPMU_flush 7 /* Write cached MSR values to HW */
>
> /* Parameter structure for HYPERVISOR_xenpmu_op call */
> struct xenpmu_params {
> --
> 1.8.1.4
>

2013-09-23 14:03:49

by Boris Ostrovsky

Subject: Re: [PATCH v1 2/5] xen/PMU: Sysfs interface for setting Xen PMU mode

On 09/23/2013 09:17 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 10, 2013 at 11:31:47AM -0400, Boris Ostrovsky wrote:
>> +
>> +/* Parameter structure for HYPERVISOR_xenpmu_op call */
>> +struct xenpmu_params {
>> + union {
>> + struct version {
>> + uint8_t maj;
>> + uint8_t min;
>> + } version;
>> + uint64_t pad;
>> + };
>> + uint64_t control;
>> +};
>> +
>> +/* VPMU modes */
>> +#define VPMU_MODE_MASK 0xff
>> +#define VPMU_OFF 0
>> +/* guests can profile themselves, (dom0 profiles itself and Xen) */
>> +#define VPMU_ON (1<<0)
>> +/*
>> + * Only dom0 has access to VPMU and it profiles everyone: itself,
>> + * the hypervisor and the guests.
>> + */
>> +#define VPMU_PRIV (1<<1)
>> +
>> +/* VPMU flags */
>> +#define VPMU_FLAGS_MASK ((uint32_t)(~VPMU_MODE_MASK))
>> +#define VPMU_INTEL_BTS (1<<8) /* Ignored on AMD */
>> +
>> +#endif /* __XEN_PUBLIC_XENPMU_H__ */
> Looks OK to me if there are no changes to the hypervisor ABI.


The v2 patches for the hypervisor have a different layout for the shared
structures and parameters, so this will have to change as well. I don't
believe v2 forces any logic changes in this series, except in the first
patch (getting Xen symbols into dom0).

-boris

2013-09-23 14:16:40

by Boris Ostrovsky

Subject: Re: [PATCH v1 3/5] xen/PMU: Initialization code for Xen PMU

On 09/23/2013 09:26 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 10, 2013 at 11:31:48AM -0400, Boris Ostrovsky wrote:
>> Map shared data structure that will hold CPU registers, VPMU context,
>> VCPU/PCPU IDs of the VCPU interrupted by the PMU interrupt. Hypervisor
>> fills this information in its handler and passes it to the guest for
>> further processing.
>>
>> Set up PMU VIRQ.
>>
>> Signed-off-by: Boris Ostrovsky <[email protected]>
>> ---
>> arch/x86/xen/Makefile | 2 +-
>> arch/x86/xen/pmu.c | 122 +++++++++++++++++++++++++++++++++++++++++
>> arch/x86/xen/pmu.h | 12 ++++
>> arch/x86/xen/smp.c | 31 ++++++++++-
>> include/xen/interface/xen.h | 1 +
>> include/xen/interface/xenpmu.h | 77 ++++++++++++++++++++++++++
>> 6 files changed, 243 insertions(+), 2 deletions(-)
>> create mode 100644 arch/x86/xen/pmu.c
>> create mode 100644 arch/x86/xen/pmu.h
>>
>> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
>> index 96ab2c0..b187df5 100644
>> --- a/arch/x86/xen/Makefile
>> +++ b/arch/x86/xen/Makefile
>> @@ -13,7 +13,7 @@ CFLAGS_mmu.o := $(nostackp)
>> obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
>> time.o xen-asm.o xen-asm_$(BITS).o \
>> grant-table.o suspend.o platform-pci-unplug.o \
>> - p2m.o
>> + p2m.o pmu.o
> Perhaps guard the build of this based on CONFIG_PERF_EVENTS?
>
> That would of course mean you also have to create in xenpmu.h
> static inline empty functions for xen_pmu_finish and xen_pmu_init in case
> CONFIG_PERF_EVENTS is not set.

This is an interface header; am I allowed to do those sorts of things there?

Also, *theoretically* other performance-measuring tools can use this
framework, except for perf-specific things like callbacks and the
handler call. So maybe I should put CONFIG_PERF_EVENTS in pmu.c?

>>
>> obj-$(CONFIG_EVENT_TRACING) += trace.o
>>
>> diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
>> new file mode 100644
>> index 0000000..da061d4
>> --- /dev/null
>> +++ b/arch/x86/xen/pmu.c
>> @@ -0,0 +1,122 @@
>> +#include <linux/types.h>
>> +#include <linux/interrupt.h>
>> +
>> +#include <asm/xen/hypercall.h>
>> +#include <xen/page.h>
>> +#include <xen/interface/xen.h>
>> +#include <xen/interface/vcpu.h>
>> +#include <xen/interface/xenpmu.h>
>> +
>> +#include "xen-ops.h"
>> +#include "pmu.h"
>> +
>> +/* x86_pmu.handle_irq definition */
>> +#include <../kernel/cpu/perf_event.h>
>> +
>> +
>> +/* Shared page between hypervisor and domain */
>> +DEFINE_PER_CPU(struct xenpmu_data *, xenpmu_shared);
>> +
>> +/* perf callbacks */
>> +int xen_is_in_guest(void)
>> +{
>> + struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
>> + smp_processor_id());
>> +
>> + if (!xen_initial_domain() ||
>> + xenpmu_data->domain_id > DOMID_SELF || xenpmu_data->domain_id == 0)
>> + return 0;
>> +
>> + return 1;
>> +}
>> +
>> +static int xen_is_user_mode(void)
>> +{
>> + struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
>> + smp_processor_id());
>> + return ((xenpmu_data->regs.cs & 3) == 3);
>> +}
>> +
>> +static unsigned long xen_get_guest_ip(void)
>> +{
>> + struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
>> + smp_processor_id());
>> + return xenpmu_data->regs.eip;
>> +}
>> +
>> +static struct perf_guest_info_callbacks xen_guest_cbs = {
>> + .is_in_guest = xen_is_in_guest,
>> + .is_user_mode = xen_is_user_mode,
>> + .get_guest_ip = xen_get_guest_ip,
>> +};
>> +
>> +/* Convert registers from Xen's format to Linux' */
>> +static void xen_convert_regs(struct cpu_user_regs *xen_regs,
>> + struct pt_regs *regs)
>> +{
>> + regs->ip = xen_regs->eip;
>> + regs->cs = xen_regs->cs;
>> +}
>> +
>> +irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id)
>> +{
>> + int ret = IRQ_NONE;
>> + struct pt_regs regs;
>> + struct xenpmu_data *xenpmu_data = per_cpu(xenpmu_shared,
>> + smp_processor_id());
>> +
>> + xen_convert_regs(&xenpmu_data->regs, &regs);
>> + if (x86_pmu.handle_irq(&regs))
>> + ret = IRQ_HANDLED;
>> +
>> + return ret;
>> +}
>> +
>> +int xen_pmu_init(int cpu)
>> +{
>> + int ret = 0;
>> + struct xenpmu_params xp;
>> + unsigned long pfn;
>> + struct xenpmu_data *xenpmu_data;
>> +
>> + BUILD_BUG_ON(sizeof(struct xenpmu_data) > PAGE_SIZE);
>> + xenpmu_data = vmalloc(PAGE_SIZE);
>> + if (!xenpmu_data) {
>> + ret = -ENOMEM;
>> + goto fail;
>> + }
>> + pfn = vmalloc_to_pfn((char *)xenpmu_data);
>> +
>> + xp.mfn = pfn_to_mfn(pfn);
>> + xp.vcpu = cpu;
>> + xp.version.maj = XENPMU_VER_MAJ;
>> + xp.version.min = XENPMU_VER_MIN;
>> + ret = HYPERVISOR_xenpmu_op(XENPMU_init, &xp);
>> + if (ret)
>> + goto fail;
>> +
>> + per_cpu(xenpmu_shared, cpu) = xenpmu_data;
>> +
>> + if (cpu == 0)
>> + perf_register_guest_info_callbacks(&xen_guest_cbs);
>> +
>> + return ret;
>> +
>> +fail:
>> + vfree(xenpmu_data);
>> + return ret;
>> +}
>> +
>> +void xen_pmu_finish(int cpu)
>> +{
>> + struct xenpmu_params xp;
>> +
>> + xp.vcpu = cpu;
>> + xp.version.maj = XENPMU_VER_MAJ;
>> + xp.version.min = XENPMU_VER_MIN;
>> +
>> + (void)HYPERVISOR_xenpmu_op(XENPMU_finish, &xp);
>> +
>> + vfree(per_cpu(xenpmu_shared, cpu));
>> + per_cpu(xenpmu_shared, cpu) = NULL;
> I think you are missing:
>
> perf_unregister_guest_info_callbacks when this is the bootup CPU.

But there is always at least one CPU running, so I think I should keep
the callbacks registered. The only reason I have 'if (cpu == 0)' when I
register the callbacks is that I only want to do this once, and when we
are coming up, cpu 0 is always the boot CPU (right?).

>
> And you should probably move this around to be:
>
> if (cpu == 0 && num_online_cpus() == 1)
> perf_unregister_guest_info_callbacks(..)
> per_cpu(xenpmu_shared, cpu) = NULL;
> vfree(per_cpu(xenpmu_shared, cpu))

Then I'd be calling vfree(NULL), wouldn't I?

-boris

2013-09-23 14:24:13

by Boris Ostrovsky

Subject: Re: [Xen-devel] [PATCH v1 3/5] xen/PMU: Initialization code for Xen PMU

On 09/23/2013 10:18 AM, Boris Ostrovsky wrote:
> On 09/23/2013 09:26 AM, Konrad Rzeszutek Wilk wrote:
>> On Tue, Sep 10, 2013 at 11:31:48AM -0400, Boris Ostrovsky wrote:
>>> Map shared data structure that will hold CPU registers, VPMU context,
>>> VCPU/PCPU IDs of the VCPU interrupted by the PMU interrupt. Hypervisor
>>> fills this information in its handler and passes it to the guest for
>>> further processing.
>>>
>>> Set up PMU VIRQ.
>>>
>>> Signed-off-by: Boris Ostrovsky <[email protected]>
>>> ---
>>> arch/x86/xen/Makefile | 2 +-
>>> arch/x86/xen/pmu.c | 122
>>> +++++++++++++++++++++++++++++++++++++++++
>>> arch/x86/xen/pmu.h | 12 ++++
>>> arch/x86/xen/smp.c | 31 ++++++++++-
>>> include/xen/interface/xen.h | 1 +
>>> include/xen/interface/xenpmu.h | 77 ++++++++++++++++++++++++++
>>> 6 files changed, 243 insertions(+), 2 deletions(-)
>>> create mode 100644 arch/x86/xen/pmu.c
>>> create mode 100644 arch/x86/xen/pmu.h
>>>
>>> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
>>> index 96ab2c0..b187df5 100644
>>> --- a/arch/x86/xen/Makefile
>>> +++ b/arch/x86/xen/Makefile
>>> @@ -13,7 +13,7 @@ CFLAGS_mmu.o := $(nostackp)
>>> obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
>>> time.o xen-asm.o xen-asm_$(BITS).o \
>>> grant-table.o suspend.o platform-pci-unplug.o \
>>> - p2m.o
>>> + p2m.o pmu.o
>> Perhaps guard the build of this based on CONFIG_PERF_EVENTS?
>>
>> That would of course mean you also have to create in xenpmu.h
>> static inline empty functions for xen_pmu_finish and xen_pmu_init in case
>> CONFIG_PERF_EVENTS is not set.
>
> This is an interface header; am I allowed to do those sorts of things there?

Never mind that, I was thinking of a different file. Sorry.

>
> Also, *theoretically* other performance-measuring tools can use this
> framework, except for perf-specific things like callbacks and the
> handler call. So maybe I should put CONFIG_PERF_EVENTS in pmu.c?
>

-boris