Hi,
this small set of patches implement support for the amd guest/host-only
bits into perf. The counting for either host or guest-mode can be
enabled using two additional attribute-bits.
The patches are RFC for now for several reasons. First of all, these are
my first patches for perf, so I likely don't implemented everything the
perf-way. Another reason is, that when one of these two bits is set, the
counters will only count when SVM is enabled (with KVM this is only the
case when a guest runs). So even when host-only is configured (which
needs to be done explicitly, so no regression, default is still to set
none of these two bits) the counter will not count at all as long as no
kvm guest is running. The question is whether this is ok or whether this
situation needs to be handled (say, count when host-only is specified by
userspace and svm is disabled). Or we just don't care because specifing
guest/host-only counting only makes sense with guests anyway. I am open
for both.
So any feedback is greatly appreciated :-)
Regards,
Joerg
Diffstat:
arch/x86/include/asm/perf_event.h | 3 +++
arch/x86/kernel/cpu/perf_event_amd.c | 6 ++++++
include/linux/perf_event.h | 5 ++++-
kernel/perf_event.c | 4 ++++
tools/perf/util/parse-events.c | 10 +++++++++-
5 files changed, 26 insertions(+), 2 deletions(-)
Shortlog:
Joerg Roedel (3):
perf, core: Introduce attrs to count in either host or guest mode
perf, x86: Use GO/HO bits in perf-ctr
perf, tools: Add support for guest/host-only profiling
The two new attributes exclude_guest and exclude_host can
bes used by user-space to tell the kernel to setup
performance counter to either only count while the CPU is in
guest or in host mode.
An additional check is also introduced to make sure
user-space does not try to exclude guest and host mode from
counting.
Signed-off-by: Joerg Roedel <[email protected]>
---
include/linux/perf_event.h | 5 ++++-
kernel/perf_event.c | 4 ++++
2 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ee9f1e7..3823f34 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -217,7 +217,10 @@ struct perf_event_attr {
mmap_data : 1, /* non-exec mmap data */
sample_id_all : 1, /* sample_type all events */
- __reserved_1 : 45;
+ exclude_host : 1, /* don't count in host */
+ exclude_guest : 1, /* don't count in guest */
+
+ __reserved_1 : 43;
union {
__u32 wakeup_events; /* wakeup every n events */
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 8e81a98..1a9159b 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -6452,6 +6452,10 @@ SYSCALL_DEFINE5(perf_event_open,
return -EINVAL;
}
+ /* Can't exclude counting in guest and in host mode */
+ if (attr.exclude_host && attr.exclude_guest)
+ return -EINVAL;
+
/*
* In cgroup mode, the pid argument is used to pass the fd
* opened to the cgroup directory in cgroupfs. The cpu argument
--
1.7.4.1
The AMD perf-counters support counting in guest or host-mode
only. Make use of that feature when user-space specified
guest/host-mode only counting.
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/perf_event.h | 3 +++
arch/x86/kernel/cpu/perf_event_amd.c | 6 ++++++
2 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index d9d4dae..34047c2 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,6 +29,9 @@
#define ARCH_PERFMON_EVENTSEL_INV (1ULL << 23)
#define ARCH_PERFMON_EVENTSEL_CMASK 0xFF000000ULL
+#define AMD_PERFMON_EVENTSEL_GUESTONLY (1ULL << 40)
+#define AMD_PERFMON_EVENTSEL_HOSTONLY (1ULL << 41)
+
#define AMD64_EVENTSEL_EVENT \
(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
#define INTEL_ARCH_EVENT_MASK \
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index cf4e369..afc21f3 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -116,6 +116,12 @@ static int amd_pmu_hw_config(struct perf_event *event)
if (ret)
return ret;
+ if (event->attr.exclude_host)
+ event->hw.config |= AMD_PERFMON_EVENTSEL_GUESTONLY;
+
+ if (event->attr.exclude_guest)
+ event->hw.config |= AMD_PERFMON_EVENTSEL_HOSTONLY;
+
if (event->attr.type != PERF_TYPE_RAW)
return 0;
--
1.7.4.1
To restrict a counter to either host or guest mode this
patch introduces two new event modifiers: G and H.
With G the counter is configured in guest-only mode and with
H in host-only mode.
Signed-off-by: Joerg Roedel <[email protected]>
---
tools/perf/util/parse-events.c | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 952b4ae..21039d2 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -723,7 +723,7 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
{
const char *str = *strp;
int exclude = 0;
- int eu = 0, ek = 0, eh = 0, precise = 0;
+ int eu = 0, ek = 0, eh = 0, eg = 0, ehst = 0, precise = 0;
if (*str++ != ':')
return 0;
@@ -740,6 +740,12 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
if (!exclude)
exclude = eu = ek = eh = 1;
eh = 0;
+ } else if (*str == 'G') {
+ eg = 0;
+ ehst = 1;
+ } else if (*str == 'H') {
+ eg = 1;
+ ehst = 0;
} else if (*str == 'p') {
precise++;
} else
@@ -753,6 +759,8 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
attr->exclude_kernel = ek;
attr->exclude_hv = eh;
attr->precise_ip = precise;
+ attr->exclude_host = ehst;
+ attr->exclude_guest = eg;
return 1;
}
return 0;
--
1.7.4.1
On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> + /* Can't exclude counting in guest and in host mode */
> + if (attr.exclude_host && attr.exclude_guest)
> + return -EINVAL;
Why not?
On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> The AMD perf-counters support counting in guest or host-mode
> only. Make use of that feature when user-space specified
> guest/host-mode only counting.
Subject mentions x86, does Intel have anything similar so you can make
it work for them too?
Also, might be nice to ask the power and sparc64 people if their archs
can also differentiate between guest and host thingies.
> diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> index cf4e369..afc21f3 100644
> --- a/arch/x86/kernel/cpu/perf_event_amd.c
> +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> @@ -116,6 +116,12 @@ static int amd_pmu_hw_config(struct perf_event *event)
> if (ret)
> return ret;
>
> + if (event->attr.exclude_host)
> + event->hw.config |= AMD_PERFMON_EVENTSEL_GUESTONLY;
> +
> + if (event->attr.exclude_guest)
> + event->hw.config |= AMD_PERFMON_EVENTSEL_HOSTONLY;
> +
> if (event->attr.type != PERF_TYPE_RAW)
> return 0;
On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> @@ -740,6 +740,12 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
> if (!exclude)
> exclude = eu = ek = eh = 1;
> eh = 0;
> + } else if (*str == 'G') {
> + eg = 0;
> + ehst = 1;
> + } else if (*str == 'H') {
> + eg = 1;
> + ehst = 0;
This doesn't match the existing exclude logic, also eH and eG come to mind.
On Tue, May 10, 2011 at 10:45:46AM -0400, Peter Zijlstra wrote:
> On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> > + /* Can't exclude counting in guest and in host mode */
> > + if (attr.exclude_host && attr.exclude_guest)
> > + return -EINVAL;
>
> Why not?
By definition the counter won't count at all. The hardware just ignores
the bits if they are both set. My rationale here was that it does not
makes sense to setup a counter and exclude guest and host mode.
Joerg
On Tue, May 10, 2011 at 10:48:20AM -0400, Peter Zijlstra wrote:
> On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> > The AMD perf-counters support counting in guest or host-mode
> > only. Make use of that feature when user-space specified
> > guest/host-mode only counting.
>
> Subject mentions x86, does Intel have anything similar so you can make
> it work for them too?
Intel does not support guest or host-only counting in the hardware (at
least according to my documentation). If wanted it could be approximated by
enabling/disabling the counters in the guest-entry path.
So subject should better say "perf, amd", right?
> Also, might be nice to ask the power and sparc64 people if their archs
> can also differentiate between guest and host thingies.
Right, I don't know if their hardware has similar features. I'll try to
find this out.
Joerg
On Tue, May 10, 2011 at 10:50:08AM -0400, Peter Zijlstra wrote:
> On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> > @@ -740,6 +740,12 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
> > if (!exclude)
> > exclude = eu = ek = eh = 1;
> > eh = 0;
> > + } else if (*str == 'G') {
> > + eg = 0;
> > + ehst = 1;
> > + } else if (*str == 'H') {
> > + eg = 1;
> > + ehst = 0;
>
> This doesn't match the existing exclude logic, also eH and eG come to
> mind.
OK, eH and eG seems like a better choice. Regarding the logic I
explictly decided to do it this way. The reason is that guest/host
counting is orthogonal to user/kernel/hv counting. You can decide to
only count guest-kernel for example. And if a user just specifies
-e cycles:G this would automatically exlucde user and kernel counting.
This didn't make sense to me so I decided to keep the logic seperate for
guest/host exclusions.
Joerg
On Tue, 2011-05-10 at 16:59 +0200, Roedel, Joerg wrote:
> On Tue, May 10, 2011 at 10:45:46AM -0400, Peter Zijlstra wrote:
> > On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> > > + /* Can't exclude counting in guest and in host mode */
> > > + if (attr.exclude_host && attr.exclude_guest)
> > > + return -EINVAL;
> >
> > Why not?
>
> By definition the counter won't count at all. The hardware just ignores
> the bits if they are both set. My rationale here was that it does not
> makes sense to setup a counter and exclude guest and host mode.
I would expect it to 'work' but simply return 0. If that isn't what the
AMD hardware does you need to fix that in the AMD driver.
* Joerg Roedel <[email protected]> wrote:
> So any feedback is greatly appreciated :-)
it's nice to see this hw feature utilized :)
> Diffstat:
>
> arch/x86/include/asm/perf_event.h | 3 +++
> arch/x86/kernel/cpu/perf_event_amd.c | 6 ++++++
> include/linux/perf_event.h | 5 ++++-
> kernel/perf_event.c | 4 ++++
> tools/perf/util/parse-events.c | 10 +++++++++-
> 5 files changed, 26 insertions(+), 2 deletions(-)
I see you have not touched tools/perf/builtin-kvm.c - how does this feature
integrate with the features of 'perf kvm'? (are you using perf kvm by any
chance?)
Thanks,
Ingo
On Tue, 2011-05-10 at 17:08 +0200, Roedel, Joerg wrote:
> On Tue, May 10, 2011 at 10:50:08AM -0400, Peter Zijlstra wrote:
> > On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> > > @@ -740,6 +740,12 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
> > > if (!exclude)
> > > exclude = eu = ek = eh = 1;
> > > eh = 0;
> > > + } else if (*str == 'G') {
> > > + eg = 0;
> > > + ehst = 1;
> > > + } else if (*str == 'H') {
> > > + eg = 1;
> > > + ehst = 0;
> >
> > This doesn't match the existing exclude logic, also eH and eG come to
> > mind.
>
> OK, eH and eG seems like a better choice. Regarding the logic I
> explictly decided to do it this way. The reason is that guest/host
> counting is orthogonal to user/kernel/hv counting. You can decide to
> only count guest-kernel for example. And if a user just specifies
> -e cycles:G this would automatically exlucde user and kernel counting.
> This didn't make sense to me so I decided to keep the logic seperate for
> guest/host exclusions.
OK, so the changelog lacked that bit of information ;-)
How about you do something like:
+ } else if (*str == 'G') {
+ if (!excl_GH)
+ excl_GH = eH = eG = 1;
+ eG = 0;
+ } else if (*str == 'H') {
+ if (!excl_GH)
+ excl_GH = eH = eG = 1;
+ eH = 0;
Which mirrors the existing logic but keeps it orthogonal?
Hmm,. does this nicely integrate with exclude_hv? that seems to want to
be grouped with G/H.
On Tue, May 10, 2011 at 11:18:29AM -0400, Peter Zijlstra wrote:
> On Tue, 2011-05-10 at 16:59 +0200, Roedel, Joerg wrote:
> > On Tue, May 10, 2011 at 10:45:46AM -0400, Peter Zijlstra wrote:
> > > On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> > > > + /* Can't exclude counting in guest and in host mode */
> > > > + if (attr.exclude_host && attr.exclude_guest)
> > > > + return -EINVAL;
> > >
> > > Why not?
> >
> > By definition the counter won't count at all. The hardware just ignores
> > the bits if they are both set. My rationale here was that it does not
> > makes sense to setup a counter and exclude guest and host mode.
>
> I would expect it to 'work' but simply return 0. If that isn't what the
> AMD hardware does you need to fix that in the AMD driver.
By 'work' you mean that userspace can set it up but it doesn't count at
all in this situation? This would certainly be consistent behavior but I
can't imagine any use-case for it so that this code assumes that such a
situation is most likely a bug.
I can certainly change that if wanted, but I think its better to inform
userspace if we get weird values?
Regards,
Joerg
On Tue, May 10, 2011 at 11:25:14AM -0400, Peter Zijlstra wrote:
> On Tue, 2011-05-10 at 17:08 +0200, Roedel, Joerg wrote:
> > On Tue, May 10, 2011 at 10:50:08AM -0400, Peter Zijlstra wrote:
> > > On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> > > > @@ -740,6 +740,12 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
> > > > if (!exclude)
> > > > exclude = eu = ek = eh = 1;
> > > > eh = 0;
> > > > + } else if (*str == 'G') {
> > > > + eg = 0;
> > > > + ehst = 1;
> > > > + } else if (*str == 'H') {
> > > > + eg = 1;
> > > > + ehst = 0;
> > >
> > > This doesn't match the existing exclude logic, also eH and eG come to
> > > mind.
> >
> > OK, eH and eG seems like a better choice. Regarding the logic I
> > explictly decided to do it this way. The reason is that guest/host
> > counting is orthogonal to user/kernel/hv counting. You can decide to
> > only count guest-kernel for example. And if a user just specifies
> > -e cycles:G this would automatically exlucde user and kernel counting.
> > This didn't make sense to me so I decided to keep the logic seperate for
> > guest/host exclusions.
>
> OK, so the changelog lacked that bit of information ;-)
Okay, so I add this information to the changelog.
> How about you do something like:
>
>
>
> + } else if (*str == 'G') {
> + if (!excl_GH)
> + excl_GH = eH = eG = 1;
> + eG = 0;
> + } else if (*str == 'H') {
> + if (!excl_GH)
> + excl_GH = eH = eG = 1;
> + eH = 0;
>
> Which mirrors the existing logic but keeps it orthogonal?
Right, it would better fit to the u/k/hv logic. It is not strictly
needed because there are only two excludes here but it makes the code
look more consistent. I'll change it.
> Hmm,. does this nicely integrate with exclude_hv? that seems to want to
> be grouped with G/H.
I intended to re-use the exlude_hv bit originally, but then I looked
into how this bit is used. On PPC it looks like this bit is set when
Linux itself runs as a guest to exclude the hypervisor code being
profiled. The meaning here is different (beacause Linux itself is the
hypervisor) and I wanted to avoid different semantics for this bit
across architectures. So I introduces seperate bits.
The exclude_hv bit can be used when we have some kind of perf-ctr
support for KVM guests.
Regards,
Joerg
On Tue, May 10, 2011 at 11:21:32AM -0400, Ingo Molnar wrote:
>
> * Joerg Roedel <[email protected]> wrote:
>
> > So any feedback is greatly appreciated :-)
>
> it's nice to see this hw feature utilized :)
Yeah, I planned to do this for some time now and finally found the time
to do it :)
>
> > Diffstat:
> >
> > arch/x86/include/asm/perf_event.h | 3 +++
> > arch/x86/kernel/cpu/perf_event_amd.c | 6 ++++++
> > include/linux/perf_event.h | 5 ++++-
> > kernel/perf_event.c | 4 ++++
> > tools/perf/util/parse-events.c | 10 +++++++++-
> > 5 files changed, 26 insertions(+), 2 deletions(-)
>
> I see you have not touched tools/perf/builtin-kvm.c - how does this feature
> integrate with the features of 'perf kvm'? (are you using perf kvm by any
> chance?)
builtin-kvm looks like a wrapper for other perf-cmds which just sets up
some additional state to tell the sub-cmds that guest-symbols need to be
resolved and so on.
Implementing this into the place where the other exclude-bits get set
(which is in generic code) seemed like a good idea to me. Touching
builtin-kvm was unnecessary for this.
Regards,
Joerg
On Tue, 2011-05-10 at 17:38 +0200, Roedel, Joerg wrote:
> On Tue, May 10, 2011 at 11:18:29AM -0400, Peter Zijlstra wrote:
> > On Tue, 2011-05-10 at 16:59 +0200, Roedel, Joerg wrote:
> > > On Tue, May 10, 2011 at 10:45:46AM -0400, Peter Zijlstra wrote:
> > > > On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> > > > > + /* Can't exclude counting in guest and in host mode */
> > > > > + if (attr.exclude_host && attr.exclude_guest)
> > > > > + return -EINVAL;
> > > >
> > > > Why not?
> > >
> > > By definition the counter won't count at all. The hardware just ignores
> > > the bits if they are both set. My rationale here was that it does not
> > > makes sense to setup a counter and exclude guest and host mode.
> >
> > I would expect it to 'work' but simply return 0. If that isn't what the
> > AMD hardware does you need to fix that in the AMD driver.
>
> By 'work' you mean that userspace can set it up but it doesn't count at
> all in this situation?
Right.
> This would certainly be consistent behavior but I
> can't imagine any use-case for it so that this code assumes that such a
> situation is most likely a bug.
> I can certainly change that if wanted, but I think its better to inform
> userspace if we get weird values?
The eternal how much rope to give and what knots to teach argument I
guess. As it is, I think we allow people to exclude both user- and
kernel-space, giving a similar situation, so allowing to exclude both
host and guest is consistent.
On Tue, 2011-05-10 at 17:46 +0200, Roedel, Joerg wrote:
>
> I intended to re-use the exlude_hv bit originally, but then I looked
> into how this bit is used. On PPC it looks like this bit is set when
> Linux itself runs as a guest to exclude the hypervisor code being
> profiled. The meaning here is different (beacause Linux itself is the
> hypervisor) and I wanted to avoid different semantics for this bit
> across architectures. So I introduces seperate bits.
> The exclude_hv bit can be used when we have some kind of perf-ctr
> support for KVM guests.
Ah, I didn't mean it like that, yes HV is strictly something different
(and you understood its purpose well, its when the 'host' is a guest
itself). What I was wondering about is if we want HV to be included in
the Guest/Host exclusion mask or not.
Then again, I guess that'll change behaviour in non-obvious ways, so
better not do that.
On Tue, May 10, 2011 at 12:07:18PM -0400, Peter Zijlstra wrote:
> On Tue, 2011-05-10 at 17:38 +0200, Roedel, Joerg wrote:
> > On Tue, May 10, 2011 at 11:18:29AM -0400, Peter Zijlstra wrote:
> > > On Tue, 2011-05-10 at 16:59 +0200, Roedel, Joerg wrote:
> > > > On Tue, May 10, 2011 at 10:45:46AM -0400, Peter Zijlstra wrote:
> > > > > On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> > > > > > + /* Can't exclude counting in guest and in host mode */
> > > > > > + if (attr.exclude_host && attr.exclude_guest)
> > > > > > + return -EINVAL;
> > > > >
> > > > > Why not?
> > > >
> > > > By definition the counter won't count at all. The hardware just ignores
> > > > the bits if they are both set. My rationale here was that it does not
> > > > makes sense to setup a counter and exclude guest and host mode.
> > >
> > > I would expect it to 'work' but simply return 0. If that isn't what the
> > > AMD hardware does you need to fix that in the AMD driver.
> >
> > By 'work' you mean that userspace can set it up but it doesn't count at
> > all in this situation?
>
> Right.
>
> > This would certainly be consistent behavior but I
> > can't imagine any use-case for it so that this code assumes that such a
> > situation is most likely a bug.
> > I can certainly change that if wanted, but I think its better to inform
> > userspace if we get weird values?
>
> The eternal how much rope to give and what knots to teach argument I
> guess. As it is, I think we allow people to exclude both user- and
> kernel-space, giving a similar situation, so allowing to exclude both
> host and guest is consistent.
Okay, fine. So I change that.
Joerg
* Roedel, Joerg <[email protected]> wrote:
> On Tue, May 10, 2011 at 11:21:32AM -0400, Ingo Molnar wrote:
> >
> > * Joerg Roedel <[email protected]> wrote:
> >
> > > So any feedback is greatly appreciated :-)
> >
> > it's nice to see this hw feature utilized :)
>
> Yeah, I planned to do this for some time now and finally found the time
> to do it :)
>
> >
> > > Diffstat:
> > >
> > > arch/x86/include/asm/perf_event.h | 3 +++
> > > arch/x86/kernel/cpu/perf_event_amd.c | 6 ++++++
> > > include/linux/perf_event.h | 5 ++++-
> > > kernel/perf_event.c | 4 ++++
> > > tools/perf/util/parse-events.c | 10 +++++++++-
> > > 5 files changed, 26 insertions(+), 2 deletions(-)
> >
> > I see you have not touched tools/perf/builtin-kvm.c - how does this feature
> > integrate with the features of 'perf kvm'? (are you using perf kvm by any
> > chance?)
>
> builtin-kvm looks like a wrapper for other perf-cmds which just sets up
> some additional state to tell the sub-cmds that guest-symbols need to be
> resolved and so on.
>
> Implementing this into the place where the other exclude-bits get set (which
> is in generic code) seemed like a good idea to me. Touching builtin-kvm was
> unnecessary for this.
ok, then let me put it in another way:
'perf kvm' is the tool that is making Linux instrumentation friendlier to KVM
developers. Can you think of ways to utilize this new feature there?
For example a new 'perf kvm stat' feature could pass the exclusion bits to perf
stat automatically. 'perf kvm record' could record on the guest only by
default.
Things like that.
Thanks,
Ingo
On Tue, May 10, 2011 at 10:41:49PM +0200, Ingo Molnar wrote:
> 'perf kvm' is the tool that is making Linux instrumentation friendlier to KVM
> developers. Can you think of ways to utilize this new feature there?
>
> For example a new 'perf kvm stat' feature could pass the exclusion bits to perf
> stat automatically. 'perf kvm record' could record on the guest only by
> default.
Yes, that would make sense. 'perf kvm' can default to guest-only
profiling. Hmm, but that would change existing behavior. It is probably
better to bind it to the --guest and --host parameters of perf-kvm.
Okay, this changes existing behavior too, so the default-thing seems the
better choice :)
Joerg
On 05/10/2011 05:35 PM, Joerg Roedel wrote:
> The two new attributes exclude_guest and exclude_host can
> bes used by user-space to tell the kernel to setup
> performance counter to either only count while the CPU is in
> guest or in host mode.
> An additional check is also introduced to make sure
> user-space does not try to exclude guest and host mode from
> counting.
>
> @@ -217,7 +217,10 @@ struct perf_event_attr {
> mmap_data : 1, /* non-exec mmap data */
> sample_id_all : 1, /* sample_type all events */
>
> - __reserved_1 : 45;
> + exclude_host : 1, /* don't count in host */
> + exclude_guest : 1, /* don't count in guest */
> +
> + __reserved_1 : 43;
>
We have exclude_hv a few lines up, so only exclude_guest is needed.
--
error compiling committee.c: too many arguments to function
On 05/11/2011 02:44 PM, Avi Kivity wrote:
> On 05/10/2011 05:35 PM, Joerg Roedel wrote:
>> The two new attributes exclude_guest and exclude_host can
>> bes used by user-space to tell the kernel to setup
>> performance counter to either only count while the CPU is in
>> guest or in host mode.
>> An additional check is also introduced to make sure
>> user-space does not try to exclude guest and host mode from
>> counting.
>>
>> @@ -217,7 +217,10 @@ struct perf_event_attr {
>> mmap_data : 1, /* non-exec mmap data */
>> sample_id_all : 1, /* sample_type all events */
>>
>> - __reserved_1 : 45;
>> + exclude_host : 1, /* don't count in host */
>> + exclude_guest : 1, /* don't count in guest */
>> +
>> + __reserved_1 : 43;
>>
>
> We have exclude_hv a few lines up, so only exclude_guest is needed.
>
Well, actually, that depends on the meaning of exclude_hv.
--
error compiling committee.c: too many arguments to function
On 05/10/2011 06:04 PM, Roedel, Joerg wrote:
> On Tue, May 10, 2011 at 10:48:20AM -0400, Peter Zijlstra wrote:
> > On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
> > > The AMD perf-counters support counting in guest or host-mode
> > > only. Make use of that feature when user-space specified
> > > guest/host-mode only counting.
> >
> > Subject mentions x86, does Intel have anything similar so you can make
> > it work for them too?
>
> Intel does not support guest or host-only counting in the hardware (at
> least according to my documentation). If wanted it could be approximated by
> enabling/disabling the counters in the guest-entry path.
vmx has support for atomically swapping MSRs during guest entry and exit
(you can load guest MSRs on entry, save guest MSRs on exit, and load
host MSRs on exit, but you can't save host MSRs on entry, so host-only
counters cannot be 100% accurate). We'd need some kvm/perf hooks to
program these MSR swaps, and to manually save the counters that cannot
be done automatically.
btw, your patchset can be further improved by integrating
exclude_guest/exclude_host into the constraints. For example if we have
three general purpose counters, two generic perf_events in user, one
exclude_guest perf_event, and one exclude_host perf_event, we can
schedule them all at all times, swapping the exclude_guest and
exclude_host events during guest entry/exit.
--
error compiling committee.c: too many arguments to function
On Wed, 2011-05-11 at 14:45 +0300, Avi Kivity wrote:
> Well, actually, that depends on the meaning of exclude_hv.
exclude_hv is for when the host is a guest and we don't wish to sample
the host's host etc. PPC introduced this because ppc-linux runs as a
guest of the 'firmware' and not on the native hardware.
On 05/11/2011 04:49 PM, Peter Zijlstra wrote:
> On Wed, 2011-05-11 at 14:45 +0300, Avi Kivity wrote:
> > Well, actually, that depends on the meaning of exclude_hv.
>
> exclude_hv is for when the host is a guest and we don't wish to sample
> the host's host etc. PPC introduced this because ppc-linux runs as a
> guest of the 'firmware' and not on the native hardware.
Okay. A comment is needed there to explain the differences then.
--
error compiling committee.c: too many arguments to function
* Avi Kivity <[email protected]> wrote:
> On 05/11/2011 04:49 PM, Peter Zijlstra wrote:
> >On Wed, 2011-05-11 at 14:45 +0300, Avi Kivity wrote:
> >> Well, actually, that depends on the meaning of exclude_hv.
> >
> >exclude_hv is for when the host is a guest and we don't wish to sample
> >the host's host etc. PPC introduced this because ppc-linux runs as a
> >guest of the 'firmware' and not on the native hardware.
>
> Okay. A comment is needed there to explain the differences then.
Yeah.
Also, the flags are named in a way to make the default zero value the most
inclusive option. So we do not have include_guest but exclude_guest, etc.
Thanks,
Ingo
On Wed, May 11, 2011 at 03:58:34PM +0300, Avi Kivity wrote:
> On 05/10/2011 06:04 PM, Roedel, Joerg wrote:
>> On Tue, May 10, 2011 at 10:48:20AM -0400, Peter Zijlstra wrote:
>> > On Tue, 2011-05-10 at 16:35 +0200, Joerg Roedel wrote:
>> > > The AMD perf-counters support counting in guest or host-mode
>> > > only. Make use of that feature when user-space specified
>> > > guest/host-mode only counting.
>> >
>> > Subject mentions x86, does Intel have anything similar so you can make
>> > it work for them too?
>>
>> Intel does not support guest or host-only counting in the hardware (at
>> least according to my documentation). If wanted it could be approximated by
>> enabling/disabling the counters in the guest-entry path.
>
> vmx has support for atomically swapping MSRs during guest entry and exit
> (you can load guest MSRs on entry, save guest MSRs on exit, and load
> host MSRs on exit, but you can't save host MSRs on entry, so host-only
> counters cannot be 100% accurate). We'd need some kvm/perf hooks to
> program these MSR swaps, and to manually save the counters that cannot
> be done automatically.
Well, wenn host counters are not saved on vmentry then we just need to
save them manually (which is a one-time thing and does not need to
happen at every vmentry).
Thanks for that information, when I am back in office on monday I'll try
to grab a machine and hopefully get this running.
> btw, your patchset can be further improved by integrating
> exclude_guest/exclude_host into the constraints. For example if we have
> three general purpose counters, two generic perf_events in user, one
> exclude_guest perf_event, and one exclude_host perf_event, we can
> schedule them all at all times, swapping the exclude_guest and
> exclude_host events during guest entry/exit.
Well, it will need some additional code in the amd vmrun path too. I'll
take a look at it, but probably leave it as a future optimization.
Thanks for the hint.
Joerg