2007-06-20 12:32:16

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/2] Separate the performance counter allocation from the LAPIC NMI watchdog

On Wednesday 20 June 2007 12:35:56 Björn Steinbrink wrote:
> The performance counter allocator is tied to the LAPIC NMI watchdog,

That's not true. It's completely independent. It just happens to be in
the same file, but it has no direct functional ties to the watchdog.


> Fix the performance counter allocator by making it independent of the
> LAPIC NMI watchdog. This also fixes a bug in the LAPIC NMI watchdog
> which allocated the wrong performance counter on CPUs with PerfMon
> support.

Combining code movement with functional fixes makes it impossible
to review properly. Don't do that please.

-An


2007-06-20 12:50:21

by Stephane Eranian

[permalink] [raw]
Subject: Re: [perfmon] Re: [PATCH 1/2] Separate the performance counter allocation from the LAPIC NMI watchdog

Andi,

On Wed, Jun 20, 2007 at 02:31:43PM +0200, Andi Kleen wrote:
> On Wednesday 20 June 2007 12:35:56 Bj?rn Steinbrink wrote:
> > The performance counter allocator is tied to the LAPIC NMI watchdog,
>
> That's not true. It's completely independent. It just happens to be in
> the same file, but it has no direct functional ties to the watchdog.
>
I agree with you that the allocator is functionally independent of the
watchdog. That is how I'd like to see it and I think we all agree on
that.

Yet in the current implementation, there is a link between the two which
causes the issues I ran into. If you look at:

static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
{
return wd_ops ? msr - wd_ops->evntsel : 0;
}

int reserve_evntsel_nmi(unsigned int msr)
{
unsigned int counter;

counter = nmi_evntsel_msr_to_bit(msr);
BUG_ON(counter > NMI_MAX_COUNTER_BITS);
....
}

You see that if the wd_ops (a watchdog structure) is not initialized
then the allocator collapses all MSRs onto one bit.

Once this is fixed (which is what Bjorn did), then I will agree with you.
For this, the allocator needs to be able to probe the CPU and initialize
its own data structures.

--
-Stephane

2007-06-20 13:01:17

by Andi Kleen

[permalink] [raw]
Subject: Re: [perfmon] Re: [PATCH 1/2] Separate the performance counter allocation from the LAPIC NMI watchdog


> Once this is fixed (which is what Bjorn did), then I will agree with you.
> For this, the allocator needs to be able to probe the CPU and initialize
> its own data structures.

Ok that sounds reasonable. Please someone send a patch that does only
that.

-Andi

2007-06-20 13:18:22

by Björn Steinbrink

[permalink] [raw]
Subject: Re: [PATCH 1/2] Separate the performance counter allocation from the LAPIC NMI watchdog

On 2007.06.20 14:31:43 +0200, Andi Kleen wrote:
> On Wednesday 20 June 2007 12:35:56 Bj?rn Steinbrink wrote:
> > The performance counter allocator is tied to the LAPIC NMI watchdog,
>
> That's not true. It's completely independent. It just happens to be in
> the same file, but it has no direct functional ties to the watchdog.

The allocator relies on the wd_ops structure, which is only initialized
if the LAPIC NMI watchdog is enabled. If it is not enabled, the
functions that convert from absolute perfctr/evntsel addresses to
offsets from the respective base address break, and always return 0.

> > Fix the performance counter allocator by making it independent of the
> > LAPIC NMI watchdog. This also fixes a bug in the LAPIC NMI watchdog
> > which allocated the wrong performance counter on CPUs with PerfMon
> > support.
>
> Combining code movement with functional fixes makes it impossible
> to review properly. Don't do that please.

Ok.

Bj?rn

2007-06-20 18:44:46

by Björn Steinbrink

[permalink] [raw]
Subject: Re: [perfmon] Re: [PATCH 1/2] Separate the performance counter allocation from the LAPIC NMI watchdog

On 2007.06.20 15:01:02 +0200, Andi Kleen wrote:
>
> > Once this is fixed (which is what Bjorn did), then I will agree with you.
> > For this, the allocator needs to be able to probe the CPU and initialize
> > its own data structures.
>
> Ok that sounds reasonable. Please someone send a patch that does only
> that.

OK, here come the bugfixes without any restructuring. The first patch
enables unconditional probing of the watchdog. The second makes the
perfmon nmi watchdog reserve the correct perfctr/evntsel.

Bj?rn

2007-06-20 18:45:14

by Björn Steinbrink

[permalink] [raw]
Subject: [PATCH 1/2] Always probe the NMI watchdog

The performance counter allocator relies on the nmi watchdog being
probed, so we have to do that even if the watchdog is not enabled.

Signed-off-by: Bj?rn Steinbrink <[email protected]>
---
arch/i386/kernel/cpu/perfctr-watchdog.c | 11 +++++------
arch/i386/kernel/nmi.c | 3 +++
arch/x86_64/kernel/nmi.c | 3 +++
include/asm-i386/nmi.h | 1 +
include/asm-x86_64/nmi.h | 1 +
5 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c b/arch/i386/kernel/cpu/perfctr-watchdog.c
index f0b6763..a12dbcf 100644
--- a/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ b/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -572,7 +572,7 @@ static struct wd_ops intel_arch_wd_ops = {
.evntsel = MSR_ARCH_PERFMON_EVENTSEL0,
};

-static void probe_nmi_watchdog(void)
+void probe_nmi_watchdog(void)
{
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_AMD:
@@ -610,17 +610,16 @@ static void probe_nmi_watchdog(void)

int lapic_watchdog_init(unsigned nmi_hz)
{
- if (!wd_ops) {
- probe_nmi_watchdog();
- if (!wd_ops)
- return -1;
+ if (!wd_ops)
+ return -1;

+ /* hack to make sure that we only try to reserver the perfctrs once */
+ if (smp_processor_id() == 0)
if (!wd_ops->reserve()) {
printk(KERN_ERR
"NMI watchdog: cannot reserve perfctrs\n");
return -1;
}
- }

if (!(wd_ops->setup(nmi_hz))) {
printk(KERN_ERR "Cannot setup NMI watchdog on CPU %d\n",
diff --git a/arch/i386/kernel/nmi.c b/arch/i386/kernel/nmi.c
index fba121f..8788f91 100644
--- a/arch/i386/kernel/nmi.c
+++ b/arch/i386/kernel/nmi.c
@@ -248,6 +248,9 @@ void setup_apic_nmi_watchdog (void *unused)
if ((smp_processor_id() != 0) && (atomic_read(&nmi_active) <= 0))
return;

+ /* always probe the watchdog, the perfctr allocator requires that */
+ probe_nmi_watchdog();
+
switch (nmi_watchdog) {
case NMI_LOCAL_APIC:
__get_cpu_var(wd_enabled) = 1; /* enable it before to avoid race with handler */
diff --git a/arch/x86_64/kernel/nmi.c b/arch/x86_64/kernel/nmi.c
index 931c64b..3229a26 100644
--- a/arch/x86_64/kernel/nmi.c
+++ b/arch/x86_64/kernel/nmi.c
@@ -255,6 +255,9 @@ void setup_apic_nmi_watchdog(void *unused)
if ((smp_processor_id() != 0) && (atomic_read(&nmi_active) <= 0))
return;

+ /* always probe the watchdog, the perfctr allocator requires that */
+ probe_nmi_watchdog();
+
switch (nmi_watchdog) {
case NMI_LOCAL_APIC:
__get_cpu_var(wd_enabled) = 1;
diff --git a/include/asm-i386/nmi.h b/include/asm-i386/nmi.h
index fb1e133..cf87cd0 100644
--- a/include/asm-i386/nmi.h
+++ b/include/asm-i386/nmi.h
@@ -18,6 +18,7 @@
int do_nmi_callback(struct pt_regs *regs, int cpu);

extern int nmi_watchdog_enabled;
+extern void probe_nmi_watchdog(void);
extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
extern int avail_to_resrv_perfctr_nmi(unsigned int);
extern int reserve_perfctr_nmi(unsigned int);
diff --git a/include/asm-x86_64/nmi.h b/include/asm-x86_64/nmi.h
index d0a7f53..f61653b 100644
--- a/include/asm-x86_64/nmi.h
+++ b/include/asm-x86_64/nmi.h
@@ -45,6 +45,7 @@ extern int panic_on_timeout;
extern int unknown_nmi_panic;
extern int nmi_watchdog_enabled;

+extern void probe_nmi_watchdog(void);
extern int check_nmi_watchdog(void);
extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
extern int avail_to_resrv_perfctr_nmi(unsigned int);
--
1.5.2.2

2007-06-20 18:45:46

by Björn Steinbrink

[permalink] [raw]
Subject: [PATCH 2/2] Reserve the right performance counter for the Intel PerfMon NMI watchdog

The Intel PerfMon NMI watchdog was using the generic reservation
function which always reserves the first performance counter. But the
watchdog actually uses the second performance counter, thus we need a
specialised function.

Signed-off-by: Bj?rn Steinbrink <[email protected]>
---
arch/i386/kernel/cpu/perfctr-watchdog.c | 22 ++++++++++++++++++++--
1 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c b/arch/i386/kernel/cpu/perfctr-watchdog.c
index a12dbcf..efc3232 100644
--- a/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ b/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -562,9 +562,27 @@ static int setup_intel_arch_watchdog(unsigned nmi_hz)
return 1;
}

+static int intel_arch_reserve(void)
+{
+ if (!reserve_perfctr_nmi(MSR_ARCH_PERFMON_PERFCTR1))
+ return 0;
+
+ if (!reserve_evntsel_nmi(MSR_ARCH_PERFMON_EVENTSEL1)) {
+ release_perfctr_nmi(MSR_ARCH_PERFMON_PERFCTR1);
+ return 0;
+ }
+ return 1;
+}
+
+static void intel_arch_unreserve(void)
+{
+ release_evntsel_nmi(MSR_ARCH_PERFMON_EVENTSEL1);
+ release_perfctr_nmi(MSR_ARCH_PERFMON_PERFCTR1);
+}
+
static struct wd_ops intel_arch_wd_ops = {
- .reserve = single_msr_reserve,
- .unreserve = single_msr_unreserve,
+ .reserve = intel_arch_reserve,
+ .unreserve = intel_arch_unreserve,
.setup = setup_intel_arch_watchdog,
.rearm = p6_rearm,
.stop = single_msr_stop_watchdog,
--
1.5.2.2

2007-06-20 21:59:51

by Stephane Eranian

[permalink] [raw]
Subject: Re: [perfmon] Re: [PATCH 1/2] Separate the performance counter allocation from the LAPIC NMI watchdog

Bjorn,

I ran into one issue related with the new allocator.

In the case of a Core 2 Duo processor, the PMU implements more
than just basic counters. In particular it supports fixed counters
and PEBS where both use another set of MSRs. Those are not within
a 66 bit distance from MSR_ARCH_PERFMON_EVNTSEL0. Thus the allocator
fails with an assertion.

I do know that perfmon is the only consumer of those extended
features TODAY. Yet I think we need to define the allocator such
that it can work with other "distant" MSRs as well.

On Wed, Jun 20, 2007 at 08:33:15PM +0200, Bj?rn Steinbrink wrote:
> On 2007.06.20 15:01:02 +0200, Andi Kleen wrote:
> >
> > > Once this is fixed (which is what Bjorn did), then I will agree with you.
> > > For this, the allocator needs to be able to probe the CPU and initialize
> > > its own data structures.
> >
> > Ok that sounds reasonable. Please someone send a patch that does only
> > that.
>
> OK, here come the bugfixes without any restructuring. The first patch
> enables unconditional probing of the watchdog. The second makes the
> perfmon nmi watchdog reserve the correct perfctr/evntsel.
>
> Bj?rn
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> oprofile-list mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/oprofile-list

--

-Stephane

2007-06-21 08:37:10

by Stephane Eranian

[permalink] [raw]
Subject: Re: [perfmon] Re: [PATCH 1/2] Separate the performance counter allocation from the LAPIC NMI watchdog

Bjorn,


On Wed, Jun 20, 2007 at 02:59:33PM -0700, Stephane Eranian wrote:
> Bjorn,
>
> I ran into one issue related with the new allocator.
>
> In the case of a Core 2 Duo processor, the PMU implements more
> than just basic counters. In particular it supports fixed counters
> and PEBS where both use another set of MSRs. Those are not within
> a 66 bit distance from MSR_ARCH_PERFMON_EVNTSEL0. Thus the allocator
> fails with an assertion.
>
> I do know that perfmon is the only consumer of those extended
> features TODAY. Yet I think we need to define the allocator such
> that it can work with other "distant" MSRs as well.
>

I think that a workaround for this issue could be for the allocator
to grant the requests for registers outside of the range, i.e., register
that it does not see/manage.

> On Wed, Jun 20, 2007 at 08:33:15PM +0200, Bj?rn Steinbrink wrote:
> > On 2007.06.20 15:01:02 +0200, Andi Kleen wrote:
> > >
> > > > Once this is fixed (which is what Bjorn did), then I will agree with you.
> > > > For this, the allocator needs to be able to probe the CPU and initialize
> > > > its own data structures.
> > >
> > > Ok that sounds reasonable. Please someone send a patch that does only
> > > that.
> >
> > OK, here come the bugfixes without any restructuring. The first patch
> > enables unconditional probing of the watchdog. The second makes the
> > perfmon nmi watchdog reserve the correct perfctr/evntsel.
> >
> > Bj?rn
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by DB2 Express
> > Download DB2 Express C - the FREE version of DB2 express and take
> > control of your XML. No limits. Just data. Click to get it now.
> > http://sourceforge.net/powerbar/db2/
> > _______________________________________________
> > oprofile-list mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/oprofile-list
>
> --
>
> -Stephane
> _______________________________________________
> perfmon mailing list
> [email protected]
> http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/

--

-Stephane

2007-06-22 07:13:52

by Björn Steinbrink

[permalink] [raw]
Subject: Re: [perfmon] Re: [PATCH 1/2] Separate the performance counter allocation from the LAPIC NMI watchdog

Hi Stephane,

On 2007.06.21 01:36:45 -0700, Stephane Eranian wrote:
> Bjorn,
>
>
> On Wed, Jun 20, 2007 at 02:59:33PM -0700, Stephane Eranian wrote:
> > Bjorn,
> >
> > I ran into one issue related with the new allocator.

Should be the same with 2.6.21 and earlier, the "new" allocator should
do exactly the samething, it just fixes the breakage introduced in the
post-2.6.21 cleanup.

> > In the case of a Core 2 Duo processor, the PMU implements more
> > than just basic counters. In particular it supports fixed counters
> > and PEBS where both use another set of MSRs. Those are not within
> > a 66 bit distance from MSR_ARCH_PERFMON_EVNTSEL0. Thus the allocator
> > fails with an assertion.

How far away are they?

> >
> > I do know that perfmon is the only consumer of those extended
> > features TODAY. Yet I think we need to define the allocator such
> > that it can work with other "distant" MSRs as well.
> >
>
> I think that a workaround for this issue could be for the allocator
> to grant the requests for registers outside of the range, i.e., register
> that it does not see/manage.

That would also allow multiple subsystems to use them at the same time.
And whoever adds the second user of those MSRs might not be aware of the
just-grant-it policy of the allocator. And bugs that arise due to such
problems will probably become a real PITA to track down.

Unfortunately, I don't see any elegant solution to this atm, and of
course making your code simply circumvent the allocator isn't an option
either.

Thanks,
Bj?rn

2007-06-22 10:05:51

by Stephane Eranian

[permalink] [raw]
Subject: Re: [perfmon] Re: [PATCH 1/2] Separate the performance counter allocation from the LAPIC NMI watchdog

Bjorn,

You have the following registers to consider (for P4/Core):

#define MSR_IA32_PEBS_ENABLE 0x000003f1
#define MSR_CORE_PERF_FIXED_CTR0 0x00000309
#define MSR_CORE_PERF_FIXED_CTR1 0x0000030a
#define MSR_CORE_PERF_FIXED_CTR2 0x0000030b
#define MSR_CORE_PERF_FIXED_CTR_CTRL 0x0000038d
#define MSR_CORE_PERF_GLOBAL_STATUS 0x0000038e
#define MSR_CORE_PERF_GLOBAL_CTRL 0x0000038f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x00000390


With Barcelona, you'll also have to consider:

#define MSR_AMD64_IBSFETCHCTL 0xc0011030
#define MSR_AMD64_IBSFETCHLINAD 0xc0011031
#define MSR_AMD64_IBSFETCHPHYSAD 0xc0011032
#define MSR_AMD64_IBSOPCTL 0xc0011033
#define MSR_AMD64_IBSOPRIP 0xc0011034
#define MSR_AMD64_IBSOPDATA 0xc0011035
#define MSR_AMD64_IBSOPDATA2 0xc0011036
#define MSR_AMD64_IBSOPDATA3 0xc0011037
#define MSR_AMD64_IBSDCLINAD 0xc0011038
#define MSR_AMD64_IBSDCPHYSAD 0xc0011039
#define MSR_AMD64_IBSCTL 0xc001103A

I think you could organize by groups with a bitmap
per group and some sorted linked list to keep track
of all of them.


On Fri, Jun 22, 2007 at 09:13:54AM +0200, Bj?rn Steinbrink wrote:
> Hi Stephane,
>
> On 2007.06.21 01:36:45 -0700, Stephane Eranian wrote:
> > Bjorn,
> >
> >
> > On Wed, Jun 20, 2007 at 02:59:33PM -0700, Stephane Eranian wrote:
> > > Bjorn,
> > >
> > > I ran into one issue related with the new allocator.
>
> Should be the same with 2.6.21 and earlier, the "new" allocator should
> do exactly the samething, it just fixes the breakage introduced in the
> post-2.6.21 cleanup.
>
> > > In the case of a Core 2 Duo processor, the PMU implements more
> > > than just basic counters. In particular it supports fixed counters
> > > and PEBS where both use another set of MSRs. Those are not within
> > > a 66 bit distance from MSR_ARCH_PERFMON_EVNTSEL0. Thus the allocator
> > > fails with an assertion.
>
> How far away are they?
>
> > >
> > > I do know that perfmon is the only consumer of those extended
> > > features TODAY. Yet I think we need to define the allocator such
> > > that it can work with other "distant" MSRs as well.
> > >
> >
> > I think that a workaround for this issue could be for the allocator
> > to grant the requests for registers outside of the range, i.e., register
> > that it does not see/manage.
>
> That would also allow multiple subsystems to use them at the same time.
> And whoever adds the second user of those MSRs might not be aware of the
> just-grant-it policy of the allocator. And bugs that arise due to such
> problems will probably become a real PITA to track down.
>
> Unfortunately, I don't see any elegant solution to this atm, and of
> course making your code simply circumvent the allocator isn't an option
> either.
>
> Thanks,
> Bj?rn
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> oprofile-list mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/oprofile-list

--

-Stephane

2007-06-25 19:11:13

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/2] Always probe the NMI watchdog

On Wed, 20 Jun 2007 20:34:48 +0200
Bj__rn Steinbrink <[email protected]> wrote:

> The performance counter allocator relies on the nmi watchdog being
> probed, so we have to do that even if the watchdog is not enabled.
>

So... what's the status of this lot?

I've just merged this patch and the second one:

Subject: [PATCH 2/2] Reserve the right performance counter for the Intel PerfMon NMI watchdog
Message-ID: <[email protected]>

but there was no followup discussion afaict.

Andi, Stephane: acks?

If acked, do we agree that this is 2.6.22 material?

Thanks.

2007-06-25 19:38:17

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/2] Always probe the NMI watchdog

On Monday 25 June 2007 21:09, Andrew Morton wrote:
> On Wed, 20 Jun 2007 20:34:48 +0200
>
> Bj__rn Steinbrink <[email protected]> wrote:
> > The performance counter allocator relies on the nmi watchdog being
> > probed, so we have to do that even if the watchdog is not enabled.
>
> So... what's the status of this lot?
>
> I've just merged this patch and the second one:
>
> Subject: [PATCH 2/2] Reserve the right performance counter for the Intel
> PerfMon NMI watchdog Message-ID: <[email protected]>
>
> but there was no followup discussion afaict.
>
> Andi, Stephane: acks?

Yes, although I'm still a little uneasy about the always probe one.

> If acked, do we agree that this is 2.6.22 material?

The first (always probe) is probably .22 material, but needs more testing
first.

-Andi

2007-06-25 20:04:37

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 1/2] Always probe the NMI watchdog

Hi,
On Mon, Jun 25, 2007 at 09:36:17PM +0200, Andi Kleen wrote:
> On Monday 25 June 2007 21:09, Andrew Morton wrote:
> > On Wed, 20 Jun 2007 20:34:48 +0200
> >
> > Bj__rn Steinbrink <[email protected]> wrote:
> > > The performance counter allocator relies on the nmi watchdog being
> > > probed, so we have to do that even if the watchdog is not enabled.
> >
> > So... what's the status of this lot?
> >
> > I've just merged this patch and the second one:
> >
> > Subject: [PATCH 2/2] Reserve the right performance counter for the Intel
> > PerfMon NMI watchdog Message-ID: <[email protected]>
> >
> > but there was no followup discussion afaict.
> >
> > Andi, Stephane: acks?
>
> Yes, although I'm still a little uneasy about the always probe one.
>

I looked at the code I have in my tree coming from Bjon's patches and
I am a bit confused by the flow for probing as well.

The register allocator works globally, i.e., you reserve a register
for all CPUs at once.

The probe_nmi_watchdog() routine simply probes the CPU type to initialize
the watchdog data structure (wd_ops). This needs to be done once and for all.
Why put it in a route that is called with on_each_cpu()?

I think the tricky part is that we do want to reserve perfctr1 even though
the NMI watchdog is not active. This comes from the fact that the NMI watchdog
knows about only one counter and if it can't get that one, it probably fails.
By reserving it from the start, we ensure NMI watchdog will work when eventually
activated.

Unlike sharing between Oprofile and perfmon which works by enforcing
mutual exclusion between the two subsystems, the NMI watchdog must work
concurrently with either Oprofile or Perfmon.

Bjorn, did I understand the constraints correctly?

--
-Stephane

2007-06-25 20:37:36

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/2] Always probe the NMI watchdog


> I looked at the code I have in my tree coming from Bjon's patches and
> I am a bit confused by the flow for probing as well.

Yes, it's a little risky. Perhaps it's better to readd the separate CPU switch
from .21 there again for 2.6.22. Ugly, but should be safe

-Andi

2007-06-25 21:03:49

by Björn Steinbrink

[permalink] [raw]
Subject: Re: [PATCH 1/2] Always probe the NMI watchdog

On 2007.06.25 13:01:58 -0700, Stephane Eranian wrote:
> Hi,
> On Mon, Jun 25, 2007 at 09:36:17PM +0200, Andi Kleen wrote:
> > On Monday 25 June 2007 21:09, Andrew Morton wrote:
> > > On Wed, 20 Jun 2007 20:34:48 +0200
> > >
> > > Bj__rn Steinbrink <[email protected]> wrote:
> > > > The performance counter allocator relies on the nmi watchdog being
> > > > probed, so we have to do that even if the watchdog is not enabled.
> > >
> > > So... what's the status of this lot?
> > >
> > > I've just merged this patch and the second one:
> > >
> > > Subject: [PATCH 2/2] Reserve the right performance counter for the Intel
> > > PerfMon NMI watchdog Message-ID: <[email protected]>
> > >
> > > but there was no followup discussion afaict.
> > >
> > > Andi, Stephane: acks?
> >
> > Yes, although I'm still a little uneasy about the always probe one.
> >
>
> I looked at the code I have in my tree coming from Bjon's patches and
> I am a bit confused by the flow for probing as well.
>
> The register allocator works globally, i.e., you reserve a register
> for all CPUs at once.
>
> The probe_nmi_watchdog() routine simply probes the CPU type to initialize
> the watchdog data structure (wd_ops). This needs to be done once and for all.
> Why put it in a route that is called with on_each_cpu()?

Ehrm, that's a good question actually... I moved the probing call up
into setup_local_APIC in that other big patch and put a check at the
start of the probing function, so that it is executed only once. No idea
why I did it so weird in this one.

> I think the tricky part is that we do want to reserve perfctr1 even
> though the NMI watchdog is not active. This comes from the fact that
> the NMI watchdog knows about only one counter and if it can't get that
> one, it probably fails. By reserving it from the start, we ensure NMI
> watchdog will work when eventually activated.

Can you enable it later on at all? It failed for me when I tried,
because it didn't know which hardware to use. Had to pass the kernel
parameter to make the proc files do anything. Seems like it has to be
enable at boot to work at all.

And AFAICT we never unconditionally reserved a perfctr for the watchdog.

> Unlike sharing between Oprofile and perfmon which works by enforcing
> mutual exclusion between the two subsystems, the NMI watchdog must
> work concurrently with either Oprofile or Perfmon.

In 2.6.21 the nmi watchdog, if enabled, just reserved its perfctrs and
everything else had to deal with it. Since the cleanup, the watchdog
will release its perfctr when disabled, so another subsystem can grab
it. But that also means that that other subsystem must release it again
before you can reenable the watchdog.

> Bjorn, did I understand the constraints correctly?

I'll tell you, once I'm sure that I understood them correctly ;-)

Bj?rn

2007-06-25 21:06:15

by Björn Steinbrink

[permalink] [raw]
Subject: Re: [PATCH 1/2] Always probe the NMI watchdog

On 2007.06.25 21:36:17 +0200, Andi Kleen wrote:
> On Monday 25 June 2007 21:09, Andrew Morton wrote:
> > On Wed, 20 Jun 2007 20:34:48 +0200
> >
> > Bj__rn Steinbrink <[email protected]> wrote:
> > > The performance counter allocator relies on the nmi watchdog being
> > > probed, so we have to do that even if the watchdog is not enabled.
> >
> > So... what's the status of this lot?
> >
> > I've just merged this patch and the second one:
> >
> > Subject: [PATCH 2/2] Reserve the right performance counter for the Intel
> > PerfMon NMI watchdog Message-ID: <[email protected]>
> >
> > but there was no followup discussion afaict.
> >
> > Andi, Stephane: acks?
>
> Yes, although I'm still a little uneasy about the always probe one.
>
> > If acked, do we agree that this is 2.6.22 material?
>
> The first (always probe) is probably .22 material, but needs more testing
> first.

Hm, without the second, I expect OProfile to break when the watchdog is
enabled. Alternatively to the patch I sent, we could revert the change
that makes it use perfctr1 instead of perfctr0. Would you prefer that?

Bj?rn

2007-06-26 08:27:51

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [PATCH 1/2] Always probe the NMI watchdog

On Mon, 25 Jun 2007 23:04:25 +0200, [email protected] wrote:
> > I think the tricky part is that we do want to reserve perfctr1 even
> > though the NMI watchdog is not active. This comes from the fact that
> > the NMI watchdog knows about only one counter and if it can't get that
> > one, it probably fails. By reserving it from the start, we ensure NMI
> > watchdog will work when eventually activated.
>
> Can you enable it later on at all? It failed for me when I tried,
> because it didn't know which hardware to use. Had to pass the kernel
> parameter to make the proc files do anything. Seems like it has to be
> enable at boot to work at all.
>
> And AFAICT we never unconditionally reserved a perfctr for the watchdog.

Yes you can dynamically enable/disable the NMI watchdog,
at least if you booted with it enabled.

> In 2.6.21 the nmi watchdog, if enabled, just reserved its perfctrs and
> everything else had to deal with it. Since the cleanup, the watchdog
> will release its perfctr when disabled, so another subsystem can grab
> it. But that also means that that other subsystem must release it again
> before you can reenable the watchdog.

Which is the obvious and correct way to handle a shared resource.

Keeping parts of the PMU HW permanently reserved whether or not
the watchdog is enabled would be a BUG.

/Mikael

2007-06-26 09:58:20

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 1/2] Always probe the NMI watchdog

Mikael,

On Tue, Jun 26, 2007 at 10:04:15AM +0200, Mikael Pettersson wrote:
> On Mon, 25 Jun 2007 23:04:25 +0200, [email protected] wrote:
> > > I think the tricky part is that we do want to reserve perfctr1 even
> > > though the NMI watchdog is not active. This comes from the fact that
> > > the NMI watchdog knows about only one counter and if it can't get that
> > > one, it probably fails. By reserving it from the start, we ensure NMI
> > > watchdog will work when eventually activated.
> >
> > Can you enable it later on at all? It failed for me when I tried,
> > because it didn't know which hardware to use. Had to pass the kernel
> > parameter to make the proc files do anything. Seems like it has to be
> > enable at boot to work at all.
> >
> > And AFAICT we never unconditionally reserved a perfctr for the watchdog.
>
> Yes you can dynamically enable/disable the NMI watchdog,
> at least if you booted with it enabled.
>
> > In 2.6.21 the nmi watchdog, if enabled, just reserved its perfctrs and
> > everything else had to deal with it. Since the cleanup, the watchdog
> > will release its perfctr when disabled, so another subsystem can grab
> > it. But that also means that that other subsystem must release it again
> > before you can reenable the watchdog.
>
> Which is the obvious and correct way to handle a shared resource.
>
Agreed.

> Keeping parts of the PMU HW permanently reserved whether or not
> the watchdog is enabled would be a BUG.
>
True. But the upside is that you guarantee the activation of the NMI
watchdog will always succeed which may be a valuable property given the
goal of the NMI watchdog. Otherwise, if Oprofile or perfmon are active,
the NMI will fail to grab a single counter.

--

-Stephane

2007-06-26 10:58:48

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [PATCH 1/2] Always probe the NMI watchdog

On Tue, 26 Jun 2007 02:57:55 -0700, Mikael Pettersson wrote:
> > Keeping parts of the PMU HW permanently reserved whether or not
> > the watchdog is enabled would be a BUG.
> >
> True. But the upside is that you guarantee the activation of the NMI
> watchdog will always succeed which may be a valuable property given the
> goal of the NMI watchdog. Otherwise, if Oprofile or perfmon are active,
> the NMI will fail to grab a single counter.

That relates more to policy than mechanism. I for one don't
consider the watchdog "sacred".

I think it's entirely reasonable that if the watchdog is disabled
(echo 0 > /proc/sys/kernel/nmi_watchdog) in order to give the
_full_ PMU HW to something else, then that something else needs
to be disabled (or otherwise instructed to give up the HW) before
the watchdog can be reenabled. If you don't want to risk not
being able to restart the watchdog, don't disable it.