2008-12-02 06:21:49

by Eric Dumazet

Subject: [PATCH] oprofile: fix CPU unplug panic in ppro_stop()

diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index 716d26f..e9f80c7 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -156,6 +156,8 @@ static void ppro_start(struct op_msrs const * const msrs)
 	unsigned int low, high;
 	int i;
 
+	if (!reset_value)
+		return;
 	for (i = 0; i < num_counters; ++i) {
 		if (reset_value[i]) {
 			CTRL_READ(low, high, msrs, i);
@@ -171,6 +173,8 @@ static void ppro_stop(struct op_msrs const * const msrs)
 	unsigned int low, high;
 	int i;
 
+	if (!reset_value)
+		return;
 	for (i = 0; i < num_counters; ++i) {
 		if (!reset_value[i])
 			continue;


Attachments:
ppro_stop.patch (660.00 B)

2008-12-02 08:17:57

by Ingo Molnar

Subject: Re: [PATCH] oprofile: fix CPU unplug panic in ppro_stop()


* Eric Dumazet <[email protected]> wrote:

> If oprofile statically compiled in kernel, a cpu unplug triggers
> a panic in ppro_stop(), because a NULL pointer is dereferenced.
>
> Signed-off-by: Eric Dumazet <[email protected]>
> ---
> arch/x86/oprofile/op_model_ppro.c | 4 ++++
> 1 files changed, 4 insertions(+)

> diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
> index 716d26f..e9f80c7 100644
> --- a/arch/x86/oprofile/op_model_ppro.c
> +++ b/arch/x86/oprofile/op_model_ppro.c
> @@ -156,6 +156,8 @@ static void ppro_start(struct op_msrs const * const msrs)
> unsigned int low, high;
> int i;
>
> + if (!reset_value)
> + return;
>
> for (i = 0; i < num_counters; ++i) {
> if (reset_value[i]) {
> CTRL_READ(low, high, msrs, i);

i checked which commit caused this, and it is:

From b99170288421c79f0c2efa8b33e26e65f4bb7fb8 Mon Sep 17 00:00:00 2001
From: Andi Kleen <[email protected]>
Date: Mon, 18 Aug 2008 14:50:31 +0200
Subject: [PATCH] oprofile: Implement Intel architectural perfmon support

it is an absolutely horrible commit - which has caused the second
regression in a row already. The _real_ "perfmon support" patch should
have been a _oneliner_:

-#define NUM_COUNTERS 2
-#define NUM_CONTROLS 2
+#define NUM_COUNTERS 8
+#define NUM_CONTROLS 8

as Nehalem has 4 performance counters so 8 is plenty - and we don't expect
more than 8 in the next 5 years or so.

It was absolutely unnecessary to add kmalloc to this rarely executed
codepath - and the way it was added was absolutely horrible as well, it
was tacked on in the middle of an existing codepath, instead of factoring
it out nicely. Perfmon will eventually replace PMC management anyway, so
there was no "this way it's cleaner" argument either. So this code should
have been changed minimally, instead of slapping in a full kmalloc for a
simple array extension from 2 to 4 entries ...

You need to be more careful when changing x86 architecture code.

Ingo
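
The fixed-size alternative argued for above can be sketched in user space
as follows. This is an illustration only, not the actual op_model_ppro.c
code: MAX_COUNTERS, set_num_counters(), set_reset_value() and
count_active() are hypothetical names, and the cap of 8 is simply the
value suggested in the message.

```c
#include <assert.h>

/* Hypothetical compile-time cap, mirroring the "NUM_COUNTERS 8" suggestion. */
#define MAX_COUNTERS 8

/* Statically allocated: never NULL, so no allocation or NULL check is needed. */
static unsigned long reset_value[MAX_COUNTERS];
static unsigned int num_counters;

/* Record how many counters the CPU reports, clamped to the static cap. */
static unsigned int set_num_counters(unsigned int reported)
{
	num_counters = reported > MAX_COUNTERS ? MAX_COUNTERS : reported;
	return num_counters;
}

/* Store the reset value for one counter; the index must be within the cap. */
static void set_reset_value(unsigned int i, unsigned long value)
{
	assert(i < MAX_COUNTERS);
	reset_value[i] = value;
}

/* Count the counters that are enabled (non-zero reset value). */
static unsigned int count_active(void)
{
	unsigned int i, active = 0;

	for (i = 0; i < num_counters; ++i)
		if (reset_value[i])
			active++;
	return active;
}
```

The trade-off is the one debated in this thread: the array can never be
NULL, but a CPU reporting more counters than the cap would have the extra
ones silently ignored.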

by Robert Richter

Subject: Re: [PATCH] oprofile: fix CPU unplug panic in ppro_stop()

On 02.12.08 09:17:29, Ingo Molnar wrote:
>
> * Eric Dumazet <[email protected]> wrote:
>
> > If oprofile statically compiled in kernel, a cpu unplug triggers
> > a panic in ppro_stop(), because a NULL pointer is dereferenced.
> >
> > Signed-off-by: Eric Dumazet <[email protected]>
> > ---
> > arch/x86/oprofile/op_model_ppro.c | 4 ++++
> > 1 files changed, 4 insertions(+)
>
> > diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
> > index 716d26f..e9f80c7 100644
> > --- a/arch/x86/oprofile/op_model_ppro.c
> > +++ b/arch/x86/oprofile/op_model_ppro.c
> > @@ -156,6 +156,8 @@ static void ppro_start(struct op_msrs const * const msrs)
> > unsigned int low, high;
> > int i;
> >
> > + if (!reset_value)
> > + return;
> >
> > for (i = 0; i < num_counters; ++i) {
> > if (reset_value[i]) {
> > CTRL_READ(low, high, msrs, i);

The patch fixes the null pointer access and this is ok. But the root
cause seems to be in the cpu hotplug and initialization
code. xxx_start() should not be called before xxx_setup_ctrs() or
after xxx_shutdown(). Also, running only xxx_start() and xxx_stop() in
the cpu notifier functions is not sufficient. There is at least some
on_each_cpu code in nmi_setup() that should be called also in the cpu
notifier functions. I have to review that code.
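
The ordering rule described above (start only between setup and shutdown)
can be illustrated with a minimal user-space sketch. It is not the
oprofile code: model_setup(), model_start(), model_stop(),
model_shutdown() and NUM_COUNTERS are hypothetical stand-ins for the
xxx_*() callbacks, and calloc()/free() stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum { NUM_COUNTERS = 4 };	/* hypothetical count for this sketch */

static unsigned long *reset_value;	/* NULL until model_setup() has run */
static bool running;

/* Allocate per-counter state; returns 0 on success, -1 on failure. */
static int model_setup(void)
{
	if (reset_value)
		return 0;	/* already set up */
	reset_value = calloc(NUM_COUNTERS, sizeof(*reset_value));
	return reset_value ? 0 : -1;
}

/* Refuse to start unless setup has run; returns 0 on success. */
static int model_start(void)
{
	if (!reset_value)
		return -1;	/* called before setup or after shutdown */
	running = true;
	return 0;
}

/* Same guard as the patch: with no state allocated there is nothing to stop. */
static void model_stop(void)
{
	if (!reset_value)
		return;
	running = false;
}

/* Stop first, then release the state so later starts are rejected. */
static void model_shutdown(void)
{
	model_stop();
	assert(!running);
	free(reset_value);
	reset_value = NULL;
}
```

With this shape, a hotplug-style caller that invokes model_start() or
model_stop() outside the setup/shutdown window gets an error or a no-op
instead of a NULL dereference. Fixing the caller so that it never does so
is the cleaner solution discussed in this message.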

[...]

> It was absolutely unnecessary to add kmalloc to this rarely executed
> codepath - and the way it was added was absolutely horrible as well, it
> was tacked on in the middle of an existing codepath, instead of factoring
> it out nicely. Perfmon will eventually replace PMC management anyway, so
> there was no "this way it's cleaner" argument either. So this code should
> have been changed minimally, instead of slapping in a full kmalloc for a
> simple array extension from 2 to 4 entries ...

Ingo, you are right that using kmalloc is unnecessary for
reset_value. So, Andi, maybe you could make this code easier?

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

2008-12-03 14:08:27

by Andi Kleen

Subject: Re: [PATCH] oprofile: fix CPU unplug panic in ppro_stop()

Robert Richter wrote:
> On 02.12.08 09:17:29, Ingo Molnar wrote:
>> * Eric Dumazet <[email protected]> wrote:
>>
>>> If oprofile statically compiled in kernel, a cpu unplug triggers
>>> a panic in ppro_stop(), because a NULL pointer is dereferenced.
>>>
>>> Signed-off-by: Eric Dumazet <[email protected]>
>>> ---
>>> arch/x86/oprofile/op_model_ppro.c | 4 ++++
>>> 1 files changed, 4 insertions(+)
>>> diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
>>> index 716d26f..e9f80c7 100644
>>> --- a/arch/x86/oprofile/op_model_ppro.c
>>> +++ b/arch/x86/oprofile/op_model_ppro.c
>>> @@ -156,6 +156,8 @@ static void ppro_start(struct op_msrs const * const msrs)
>>> unsigned int low, high;
>>> int i;
>>>
>>> + if (!reset_value)
>>> + return;
>>>
>>> for (i = 0; i < num_counters; ++i) {
>>> if (reset_value[i]) {
>>> CTRL_READ(low, high, msrs, i);
>
> The patch fixes the null pointer access and this is ok. But the root
> cause seems to be in the cpu hotplug and initialization
> code. xxx_start() should not be called before xxx_setup_ctrs() or
> after xxx_shutdown().

Yes, it would be better to fix that. At least it would make
the code cleaner than adding checks for this backdoor everywhere.

> Also, running only xxx_start() and xxx_stop() in
> the cpu notifier functions is not sufficient. There is at least some
> on_each_cpu code in nmi_setup() that should be called also in the cpu
> notifier functions. I have to review that code.

AFAIK cpu hotplug has more problems in oprofile anyways. That is why
I didn't test that case.

>
> [...]
>
>> It was absolutely unnecessary to add kmalloc to this rarely executed
>> codepath - and the way it was added was absolutely horrible as well, it
>> was tacked on in the middle of an existing codepath, instead of factoring
>> it out nicely. Perfmon will eventually replace PMC management anyway, so
>> there was no "this way it's cleaner" argument either. So this code should
>> have been changed minimally, instead of slapping in a full kmalloc for a
>> simple array extension from 2 to 4 entries ...
>
> Ingo, you are right that using kmalloc is unnecessary for
> reset_value. So, Andi, maybe you could make this code easier?

The reason I added the kmalloc is that there's also a varying number
of separate fixed function counters (although that's not currently
submitted).

Also I would prefer to not have a hard coded number for future
CPUs. Contrary to other people's opinion architectural perfmon is
not for Nehalem only.

-Andi

by Robert Richter

Subject: Re: [PATCH] oprofile: fix CPU unplug panic in ppro_stop()

On 02.12.08 07:21:21, Eric Dumazet wrote:
> If oprofile statically compiled in kernel, a cpu unplug triggers
> a panic in ppro_stop(), because a NULL pointer is dereferenced.
>
> Signed-off-by: Eric Dumazet <[email protected]>

Eric, I applied your patch and it will go upstream for 2.6.28.

Thanks,

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]