2006-11-16 19:58:13

by Thomas Gleixner

Subject: BUG: cpufreq notification broken

[PATCH] cpufreq: make the transition_notifier chain use SRCU
(b4dfdbb3c707474a2254c5b4d7e62be31a4b7da9)

breaks cpu frequency notification users, which register the callback on
core_init level. Interestingly enough the registration survives the
uninitialized head, but the registered user is lost by:

static int __init init_cpufreq_transition_notifier_list(void)
{
	srcu_init_notifier_head(&cpufreq_transition_notifier_list);
	return 0;
}
core_initcall(init_cpufreq_transition_notifier_list);

This affects i386, x86_64 and sparc64 AFAICT, which call
register_notifier early in the arch code.

> The head of the notifier chain needs to be initialized before use;
> this is done by an __init routine at core_initcall time. If this turns
> out not to be a good choice, it can easily be changed.

Hmm, there are no static initializers for srcu and the only way to fix
this up is to move the arch calls to postcore_init.
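
To make the ordering problem concrete, here is a minimal sketch (all names
are hypothetical stand-ins, not the actual arch or cpufreq code): both
initcalls run at core_initcall level, so their relative order is decided
purely by link order, and an early registration is wiped out when the head
is initialized afterwards.

#include <linux/init.h>
#include <linux/notifier.h>

/* Sketch only: stand-ins for the cpufreq transition list and an
 * arch-level user such as the TSC code. */
static struct srcu_notifier_head example_transition_list;

static int example_freq_callback(struct notifier_block *nb,
				 unsigned long event, void *data)
{
	return NOTIFY_OK;
}

static struct notifier_block example_freq_notifier = {
	.notifier_call = example_freq_callback,
};

/* Plays the role of cpufreq_tsc() in the arch code. */
static int __init example_arch_register(void)
{
	/* core_initcall: may well run before example_init_list() below. */
	return srcu_notifier_chain_register(&example_transition_list,
					    &example_freq_notifier);
}
core_initcall(example_arch_register);

/* Plays the role of init_cpufreq_transition_notifier_list(). */
static int __init example_init_list(void)
{
	/* Initializing the head here throws away the registration above. */
	srcu_init_notifier_head(&example_transition_list);
	return 0;
}
core_initcall(example_init_list);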

tglx



2006-11-16 20:16:36

by Ingo Molnar

Subject: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync


* Thomas Gleixner <[email protected]> wrote:

> [PATCH] cpufreq: make the transition_notifier chain use SRCU
> (b4dfdbb3c707474a2254c5b4d7e62be31a4b7da9)
>
> breaks cpu frequency notification users, which register the callback
> on core_init level. Interestingly enough the registration survives the
> uninitialized head, but the registered user is lost by:

i have hit this bug in -rt (it caused a lockup) and have fixed it -
forgot to send it upstream. Find the patch below.

Ingo

---------------->
From: Ingo Molnar <[email protected]>
Subject: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

init_cpufreq_transition_notifier_list() should execute first, which is a
core_initcall, so mark cpufreq_tsc() core_initcall_sync.

Signed-off-by: Ingo Molnar <[email protected]>

--- linux.orig/arch/x86_64/kernel/tsc.c
+++ linux/arch/x86_64/kernel/tsc.c
@@ -138,7 +138,11 @@ static int __init cpufreq_tsc(void)
return 0;
}

-core_initcall(cpufreq_tsc);
+/*
+ * init_cpufreq_transition_notifier_list() should execute first,
+ * which is a core_initcall, so mark this one core_initcall_sync:
+ */
+core_initcall_sync(cpufreq_tsc);

#endif
/*

2006-11-16 20:27:25

by Alan Stern

Subject: Re: BUG: cpufreq notification broken

On Thu, 16 Nov 2006, Thomas Gleixner wrote:

> [PATCH] cpufreq: make the transition_notifier chain use SRCU
> (b4dfdbb3c707474a2254c5b4d7e62be31a4b7da9)
>
> breaks cpu frequency notification users, which register the callback on
> core_init level. Interestingly enough the registration survives the
> uninitialized head, but the registered user is lost by:
>
> static int __init init_cpufreq_transition_notifier_list(void)
> {
> 	srcu_init_notifier_head(&cpufreq_transition_notifier_list);
> 	return 0;
> }
> core_initcall(init_cpufreq_transition_notifier_list);
>
> This affects i386, x86_64 and sparc64 AFAICT, which call
> register_notifier early in the arch code.
>
> > The head of the notifier chain needs to be initialized before use;
> > this is done by an __init routine at core_initcall time. If this turns
> > out not to be a good choice, it can easily be changed.
>
> Hmm, there are no static initializers for srcu and the only way to fix
> this up is to move the arch calls to postcore_init.

If you can find a way to invoke init_cpufreq_transition_notifier_list
earlier than core_initcall time, that would be okay. I did it this way
because it was easiest, but earlier should be just as good.

The only requirement is that alloc_percpu() has to be working, so that the
SRCU per-cpu data values can be set up. I don't know how early in the
boot process you can do per-cpu memory allocation.
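
For reference, the dependency on the per-cpu allocator comes from
init_srcu_struct(); the following is a rough paraphrase of the 2.6.19-era
routines (init_srcu_struct() also appears in the diffs later in this
thread), not a verbatim copy:

/* init_srcu_struct() is what needs alloc_percpu() to be working. */
int init_srcu_struct(struct srcu_struct *sp)
{
	sp->completed = 0;
	mutex_init(&sp->mutex);
	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
	return sp->per_cpu_ref ? 0 : -ENOMEM;
}

void srcu_init_notifier_head(struct srcu_notifier_head *nh)
{
	mutex_init(&nh->mutex);
	if (init_srcu_struct(&nh->srcu) < 0)
		BUG();
	nh->head = NULL;	/* resetting the head is what loses an early registration */
}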

As an alternative approach, initialization of srcu_notifiers could be
broken up into two pieces, one of which could be done statically. The
part that has to be done dynamically (the SRCU initialization) wouldn't
mess up the notifier chain. Provided the dynamic part is carried out
while the system is still single-threaded, it would be safe.
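
Purely as an illustration of that split (hypothetical macro and function
names; the field layout follows the 2.6.19 structures): everything except
the per-cpu array could take a static initializer, with only the allocation
deferred to a point where the system is still single-threaded.

/* Illustrative sketch, not proposed code. */
#define SRCU_NOTIFIER_HEAD_PARTIAL_INIT(name) {				\
	.mutex = __MUTEX_INITIALIZER(name.mutex),			\
	.head  = NULL,							\
	.srcu  = {							\
		.completed   = 0,					\
		.per_cpu_ref = NULL,					\
		.mutex       = __MUTEX_INITIALIZER(name.srcu.mutex),	\
	},								\
}

/* The dynamic half: must run before the first updater, while the
 * system is still single-threaded. */
static int srcu_notifier_finish_init(struct srcu_notifier_head *nh)
{
	nh->srcu.per_cpu_ref = alloc_percpu(struct srcu_struct_array);
	return nh->srcu.per_cpu_ref ? 0 : -ENOMEM;
}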

Alan Stern

2006-11-16 20:42:53

by Thomas Gleixner

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Thu, 2006-11-16 at 21:15 +0100, Ingo Molnar wrote:
> From: Ingo Molnar <[email protected]>
> Subject: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync
>
> init_cpufreq_transition_notifier_list() should execute first, which is a
> core_initcall, so mark cpufreq_tsc() core_initcall_sync.
>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> --- linux.orig/arch/x86_64/kernel/tsc.c
> +++ linux/arch/x86_64/kernel/tsc.c
> @@ -138,7 +138,11 @@ static int __init cpufreq_tsc(void)
> return 0;
> }

Here is the i386/sparc fixup

--------------->

From: Thomas Gleixner <[email protected]>
Subject: [patch] cpufreq: register notifiers after the head init

init_cpufreq_transition_notifier_list() must execute first, which is a
core_initcall, so move the registration to core_initcall_sync.

Signed-off-by: Thomas Gleixner <[email protected]>

diff --git a/arch/i386/kernel/tsc.c b/arch/i386/kernel/tsc.c
index fbc9582..4ebd903 100644
--- a/arch/i386/kernel/tsc.c
+++ b/arch/i386/kernel/tsc.c
@@ -315,7 +315,7 @@ static int __init cpufreq_tsc(void)
return ret;
}

-core_initcall(cpufreq_tsc);
+core_initcall_sync(cpufreq_tsc);

#endif

diff --git a/arch/sparc64/kernel/time.c b/arch/sparc64/kernel/time.c
index 061e1b1..c6eadcc 100644
--- a/arch/sparc64/kernel/time.c
+++ b/arch/sparc64/kernel/time.c
@@ -973,6 +973,13 @@ static struct notifier_block sparc64_cpu
.notifier_call = sparc64_cpufreq_notifier
};

+static int __init sparc64_cpu_freq_init(void)
+{
+ return cpufreq_register_notifier(&sparc64_cpufreq_notifier_block,
+ CPUFREQ_TRANSITION_NOTIFIER);
+}
+core_initcall_sync(sparc64_cpu_freq_init);
+
#endif /* CONFIG_CPU_FREQ */

static struct time_interpolator sparc64_cpu_interpolator = {
@@ -999,10 +1006,6 @@ void __init time_init(void)
(((NSEC_PER_SEC << SPARC64_NSEC_PER_CYC_SHIFT) +
(clock / 2)) / clock);

-#ifdef CONFIG_CPU_FREQ
- cpufreq_register_notifier(&sparc64_cpufreq_notifier_block,
- CPUFREQ_TRANSITION_NOTIFIER);
-#endif
}

unsigned long long sched_clock(void)


2006-11-16 20:52:18

by Peter Zijlstra

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Thu, 2006-11-16 at 21:15 +0100, Ingo Molnar wrote:

> From: Ingo Molnar <[email protected]>
> Subject: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync
>
> init_cpufreq_transition_notifier_list() should execute first, which is a
> core_initcall, so mark cpufreq_tsc() core_initcall_sync.
>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> --- linux.orig/arch/x86_64/kernel/tsc.c
> +++ linux/arch/x86_64/kernel/tsc.c

it seems to want to be arch/x86_64/kernel/time.c

> @@ -138,7 +138,11 @@ static int __init cpufreq_tsc(void)
> return 0;
> }
>
> -core_initcall(cpufreq_tsc);
> +/*
> + * init_cpufreq_transition_notifier_list() should execute first,
> + * which is a core_initcall, so mark this one core_initcall_sync:
> + */
> +core_initcall_sync(cpufreq_tsc);
>
> #endif
> /*

2006-11-16 21:06:50

by Thomas Gleixner

Subject: Re: BUG: cpufreq notification broken

On Thu, 2006-11-16 at 15:27 -0500, Alan Stern wrote:
> > Hmm, there are no static initializers for srcu and the only way to fix
> > this up is to move the arch calls to postcore_init.
>
> If you can find a way to invoke init_cpufreq_transition_notifier_list
> earlier than core_initcall time, that would be okay. I did it this way
> because it was easiest, but earlier should be just as good.
>
> The only requirement is that alloc_percpu() has to be working, so that the
> SRCU per-cpu data values can be set up. I don't know how early in the
> boot process you can do per-cpu memory allocation.
>
> As an alternative approach, initialization of srcu_notifiers could be
> broken up into two pieces, one of which could be done statically. The
> part that has to be done dynamically (the SRCU initialization) wouldn't
> mess up the notifier chain. Provided the dynamic part is carried out
> while the system is still single-threaded, it would be safe.

There is another issue with this SRCU change:

The notification actually comes after the real change, which is bad. We
try to make the TSC usable by backing it with pm_timer across such
states, but this behaviour breaks the safety code.

tglx






2006-11-16 21:24:41

by Andrew Morton

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Thu, 16 Nov 2006 21:15:32 +0100
Ingo Molnar <[email protected]> wrote:

>
> * Thomas Gleixner <[email protected]> wrote:
>
> > [PATCH] cpufreq: make the transition_notifier chain use SRCU
> > (b4dfdbb3c707474a2254c5b4d7e62be31a4b7da9)
> >
> > breaks cpu frequency notification users, which register the callback
> > on core_init level. Interestingly enough the registration survives the
> > uninitialized head, but the registered user is lost by:
>
> i have hit this bug in -rt (it caused a lockup) and have fixed it -
> forgot to send it upstream. Find the patch below.
>
> Ingo
>
> ---------------->
> From: Ingo Molnar <[email protected]>
> Subject: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync
>
> init_cpufreq_transition_notifier_list() should execute first, which is a
> core_initcall, so mark cpufreq_tsc() core_initcall_sync.

That's not a terribly useful changelog. What bug is being fixed? What
does "first" mean?

> Signed-off-by: Ingo Molnar <[email protected]>
>
> --- linux.orig/arch/x86_64/kernel/tsc.c
> +++ linux/arch/x86_64/kernel/tsc.c
> @@ -138,7 +138,11 @@ static int __init cpufreq_tsc(void)
> return 0;
> }
>
> -core_initcall(cpufreq_tsc);
> +/*
> + * init_cpufreq_transition_notifier_list() should execute first,
> + * which is a core_initcall, so mark this one core_initcall_sync:
> + */
> +core_initcall_sync(cpufreq_tsc);

Would prefer that we not use the _sync levels. They're there as a
synchronisation for MULTITHREAD_PROBE and might disappear at any time.

2006-11-16 21:24:55

by Thomas Gleixner

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Thu, 2006-11-16 at 13:20 -0800, Andrew Morton wrote:
> >
> > -core_initcall(cpufreq_tsc);
> > +/*
> > + * init_cpufreq_transition_notifier_list() should execute first,
> > + * which is a core_initcall, so mark this one core_initcall_sync:
> > + */
> > +core_initcall_sync(cpufreq_tsc);
>
> Would prefer that we not use the _sync levels. They're there as a
> synchronisation for MULTITHREAD_PROBE and might disappear at any time.

It also works fine with postcore_init, but I'm more concerned about the
delayed notification (after the actual change) which is introduced by
this srcu change.

tglx


2006-11-16 21:26:54

by Alan Stern

Subject: Re: BUG: cpufreq notification broken

On Thu, 16 Nov 2006, Thomas Gleixner wrote:

> There is another issue with this SRCU change:
>
> The notification actually comes after the real change, which is bad. We
> try to make the TSC usable by backing it with pm_timer across such
> states, but this behaviour breaks the safety code.

I don't understand. Sending notifications is completely separate from
setting up the notifier chain's head. The patch you mentioned didn't
touch the code that sends the notifications.

Alan Stern

2006-11-16 21:33:55

by Thomas Gleixner

Subject: Re: BUG: cpufreq notification broken

On Thu, 2006-11-16 at 16:26 -0500, Alan Stern wrote:
> On Thu, 16 Nov 2006, Thomas Gleixner wrote:
>
> > There is another issue with this SRCU change:
> >
> > The notification actually comes after the real change, which is bad. We
> > try to make the TSC usable by backing it with pm_timer across such
> > states, but this behaviour breaks the safety code.
>
> I don't understand. Sending notifications is completely separate from
> setting up the notifier chain's head. The patch you mentioned didn't
> touch the code that sends the notifications.

Yeah, my bad. It just uses RCU-based locking, but it's still synchronous.

I have to dig deeper into why the frequency change happens _before_
the notifier arrives.

tglx


2006-11-16 21:48:19

by Linus Torvalds

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync



On Thu, 16 Nov 2006, Thomas Gleixner wrote:
>
> Here is the i386/sparc fixup

Gag me with a volvo.

This is disgusting, but I would actually prefer the following version over
the patches I've seen, because

- it doesn't end up having any architecture-specific parts

- it doesn't use the new "xxx_sync()" thing that I'm not even sure we
should be using.

- it makes it clear that this should be fixed, preferably by just having
some way to initialize SRCU structs statically. If we get that, the fix
is to just replace the horrible "initialize by hand" with a static
initializer once and for all.

Hmm?

Totally untested, but it compiles and it _looks_ sane. The overhead of the
function call should be minimal, once things are initialized.

Paul, it would be _really_ nice to have some way to just initialize that
SRCU thing statically. This kind of crud is just crazy.

Comments?

Linus

----
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 86e69b7..02326b2 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -52,14 +52,39 @@ static void handle_update(void *data);
* The mutex locks both lists.
*/
static BLOCKING_NOTIFIER_HEAD(cpufreq_policy_notifier_list);
-static struct srcu_notifier_head cpufreq_transition_notifier_list;

-static int __init init_cpufreq_transition_notifier_list(void)
+/*
+ * This is horribly horribly ugly.
+ *
+ * We really want to initialize the transition notifier list
+ * statically and just once, but there is no static way to
+ * initialize a srcu lock, so we instead make up all this nasty
+ * infrastructure to make sure it's initialized when we use it.
+ *
+ * Bleaargh.
+ */
+static struct srcu_notifier_head *cpufreq_transition_notifier_list(void)
{
- srcu_init_notifier_head(&cpufreq_transition_notifier_list);
- return 0;
+ static struct srcu_notifier_head *initialized;
+ struct srcu_notifier_head *ret;
+
+ ret = initialized;
+ if (!ret) {
+ static DEFINE_MUTEX(init_lock);
+
+ mutex_lock(&init_lock);
+ ret = initialized;
+ if (!ret) {
+ static struct srcu_notifier_head list_head;
+ ret = &list_head;
+ srcu_init_notifier_head(ret);
+ smp_wmb();
+ initialized = ret;
+ }
+ mutex_unlock(&init_lock);
+ }
+ return ret;
}
-core_initcall(init_cpufreq_transition_notifier_list);

static LIST_HEAD(cpufreq_governor_list);
static DEFINE_MUTEX (cpufreq_governor_mutex);
@@ -268,14 +293,14 @@ void cpufreq_notify_transition(struct cp
freqs->old = policy->cur;
}
}
- srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
+ srcu_notifier_call_chain(cpufreq_transition_notifier_list(),
CPUFREQ_PRECHANGE, freqs);
adjust_jiffies(CPUFREQ_PRECHANGE, freqs);
break;

case CPUFREQ_POSTCHANGE:
adjust_jiffies(CPUFREQ_POSTCHANGE, freqs);
- srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
+ srcu_notifier_call_chain(cpufreq_transition_notifier_list(),
CPUFREQ_POSTCHANGE, freqs);
if (likely(policy) && likely(policy->cpu == freqs->cpu))
policy->cur = freqs->new;
@@ -1055,7 +1080,7 @@ static int cpufreq_suspend(struct sys_de
freqs.old = cpu_policy->cur;
freqs.new = cur_freq;

- srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
+ srcu_notifier_call_chain(cpufreq_transition_notifier_list(),
CPUFREQ_SUSPENDCHANGE, &freqs);
adjust_jiffies(CPUFREQ_SUSPENDCHANGE, &freqs);

@@ -1137,7 +1162,7 @@ static int cpufreq_resume(struct sys_dev
freqs.new = cur_freq;

srcu_notifier_call_chain(
- &cpufreq_transition_notifier_list,
+ cpufreq_transition_notifier_list(),
CPUFREQ_RESUMECHANGE, &freqs);
adjust_jiffies(CPUFREQ_RESUMECHANGE, &freqs);

@@ -1183,7 +1208,7 @@ int cpufreq_register_notifier(struct not
switch (list) {
case CPUFREQ_TRANSITION_NOTIFIER:
ret = srcu_notifier_chain_register(
- &cpufreq_transition_notifier_list, nb);
+ cpufreq_transition_notifier_list(), nb);
break;
case CPUFREQ_POLICY_NOTIFIER:
ret = blocking_notifier_chain_register(
@@ -1215,7 +1240,7 @@ int cpufreq_unregister_notifier(struct n
switch (list) {
case CPUFREQ_TRANSITION_NOTIFIER:
ret = srcu_notifier_chain_unregister(
- &cpufreq_transition_notifier_list, nb);
+ cpufreq_transition_notifier_list(), nb);
break;
case CPUFREQ_POLICY_NOTIFIER:
ret = blocking_notifier_chain_unregister(

2006-11-16 21:56:37

by Alan Stern

Subject: Re: BUG: cpufreq notification broken

On Thu, 16 Nov 2006, Thomas Gleixner wrote:

> On Thu, 2006-11-16 at 16:26 -0500, Alan Stern wrote:
> > On Thu, 16 Nov 2006, Thomas Gleixner wrote:
> >
> > > There is another issue with this SRCU change:
> > >
> > > The notification actually comes after the real change, which is bad. We
> > > try to make the TSC usable by backing it with pm_timer across such
> > > states, but this behaviour breaks the safety code.
> >
> > I don't understand. Sending notifications is completely separate from
> > setting up the notifier chain's head. The patch you mentioned didn't
> > touch the code that sends the notifications.
>
> Yeah, my bad. It just uses RCU-based locking, but it's still synchronous.
>
> I have to dig deeper into why the frequency change happens _before_
> the notifier arrives.

There are supposed to be _two_ notifier calls: one before the frequency
change and one after. Check the callers of cpufreq_notify_transition().
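
For context, the expected pattern in a frequency-changing driver looks
roughly like the sketch below; example_set_target() and set_hw_frequency()
are hypothetical, while cpufreq_notify_transition() and the
CPUFREQ_PRECHANGE/CPUFREQ_POSTCHANGE flags are the real interface of that
era.

#include <linux/cpufreq.h>

/* Hypothetical stand-in for the driver's actual frequency-setting code. */
static void set_hw_frequency(unsigned int new_freq)
{
}

static int example_set_target(struct cpufreq_policy *policy,
			      unsigned int new_freq)
{
	struct cpufreq_freqs freqs = {
		.cpu = policy->cpu,
		.old = policy->cur,
		.new = new_freq,
	};

	/* Listeners (e.g. the TSC code) are told before the change... */
	cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);

	set_hw_frequency(new_freq);	/* the actual hardware switch */

	/* ...and again after it. */
	cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
	return 0;
}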

Alan Stern

2006-11-16 22:03:52

by Alan Stern

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Thu, 16 Nov 2006, Linus Torvalds wrote:

> - it makes it clear that this should be fixed, preferably by just having
> some way to initialize SRCU structs statically. If we get that, the fix
> is to just replace the horrible "initialize by hand" with a static
> initializer once and for all.
>
> Hmm?
>
> Totally untested, but it compiles and it _looks_ sane. The overhead of the
> function call should be minimal, once things are initialized.
>
> Paul, it would be _really_ nice to have some way to just initialize that
> SRCU thing statically. This kind of crud is just crazy.

I looked into this back when SRCU was first added. It's essentially
impossible to do it, because the per-cpu memory allocation & usage APIs
are completely different for the static and the dynamic cases. They are a
real mess. I couldn't think up a way to construct any sort of uniform
interface to per-cpu memory, not without completely changing the guts of
the per-cpu stuff.

If you or someone else can fix that problem, I will be happy to change the
SRCU-based notifiers to work both ways.

Alan Stern

2006-11-16 22:22:26

by Linus Torvalds

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync



On Thu, 16 Nov 2006, Alan Stern wrote:
> On Thu, 16 Nov 2006, Linus Torvalds wrote:
> >
> > Paul, it would be _really_ nice to have some way to just initialize
> > that SRCU thing statically. This kind of crud is just crazy.
>
> I looked into this back when SRCU was first added. It's essentially
> impossible to do it, because the per-cpu memory allocation & usage APIs
> are completely different for the static and the dynamic cases.

I don't think that's how you'd want to do it.

There's no way to do an initialization of a percpu allocation statically.
That's pretty obvious.

What I'd suggest instead, is to make the allocation dynamic, and make it
inside the srcu functions (kind of like I did now, but I did it at a
higher level).

Doing it at the high level was trivial right now, but we may well end up
hitting this problem again if people start using SRCU more. Right now I
suspect the cpufreq notifier is the only thing that uses SRCU, and it
already showed this problem with SRCU initializers.

So I was more thinking about moving my "one special case high level hack"
down lower, down to the SRCU level, so that we'll never see _more_ of
those horrible hacks. We'll still have the hacky thing, but at least it
will be limited to a single place - the SRCU code itself.

Linus

2006-11-17 02:32:09

by Paul E. McKenney

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Thu, Nov 16, 2006 at 01:47:48PM -0800, Linus Torvalds wrote:
>
>
> On Thu, 16 Nov 2006, Thomas Gleixner wrote:
> >
> > Here is the i386/sparc fixup
>
> Gag me with a volvo.

No can do -- my wife drives a Ford and my car is a bicycle.

> This is disgusting, but I would actually prefer the following version over
> the patches I've seen, because
>
> - it doesn't end up having any architecture-specific parts
>
> - it doesn't use the new "xxx_sync()" thing that I'm not even sure we
> should be using.
>
> - it makes it clear that this should be fixed, preferably by just having
> some way to initialize SRCU structs staticalyl. If we get that, the fix
> is to just replace the horrible "initialize by hand" with a static
> initializer once and for all.
>
> Hmm?
>
> Totally untested, but it compiles and it _looks_ sane. The overhead of the
> function call should be minimal, once things are initialized.
>
> Paul, it would be _really_ nice to have some way to just initialize that
> SRCU thing statically. This kind of crud is just crazy.

Static initialization is a bit of a tarpit for SRCU. Before this week,
I would have protested bitterly over the overhead of a dynamic runtime
check, but Jens is running into another issue that looks to require a
bit more read-side overhead as well (synchronize_srcu() is too expensive
for his situation). Now if I can get one of the local weak-memory model
torture-chamber boxes to deal with a recent kernel...

Hardware whines aside, shouldn't be too hard. Will put something
together...

Thanx, Paul

> Comments?
>
> Linus
>
> ----
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 86e69b7..02326b2 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -52,14 +52,39 @@ static void handle_update(void *data);
> * The mutex locks both lists.
> */
> static BLOCKING_NOTIFIER_HEAD(cpufreq_policy_notifier_list);
> -static struct srcu_notifier_head cpufreq_transition_notifier_list;
>
> -static int __init init_cpufreq_transition_notifier_list(void)
> +/*
> + * This is horribly horribly ugly.
> + *
> + * We really want to initialize the transition notifier list
> + * statically and just once, but there is no static way to
> + * initialize a srcu lock, so we instead make up all this nasty
> + * infrastructure to make sure it's initialized when we use it.
> + *
> + * Bleaargh.
> + */
> +static struct srcu_notifier_head *cpufreq_transition_notifier_list(void)
> {
> - srcu_init_notifier_head(&cpufreq_transition_notifier_list);
> - return 0;
> + static struct srcu_notifier_head *initialized;
> + struct srcu_notifier_head *ret;
> +
> + ret = initialized;
> + if (!ret) {
> + static DEFINE_MUTEX(init_lock);
> +
> + mutex_lock(&init_lock);
> + ret = initialized;
> + if (!ret) {
> + static struct srcu_notifier_head list_head;
> + ret = &list_head;
> + srcu_init_notifier_head(ret);
> + smp_wmb();
> + initialized = ret;
> + }
> + mutex_unlock(&init_lock);
> + }
> + return ret;
> }
> -core_initcall(init_cpufreq_transition_notifier_list);
>
> static LIST_HEAD(cpufreq_governor_list);
> static DEFINE_MUTEX (cpufreq_governor_mutex);
> @@ -268,14 +293,14 @@ void cpufreq_notify_transition(struct cp
> freqs->old = policy->cur;
> }
> }
> - srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
> + srcu_notifier_call_chain(cpufreq_transition_notifier_list(),
> CPUFREQ_PRECHANGE, freqs);
> adjust_jiffies(CPUFREQ_PRECHANGE, freqs);
> break;
>
> case CPUFREQ_POSTCHANGE:
> adjust_jiffies(CPUFREQ_POSTCHANGE, freqs);
> - srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
> + srcu_notifier_call_chain(cpufreq_transition_notifier_list(),
> CPUFREQ_POSTCHANGE, freqs);
> if (likely(policy) && likely(policy->cpu == freqs->cpu))
> policy->cur = freqs->new;
> @@ -1055,7 +1080,7 @@ static int cpufreq_suspend(struct sys_de
> freqs.old = cpu_policy->cur;
> freqs.new = cur_freq;
>
> - srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
> + srcu_notifier_call_chain(cpufreq_transition_notifier_list(),
> CPUFREQ_SUSPENDCHANGE, &freqs);
> adjust_jiffies(CPUFREQ_SUSPENDCHANGE, &freqs);
>
> @@ -1137,7 +1162,7 @@ static int cpufreq_resume(struct sys_dev
> freqs.new = cur_freq;
>
> srcu_notifier_call_chain(
> - &cpufreq_transition_notifier_list,
> + cpufreq_transition_notifier_list(),
> CPUFREQ_RESUMECHANGE, &freqs);
> adjust_jiffies(CPUFREQ_RESUMECHANGE, &freqs);
>
> @@ -1183,7 +1208,7 @@ int cpufreq_register_notifier(struct not
> switch (list) {
> case CPUFREQ_TRANSITION_NOTIFIER:
> ret = srcu_notifier_chain_register(
> - &cpufreq_transition_notifier_list, nb);
> + cpufreq_transition_notifier_list(), nb);
> break;
> case CPUFREQ_POLICY_NOTIFIER:
> ret = blocking_notifier_chain_register(
> @@ -1215,7 +1240,7 @@ int cpufreq_unregister_notifier(struct n
> switch (list) {
> case CPUFREQ_TRANSITION_NOTIFIER:
> ret = srcu_notifier_chain_unregister(
> - &cpufreq_transition_notifier_list, nb);
> + cpufreq_transition_notifier_list(), nb);
> break;
> case CPUFREQ_POLICY_NOTIFIER:
> ret = blocking_notifier_chain_unregister(

2006-11-17 03:06:28

by Alan Stern

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Thu, 16 Nov 2006, Linus Torvalds wrote:

>
>
> On Thu, 16 Nov 2006, Alan Stern wrote:
> > On Thu, 16 Nov 2006, Linus Torvalds wrote:
> > >
> > > Paul, it would be _really_ nice to have some way to just initialize
> > > that SRCU thing statically. This kind of crud is just crazy.
> >
> > I looked into this back when SRCU was first added. It's essentially
> > impossible to do it, because the per-cpu memory allocation & usage APIs
> > are completely different for the static and the dynamic cases.
>
> I don't think that's how you'd want to do it.
>
> There's no way to do an initialization of a percpu allocation statically.
> That's pretty obvious.

Hmmm... What about DEFINE_PER_CPU in include/asm-generic/percpu.h
combined with setup_per_cpu_areas() in init/main.c? So long as you want
all the CPUs to start with the same initial values, it should work.
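
A minimal sketch of the static per-cpu mechanism being referred to
(static_srcu_ref and example_count_reader() are hypothetical names;
DEFINE_PER_CPU and get_cpu_var() are the real interfaces):

#include <linux/percpu.h>
#include <linux/srcu.h>

/* The initial image of this variable is copied into every CPU's area by
 * setup_per_cpu_areas() during early boot, so no runtime allocation is
 * needed -- but note that it is accessed by name, not through a pointer. */
static DEFINE_PER_CPU(struct srcu_struct_array, static_srcu_ref);

static void example_count_reader(int idx)
{
	/* get_cpu_var() disables preemption around the access. */
	get_cpu_var(static_srcu_ref).c[idx]++;
	put_cpu_var(static_srcu_ref);
}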

> What I'd suggest instead, is to make the allocation dynamic, and make it
> inside the srcu functions (kind of like I did now, but I did it at a
> higher level).
>
> Doing it at the high level was trivial right now, but we may well end up
> hitting this problem again if people start using SRCU more. Right now I
> suspect the cpufreq notifier is the only thing that uses SRCU, and it
> already showed this problem with SRCU initializers.
>
> So I was more thinking about moving my "one special case high level hack"
> down lower, down to the SRCU level, so that we'll never see _more_ of
> those horrible hacks. We'll still have the hacky thing, but at least it
> will be limited to a single place - the SRCU code itself.

Another possible approach (but equally disgusting) is to use this static
allocation approach, and have the SRCU structure include both a static and
a dynamic percpu pointer together with a flag indicating which should be
used.

Alan Stern

2006-11-17 07:03:05

by Paul E. McKenney

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Thu, Nov 16, 2006 at 10:06:25PM -0500, Alan Stern wrote:
> On Thu, 16 Nov 2006, Linus Torvalds wrote:
>
> >
> >
> > On Thu, 16 Nov 2006, Alan Stern wrote:
> > > On Thu, 16 Nov 2006, Linus Torvalds wrote:
> > > >
> > > > Paul, it would be _really_ nice to have some way to just initialize
> > > > that SRCU thing statically. This kind of crud is just crazy.
> > >
> > > I looked into this back when SRCU was first added. It's essentially
> > > impossible to do it, because the per-cpu memory allocation & usage APIs
> > > are completely different for the static and the dynamic cases.
> >
> > I don't think that's how you'd want to do it.
> >
> > There's no way to do an initialization of a percpu allocation statically.
> > That's pretty obvious.
>
> Hmmm... What about DEFINE_PER_CPU in include/asm-generic/percpu.h
> combined with setup_per_cpu_areas() in init/main.c? So long as you want
> all the CPUs to start with the same initial values, it should work.
>
> > What I'd suggest instead, is to make the allocation dynamic, and make it
> > inside the srcu functions (kind of like I did now, but I did it at a
> > higher level).
> >
> > Doing it at the high level was trivial right now, but we may well end up
> > hitting this problem again if people start using SRCU more. Right now I
> > suspect the cpufreq notifier is the only thing that uses SRCU, and it
> > already showed this problem with SRCU initializers.
> >
> > So I was more thinking about moving my "one special case high level hack"
> > down lower, down to the SRCU level, so that we'll never see _more_ of
> > those horrible hacks. We'll still have the hacky thing, but at least it
> > will be limited to a single place - the SRCU code itself.
>
> Another possible approach (but equally disgusting) is to use this static
> allocation approach, and have the SRCU structure include both a static and
> a dynamic percpu pointer together with a flag indicating which should be
> used.

I am actually taking some suggestions you made some months ago. At the
time, I rejected them because they injected extra branches into the
fastpath. However, recent experience indicates that you (Alan Stern)
were right and I was wrong -- turns out that the update-side overhead
cannot be so lightly disregarded, which forces memory barriers (but
neither atomics nor cache misses) into the fastpath. If some application
ends up being provably inconvenienced by the read-side overhead, the old
implementation can be re-introduced under a different name or some such.

So, here is my current plan:

o Add NULL checks on srcu_struct_array to srcu_read_lock(),
srcu_read_unlock(), and synchronize_srcu. These will
acquire the mutex and attempt to initialize. If out
of memory, they will use the new hardluckref field.

o Add memory barriers to srcu_read_lock() and srcu_read_unlock().

o Also add a memory barrier or two to synchronize_srcu(), which,
in combination with those in srcu_read_lock() and srcu_read_unlock(),
permit removing two of the three synchronize_sched() calls
in synchronize_srcu(), decreasing its latency by roughly
a factor of three.

This change should have the added benefit of making
synchronize_srcu() much easier to understand.

o I left out the super-fastpath synchronize_srcu() because
after sleeping on it, it scared me silly. Might be OK,
but needs careful thought. The fastpath is of the form:

	if (srcu_readers_active(sp) == 0) {
		smp_mb();
		return;
	}

prior to the mutex_lock() in synchronize_srcu().

Attached is a patch that compiles, but probably goes down in flames
otherwise.

Thoughts?

Thanx, Paul

include/linux/srcu.h | 7 ---
kernel/srcu.c | 111 +++++++++++++++++++++++----------------------------
2 files changed, 53 insertions(+), 65 deletions(-)

diff -urpNa -X dontdiff linux-2.6.19-rc5/include/linux/srcu.h linux-2.6.19-rc5-dsrcu/include/linux/srcu.h
--- linux-2.6.19-rc5/include/linux/srcu.h 2006-11-07 18:24:20.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/include/linux/srcu.h 2006-11-16 21:40:03.000000000 -0800
@@ -35,14 +35,9 @@ struct srcu_struct {
int completed;
struct srcu_struct_array *per_cpu_ref;
struct mutex mutex;
+ int hardluckref;
};

-#ifndef CONFIG_PREEMPT
-#define srcu_barrier() barrier()
-#else /* #ifndef CONFIG_PREEMPT */
-#define srcu_barrier()
-#endif /* #else #ifndef CONFIG_PREEMPT */
-
int init_srcu_struct(struct srcu_struct *sp);
void cleanup_srcu_struct(struct srcu_struct *sp);
int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
diff -urpNa -X dontdiff linux-2.6.19-rc5/kernel/srcu.c linux-2.6.19-rc5-dsrcu/kernel/srcu.c
--- linux-2.6.19-rc5/kernel/srcu.c 2006-11-07 18:24:20.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/kernel/srcu.c 2006-11-16 22:40:33.000000000 -0800
@@ -34,6 +34,14 @@
#include <linux/smp.h>
#include <linux/srcu.h>

+/*
+ * Initialize the per-CPU array, returning the pointer.
+ */
+static inline struct srcu_struct_array *alloc_srcu_struct_percpu(void)
+{
+ return alloc_percpu(struct srcu_struct_array);
+}
+
/**
* init_srcu_struct - initialize a sleep-RCU structure
* @sp: structure to initialize.
@@ -46,7 +54,8 @@ int init_srcu_struct(struct srcu_struct
{
sp->completed = 0;
mutex_init(&sp->mutex);
- sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+ sp->hardluckref = 0;
return (sp->per_cpu_ref ? 0 : -ENOMEM);
}

@@ -61,9 +70,10 @@ static int srcu_readers_active_idx(struc
int sum;

sum = 0;
- for_each_possible_cpu(cpu)
- sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
- return sum;
+ if (likely(sp->per_cpu_ref != NULL))
+ for_each_possible_cpu(cpu)
+ sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
+ return sum + sp->hardluckref;
}

/**
@@ -76,7 +86,9 @@ static int srcu_readers_active_idx(struc
*/
int srcu_readers_active(struct srcu_struct *sp)
{
- return srcu_readers_active_idx(sp, 0) + srcu_readers_active_idx(sp, 1);
+ return srcu_readers_active_idx(sp, 0) +
+ srcu_readers_active_idx(sp, 1) -
+ sp->hardluckref; /* No one will ever care, but... */
}

/**
@@ -94,7 +106,8 @@ void cleanup_srcu_struct(struct srcu_str
WARN_ON(sum); /* Leakage unless caller handles error. */
if (sum != 0)
return;
- free_percpu(sp->per_cpu_ref);
+ if (sp->per_cpu_ref != NULL)
+ free_percpu(sp->per_cpu_ref);
sp->per_cpu_ref = NULL;
}

@@ -105,6 +118,7 @@ void cleanup_srcu_struct(struct srcu_str
* Counts the new reader in the appropriate per-CPU element of the
* srcu_struct. Must be called from process context.
* Returns an index that must be passed to the matching srcu_read_unlock().
+ * The index is -1 if the srcu_struct is not and cannot be initialized.
*/
int srcu_read_lock(struct srcu_struct *sp)
{
@@ -112,11 +126,24 @@ int srcu_read_lock(struct srcu_struct *s

preempt_disable();
idx = sp->completed & 0x1;
- barrier(); /* ensure compiler looks -once- at sp->completed. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
- srcu_barrier(); /* ensure compiler won't misorder critical section. */
+ if (likely(sp->per_cpu_ref != NULL)) {
+ barrier(); /* ensure compiler looks -once- at sp->completed. */
+ per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
+ smp_processor_id())->c[idx]++;
+ smp_mb();
+ preempt_enable();
+ return idx;
+ }
preempt_enable();
- return idx;
+ mutex_lock(&sp->mutex);
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+ if (sp->per_cpu_ref == NULL) {
+ sp->hardluckref++;
+ mutex_unlock(&sp->mutex);
+ return -1;
+ }
+ mutex_unlock(&sp->mutex);
+ return srcu_read_lock(sp);
}

/**
@@ -131,10 +158,16 @@ int srcu_read_lock(struct srcu_struct *s
*/
void srcu_read_unlock(struct srcu_struct *sp, int idx)
{
- preempt_disable();
- srcu_barrier(); /* ensure compiler won't misorder critical section. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
- preempt_enable();
+ if (likely(idx != -1)) {
+ preempt_disable();
+ smp_mb();
+ per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
+ preempt_enable();
+ return;
+ }
+ mutex_lock(&sp->mutex);
+ sp->hardluckref--;
+ mutex_unlock(&sp->mutex);
}

/**
@@ -173,65 +206,25 @@ void synchronize_srcu(struct srcu_struct
return;
}

- synchronize_sched(); /* Force memory barrier on all CPUs. */
-
- /*
- * The preceding synchronize_sched() ensures that any CPU that
- * sees the new value of sp->completed will also see any preceding
- * changes to data structures made by this CPU. This prevents
- * some other CPU from reordering the accesses in its SRCU
- * read-side critical section to precede the corresponding
- * srcu_read_lock() -- ensuring that such references will in
- * fact be protected.
- *
- * So it is now safe to do the flip.
- */
-
+ smp_mb(); /* ensure srcu_read_lock() sees prior change first! */
idx = sp->completed & 0x1;
sp->completed++;

- synchronize_sched(); /* Force memory barrier on all CPUs. */
+ synchronize_sched();

/*
* At this point, because of the preceding synchronize_sched(),
* all srcu_read_lock() calls using the old counters have completed.
* Their corresponding critical sections might well be still
* executing, but the srcu_read_lock() primitives themselves
- * will have finished executing.
+ * will have finished executing. The "old" rank of counters
+ * can therefore only decrease, never increase in value.
*/

while (srcu_readers_active_idx(sp, idx))
schedule_timeout_interruptible(1);

- synchronize_sched(); /* Force memory barrier on all CPUs. */
-
- /*
- * The preceding synchronize_sched() forces all srcu_read_unlock()
- * primitives that were executing concurrently with the preceding
- * for_each_possible_cpu() loop to have completed by this point.
- * More importantly, it also forces the corresponding SRCU read-side
- * critical sections to have also completed, and the corresponding
- * references to SRCU-protected data items to be dropped.
- *
- * Note:
- *
- * Despite what you might think at first glance, the
- * preceding synchronize_sched() -must- be within the
- * critical section ended by the following mutex_unlock().
- * Otherwise, a task taking the early exit can race
- * with a srcu_read_unlock(), which might have executed
- * just before the preceding srcu_readers_active() check,
- * and whose CPU might have reordered the srcu_read_unlock()
- * with the preceding critical section. In this case, there
- * is nothing preventing the synchronize_sched() task that is
- * taking the early exit from freeing a data structure that
- * is still being referenced (out of order) by the task
- * doing the srcu_read_unlock().
- *
- * Alternatively, the comparison with "2" on the early exit
- * could be changed to "3", but this increases synchronize_srcu()
- * latency for bulk loads. So the current code is preferred.
- */
+ smp_mb(); /* must see critical section prior to srcu_read_unlock() */

mutex_unlock(&sp->mutex);
}

2006-11-17 09:29:55

by Jens Axboe

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Thu, Nov 16 2006, Paul E. McKenney wrote:
> On Thu, Nov 16, 2006 at 10:06:25PM -0500, Alan Stern wrote:
> > On Thu, 16 Nov 2006, Linus Torvalds wrote:
> >
> > >
> > >
> > > On Thu, 16 Nov 2006, Alan Stern wrote:
> > > > On Thu, 16 Nov 2006, Linus Torvalds wrote:
> > > > >
> > > > > Paul, it would be _really_ nice to have some way to just initialize
> > > > > that SRCU thing statically. This kind of crud is just crazy.
> > > >
> > > > I looked into this back when SRCU was first added. It's essentially
> > > > impossible to do it, because the per-cpu memory allocation & usage APIs
> > > > are completely different for the static and the dynamic cases.
> > >
> > > I don't think that's how you'd want to do it.
> > >
> > > There's no way to do an initialization of a percpu allocation statically.
> > > That's pretty obvious.
> >
> > Hmmm... What about DEFINE_PER_CPU in include/asm-generic/percpu.h
> > combined with setup_per_cpu_areas() in init/main.c? So long as you want
> > all the CPUs to start with the same initial values, it should work.
> >
> > > What I'd suggest instead, is to make the allocation dynamic, and make it
> > > inside the srcu functions (kind of like I did now, but I did it at a
> > > higher level).
> > >
> > > Doing it at the high level was trivial right now, but we may well end up
> > > hitting this problem again if people start using SRCU more. Right now I
> > > suspect the cpufreq notifier is the only thing that uses SRCU, and it
> > > already showed this problem with SRCU initializers.
> > >
> > > So I was more thinking about moving my "one special case high level hack"
> > > down lower, down to the SRCU level, so that we'll never see _more_ of
> > > those horrible hacks. We'll still have the hacky thing, but at least it
> > > will be limited to a single place - the SRCU code itself.
> >
> > Another possible approach (but equally disgusting) is to use this static
> > allocation approach, and have the SRCU structure include both a static and
> > a dynamic percpu pointer together with a flag indicating which should be
> > used.
>
> I am actually taking some suggestions you made some months ago. At the
> time, I rejected them because they injected extra branches into the
> fastpath. However, recent experience indicates that you (Alan Stern)
> were right and I was wrong -- turns out that the update-side overhead
> cannot be so lightly disregarded, which forces memory barriers (but
> neither atomics nor cache misses) into the fastpath. If some application
> ends up being provably inconvenienced by the read-side overhead, the old
> implementation can be re-introduced under a different name or some such.
>
> So, here is my current plan:
>
> o Add NULL checks on srcu_struct_array to srcu_read_lock(),
> srcu_read_unlock(), and synchronize_srcu. These will
> acquire the mutex and attempt to initialize. If out
> of memory, they will use the new hardluckref field.
>
> o Add memory barriers to srcu_read_lock() and srcu_read_unlock().
>
> o Also add a memory barrier or two to synchronize_srcu(), which,
> in combination with those in srcu_read_lock() and srcu_read_unlock(),
> permit removing two of the three synchronize_sched() calls
> in synchronize_srcu(), decreasing its latency by roughly
> a factor of three.
>
> This change should have the added benefit of making
> synchronize_srcu() much easier to understand.
>
> o I left out the super-fastpath synchronize_srcu() because
> after sleeping on it, it scared me silly. Might be OK,
> but needs careful thought. The fastpath is of the form:
>
> if (srcu_readers_active(sp) == 0) {
> smp_mb();
> return;
> }
>
> prior to the mutex_lock() in synchronize_srcu().

It works for me, but the overhead is still large. Before it would take
8-12 jiffies for a synchronize_srcu() to complete without there actually
being any reader locks active, now it takes 2-3 jiffies. So it's
definitely faster, and as suspected the loss of two of three
synchronize_sched() cut down the overhead to a third.

It's still too heavy for me; by far most of the calls I make to
synchronize_srcu() don't have any reader locks pending. I'm still a
big advocate of the fastpath srcu_readers_active() check. I can
understand the reluctance to make it the default, but for my case it's
"safe enough", so if we could either export srcu_readers_active() or
export a synchronize_srcu_fast() (or something like that), then SRCU
would be a good fit for barrier vs plug rework.
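
A helper along those lines could be as small as the sketch below;
synchronize_srcu_fast() is a hypothetical name and it assumes
srcu_readers_active() gets exported as requested; the same pattern shows up
in Paul's follow-up later in the thread.

/* Skip the full grace period when no SRCU readers are currently active.
 * Only suitable for callers that can tolerate the weaker guarantee, such
 * as the barrier-vs-plug rework discussed here. */
static inline void synchronize_srcu_fast(struct srcu_struct *sp)
{
	if (srcu_readers_active(sp))
		synchronize_srcu(sp);
	else
		smp_mb();	/* order prior accesses against later readers */
}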

> Attached is a patch that compiles, but probably goes down in flames
> otherwise.

Works here :-)

--
Jens Axboe

2006-11-17 17:37:37

by Oleg Nesterov

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

Paul E. McKenney wrote:
>
> int srcu_read_lock(struct srcu_struct *sp)
> {
> @@ -112,11 +126,24 @@ int srcu_read_lock(struct srcu_struct *s
>
> preempt_disable();
> idx = sp->completed & 0x1;
> - barrier(); /* ensure compiler looks -once- at sp->completed. */
> - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> + if (likely(sp->per_cpu_ref != NULL)) {
> + barrier(); /* ensure compiler looks -once- at sp->completed. */
> + per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
> + smp_processor_id())->c[idx]++;
> + smp_mb();
> + preempt_enable();
> + return idx;
> + }
> preempt_enable();
> - return idx;
> + mutex_lock(&sp->mutex);
> + sp->per_cpu_ref = alloc_srcu_struct_percpu();

We should re-check sp->per_cpu_ref != NULL after taking sp->mutex;
it may have been allocated by another thread in the meantime.

> void srcu_read_unlock(struct srcu_struct *sp, int idx)
> {
> - preempt_disable();
> - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
> - preempt_enable();
> + if (likely(idx != -1)) {
> + preempt_disable();
> + smp_mb();
> + per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
> + preempt_enable();
> + return;
> + }
> + mutex_lock(&sp->mutex);
> + sp->hardluckref--;
> + mutex_unlock(&sp->mutex);
> }

I think this is deadlockable, synchronize_srcu() does

	while (srcu_readers_active_idx(sp, idx))
		schedule_timeout_interruptible(1);

under sp->mutex, so the loop above may spin forever while the reader
waits for sp->mutex in srcu_read_unlock(sp, -1).

Oleg.

2006-11-17 19:15:14

by Paul E. McKenney

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Fri, Nov 17, 2006 at 10:29:25AM +0100, Jens Axboe wrote:
> On Thu, Nov 16 2006, Paul E. McKenney wrote:
> > On Thu, Nov 16, 2006 at 10:06:25PM -0500, Alan Stern wrote:
> > > On Thu, 16 Nov 2006, Linus Torvalds wrote:
> > >
> > > >
> > > >
> > > > On Thu, 16 Nov 2006, Alan Stern wrote:
> > > > > On Thu, 16 Nov 2006, Linus Torvalds wrote:
> > > > > >
> > > > > > Paul, it would be _really_ nice to have some way to just initialize
> > > > > > that SRCU thing statically. This kind of crud is just crazy.
> > > > >
> > > > > I looked into this back when SRCU was first added. It's essentially
> > > > > impossible to do it, because the per-cpu memory allocation & usage APIs
> > > > > are completely different for the static and the dynamic cases.
> > > >
> > > > I don't think that's how you'd want to do it.
> > > >
> > > > There's no way to do an initialization of a percpu allocation statically.
> > > > That's pretty obvious.
> > >
> > > Hmmm... What about DEFINE_PER_CPU in include/asm-generic/percpu.h
> > > combined with setup_per_cpu_areas() in init/main.c? So long as you want
> > > all the CPUs to start with the same initial values, it should work.
> > >
> > > > What I'd suggest instead, is to make the allocation dynamic, and make it
> > > > inside the srcu functions (kind of like I did now, but I did it at a
> > > > higher level).
> > > >
> > > > Doing it at the high level was trivial right now, but we may well end up
> > > > hitting this problem again if people start using SRCU more. Right now I
> > > > suspect the cpufreq notifier is the only thing that uses SRCU, and it
> > > > already showed this problem with SRCU initializers.
> > > >
> > > > So I was more thinking about moving my "one special case high level hack"
> > > > down lower, down to the SRCU level, so that we'll never see _more_ of
> > > > those horrible hacks. We'll still have the hacky thing, but at least it
> > > > will be limited to a single place - the SRCU code itself.
> > >
> > > Another possible approach (but equally disgusting) is to use this static
> > > allocation approach, and have the SRCU structure include both a static and
> > > a dynamic percpu pointer together with a flag indicating which should be
> > > used.
> >
> > I am actually taking some suggestions you made some months ago. At the
> > time, I rejected them because they injected extra branches into the
> > fastpath. However, recent experience indicates that you (Alan Stern)
> > were right and I was wrong -- turns out that the update-side overhead
> > cannot be so lightly disregarded, which forces memory barriers (but
> > neither atomics nor cache misses) into the fastpath. If some application
> > ends up being provably inconvenienced by the read-side overhead, the old
> > implementation can be re-introduced under a different name or some such.
> >
> > So, here is my current plan:
> >
> > o Add NULL checks on srcu_struct_array to srcu_read_lock(),
> > srcu_read_unlock(), and synchronize_srcu. These will
> > acquire the mutex and attempt to initialize. If out
> > of memory, they will use the new hardluckref field.
> >
> > o Add memory barriers to srcu_read_lock() and srcu_read_unlock().
> >
> > o Also add a memory barrier or two to synchronize_srcu(), which,
> > in combination with those in srcu_read_lock() and srcu_read_unlock(),
> > permit removing two of the three synchronize_sched() calls
> > in synchronize_srcu(), decreasing its latency by roughly
> > a factor of three.
> >
> > This change should have the added benefit of making
> > synchronize_srcu() much easier to understand.
> >
> > o I left out the super-fastpath synchronize_srcu() because
> > after sleeping on it, it scared me silly. Might be OK,
> > but needs careful thought. The fastpath is of the form:
> >
> > if (srcu_readers_active(sp) == 0) {
> > smp_mb();
> > return;
> > }
> >
> > prior to the mutex_lock() in synchronize_srcu().
>
> It works for me, but the overhead is still large. Before it would take
> 8-12 jiffies for a synchronize_srcu() to complete without there actually
> being any reader locks active, now it takes 2-3 jiffies. So it's
> definitely faster, and as suspected the loss of two of three
> synchronize_sched() cut down the overhead to a third.

Good to hear, thank you for trying it out!

> It's still too heavy for me; by far most of the calls I make to
> synchronize_srcu() don't have any reader locks pending. I'm still a
> big advocate of the fastpath srcu_readers_active() check. I can
> understand the reluctance to make it the default, but for my case it's
> "safe enough", so if we could either export srcu_readers_active() or
> export a synchronize_srcu_fast() (or something like that), then SRCU
> would be a good fit for barrier vs plug rework.

OK, will export the interface. Do your queues have associated locking?

> > Attached is a patch that compiles, but probably goes down in flames
> > otherwise.
>
> Works here :-)

I have at least a couple bugs that would show up under low-memory
situations, will fix and post an update.

Thanx, Paul

2006-11-17 19:27:17

by Alan Stern

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Fri, 17 Nov 2006, Paul E. McKenney wrote:

> > It works for me, but the overhead is still large. Before it would take
> > 8-12 jiffies for a synchronize_srcu() to complete without there actually
> > being any reader locks active, now it takes 2-3 jiffies. So it's
> > definitely faster, and as suspected the loss of two of three
> > synchronize_sched() cut down the overhead to a third.
>
> Good to hear, thank you for trying it out!
>
> > It's still too heavy for me; by far most of the calls I make to
> > synchronize_srcu() don't have any reader locks pending. I'm still a
> > big advocate of the fastpath srcu_readers_active() check. I can
> > understand the reluctance to make it the default, but for my case it's
> > "safe enough", so if we could either export srcu_readers_active() or
> > export a synchronize_srcu_fast() (or something like that), then SRCU
> > would be a good fit for barrier vs plug rework.
>
> OK, will export the interface. Do your queues have associated locking?
>
> > > Attached is a patch that compiles, but probably goes down in flames
> > > otherwise.
> >
> > Works here :-)
>
> I have at least a couple bugs that would show up under low-memory
> situations, will fix and post an update.

Perhaps a better approach to the initialization problem would be to assume
that either:

1. The srcu_struct will be initialized before it is used, or

2. When it is used before initialization, the system is running
only one thread.

In other words, statically allocated SRCU structures that get used during
system startup must be initialized before the system starts multitasking.
That seems like a reasonable requirement.

This eliminates worries about readers holding mutexes. It doesn't
solve the issues surrounding your hardluckref, but maybe it makes them
easier to think about.

Alan Stern

2006-11-18 00:27:38

by Paul E. McKenney

Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Fri, Nov 17, 2006 at 09:39:45PM +0300, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > int srcu_read_lock(struct srcu_struct *sp)
> > {
> > @@ -112,11 +126,24 @@ int srcu_read_lock(struct srcu_struct *s
> >
> > preempt_disable();
> > idx = sp->completed & 0x1;
> > - barrier(); /* ensure compiler looks -once- at sp->completed. */
> > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > + if (likely(sp->per_cpu_ref != NULL)) {
> > + barrier(); /* ensure compiler looks -once- at sp->completed. */
> > + per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
> > + smp_processor_id())->c[idx]++;
> > + smp_mb();
> > + preempt_enable();
> > + return idx;
> > + }
> > preempt_enable();
> > - return idx;
> > + mutex_lock(&sp->mutex);
> > + sp->per_cpu_ref = alloc_srcu_struct_percpu();
>
> We should re-check sp->per_cpu_ref != NULL after taking sp->mutex;
> it may have been allocated by another thread in the meantime.

Good catch!!!

> > void srcu_read_unlock(struct srcu_struct *sp, int idx)
> > {
> > - preempt_disable();
> > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
> > - preempt_enable();
> > + if (likely(idx != -1)) {
> > + preempt_disable();
> > + smp_mb();
> > + per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
> > + preempt_enable();
> > + return;
> > + }
> > + mutex_lock(&sp->mutex);
> > + sp->hardluckref--;
> > + mutex_unlock(&sp->mutex);
> > }
>
> I think this is deadlockable, synchronize_srcu() does
>
> 	while (srcu_readers_active_idx(sp, idx))
> 		schedule_timeout_interruptible(1);
>
> under sp->mutex, so the loop above may spin forever while the reader
> waits for sp->mutex in srcu_read_unlock(sp, -1).

Indeed it is! This requires a nested reader, so that the outer reader
blocks synchronize_srcu() and synchronize_srcu() blocks the inner
reader -- but that is legal.

So I made hardluckref an atomic_t, and changed the mutex_lock()
in srcu_read_lock() to a mutex_trylock() -- which cannot block, right?

I also added the srcu_readers_active() declaration to srcu.h for Jens.
Oleg, any thoughts about Jens's optimization? He would code something
like:

	if (srcu_readers_active(&my_srcu))
		synchronize_srcu(&my_srcu);
	else
		smp_mb();

However, he is doing ordered I/O requests rather than protecting data
structures.

Changes:

o Make hardluckref be an atomic_t.

o Put the now-needed rcu_dereference()s for per_cpu_ref
(used to be constant...).

o Moved to mutex_trylock() in srcu_read_lock() to avoid Oleg's
deadlock scenario.

o Added per_cpu_ref NULL rechecks to avoid the memory leak
Oleg spotted (and worse).

o Added srcu_readers_active() to srcu.h.

Still untested (aside from Jens's runs).

Signed-off-by: [email protected] (AKA [email protected])

---


include/linux/srcu.h | 8 ---
kernel/srcu.c | 130 +++++++++++++++++++++++++++------------------------
2 files changed, 73 insertions(+), 65 deletions(-)

diff -urpNa -X dontdiff linux-2.6.19-rc5/include/linux/srcu.h linux-2.6.19-rc5-dsrcu/include/linux/srcu.h
--- linux-2.6.19-rc5/include/linux/srcu.h 2006-11-17 13:54:15.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/include/linux/srcu.h 2006-11-17 15:14:07.000000000 -0800
@@ -35,19 +35,15 @@ struct srcu_struct {
int completed;
struct srcu_struct_array *per_cpu_ref;
struct mutex mutex;
+ atomic_t hardluckref;
};

-#ifndef CONFIG_PREEMPT
-#define srcu_barrier() barrier()
-#else /* #ifndef CONFIG_PREEMPT */
-#define srcu_barrier()
-#endif /* #else #ifndef CONFIG_PREEMPT */
-
int init_srcu_struct(struct srcu_struct *sp);
void cleanup_srcu_struct(struct srcu_struct *sp);
int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
void synchronize_srcu(struct srcu_struct *sp);
long srcu_batches_completed(struct srcu_struct *sp);
+int srcu_readers_active(struct srcu_struct *sp);

#endif
diff -urpNa -X dontdiff linux-2.6.19-rc5/kernel/srcu.c linux-2.6.19-rc5-dsrcu/kernel/srcu.c
--- linux-2.6.19-rc5/kernel/srcu.c 2006-11-17 13:54:17.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/kernel/srcu.c 2006-11-17 14:15:06.000000000 -0800
@@ -34,6 +34,18 @@
#include <linux/smp.h>
#include <linux/srcu.h>

+/*
+ * Initialize the per-CPU array, returning the pointer.
+ */
+static inline struct srcu_struct_array *alloc_srcu_struct_percpu(void)
+{
+ struct srcu_struct_array *sap;
+
+ sap = alloc_percpu(struct srcu_struct_array);
+ smp_wmb();
+ return (sap);
+}
+
/**
* init_srcu_struct - initialize a sleep-RCU structure
* @sp: structure to initialize.
@@ -46,7 +58,8 @@ int init_srcu_struct(struct srcu_struct
{
sp->completed = 0;
mutex_init(&sp->mutex);
- sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+ atomic_set(&sp->hardluckref, 0);
return (sp->per_cpu_ref ? 0 : -ENOMEM);
}

@@ -58,12 +71,15 @@ int init_srcu_struct(struct srcu_struct
static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
{
int cpu;
+ struct srcu_struct_array *sap;
int sum;

sum = 0;
- for_each_possible_cpu(cpu)
- sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
- return sum;
+ sap = rcu_dereference(sp->per_cpu_ref);
+ if (likely(sap != NULL))
+ for_each_possible_cpu(cpu)
+ sum += per_cpu_ptr(sap, cpu)->c[idx];
+ return sum + atomic_read(&sp->hardluckref);
}

/**
@@ -76,7 +92,9 @@ static int srcu_readers_active_idx(struc
*/
int srcu_readers_active(struct srcu_struct *sp)
{
- return srcu_readers_active_idx(sp, 0) + srcu_readers_active_idx(sp, 1);
+ return srcu_readers_active_idx(sp, 0) +
+ srcu_readers_active_idx(sp, 1) -
+ atomic_read(&sp->hardluckref); /* No one will care, but... */
}

/**
@@ -94,7 +112,8 @@ void cleanup_srcu_struct(struct srcu_str
WARN_ON(sum); /* Leakage unless caller handles error. */
if (sum != 0)
return;
- free_percpu(sp->per_cpu_ref);
+ if (sp->per_cpu_ref != NULL)
+ free_percpu(sp->per_cpu_ref);
sp->per_cpu_ref = NULL;
}

@@ -105,18 +124,39 @@ void cleanup_srcu_struct(struct srcu_str
* Counts the new reader in the appropriate per-CPU element of the
* srcu_struct. Must be called from process context.
* Returns an index that must be passed to the matching srcu_read_unlock().
+ * The index is -1 if the srcu_struct is not and cannot be initialized.
*/
int srcu_read_lock(struct srcu_struct *sp)
{
int idx;
+ struct srcu_struct_array *sap;

preempt_disable();
idx = sp->completed & 0x1;
- barrier(); /* ensure compiler looks -once- at sp->completed. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
- srcu_barrier(); /* ensure compiler won't misorder critical section. */
+ sap = rcu_dereference(sp->per_cpu_ref);
+ if (likely(sap != NULL)) {
+ barrier(); /* ensure compiler looks -once- at sp->completed. */
+ per_cpu_ptr(rcu_dereference(sap),
+ smp_processor_id())->c[idx]++;
+ smp_mb();
+ preempt_enable();
+ return idx;
+ }
+ if (mutex_trylock(&sp->mutex)) {
+ preempt_enable();
+ if (sp->per_cpu_ref == NULL)
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+ if (sp->per_cpu_ref == NULL) {
+ atomic_inc(&sp->hardluckref);
+ mutex_unlock(&sp->mutex);
+ return -1;
+ }
+ mutex_unlock(&sp->mutex);
+ return srcu_read_lock(sp);
+ }
preempt_enable();
- return idx;
+ atomic_inc(&sp->hardluckref);
+ return -1;
}

/**
@@ -131,10 +171,17 @@ int srcu_read_lock(struct srcu_struct *s
*/
void srcu_read_unlock(struct srcu_struct *sp, int idx)
{
- preempt_disable();
- srcu_barrier(); /* ensure compiler won't misorder critical section. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
- preempt_enable();
+ if (likely(idx != -1)) {
+ preempt_disable();
+ smp_mb();
+ per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
+ smp_processor_id())->c[idx]--;
+ preempt_enable();
+ return;
+ }
+ mutex_lock(&sp->mutex);
+ atomic_dec(&sp->hardluckref);
+ mutex_unlock(&sp->mutex);
}

/**
@@ -158,6 +205,11 @@ void synchronize_srcu(struct srcu_struct
idx = sp->completed;
mutex_lock(&sp->mutex);

+ /* Initialize if not already initialized. */
+
+ if (sp->per_cpu_ref == NULL)
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+
/*
* Check to see if someone else did the work for us while we were
* waiting to acquire the lock. We need -two- advances of
@@ -173,65 +225,25 @@ void synchronize_srcu(struct srcu_struct
return;
}

- synchronize_sched(); /* Force memory barrier on all CPUs. */
-
- /*
- * The preceding synchronize_sched() ensures that any CPU that
- * sees the new value of sp->completed will also see any preceding
- * changes to data structures made by this CPU. This prevents
- * some other CPU from reordering the accesses in its SRCU
- * read-side critical section to precede the corresponding
- * srcu_read_lock() -- ensuring that such references will in
- * fact be protected.
- *
- * So it is now safe to do the flip.
- */
-
+ smp_mb(); /* ensure srcu_read_lock() sees prior change first! */
idx = sp->completed & 0x1;
sp->completed++;

- synchronize_sched(); /* Force memory barrier on all CPUs. */
+ synchronize_sched();

/*
* At this point, because of the preceding synchronize_sched(),
* all srcu_read_lock() calls using the old counters have completed.
* Their corresponding critical sections might well be still
* executing, but the srcu_read_lock() primitives themselves
- * will have finished executing.
+ * will have finished executing. The "old" rank of counters
+ * can therefore only decrease, never increase in value.
*/

while (srcu_readers_active_idx(sp, idx))
schedule_timeout_interruptible(1);

- synchronize_sched(); /* Force memory barrier on all CPUs. */
-
- /*
- * The preceding synchronize_sched() forces all srcu_read_unlock()
- * primitives that were executing concurrently with the preceding
- * for_each_possible_cpu() loop to have completed by this point.
- * More importantly, it also forces the corresponding SRCU read-side
- * critical sections to have also completed, and the corresponding
- * references to SRCU-protected data items to be dropped.
- *
- * Note:
- *
- * Despite what you might think at first glance, the
- * preceding synchronize_sched() -must- be within the
- * critical section ended by the following mutex_unlock().
- * Otherwise, a task taking the early exit can race
- * with a srcu_read_unlock(), which might have executed
- * just before the preceding srcu_readers_active() check,
- * and whose CPU might have reordered the srcu_read_unlock()
- * with the preceding critical section. In this case, there
- * is nothing preventing the synchronize_sched() task that is
- * taking the early exit from freeing a data structure that
- * is still being referenced (out of order) by the task
- * doing the srcu_read_unlock().
- *
- * Alternatively, the comparison with "2" on the early exit
- * could be changed to "3", but this increases synchronize_srcu()
- * latency for bulk loads. So the current code is preferred.
- */
+ smp_mb(); /* must see critical section prior to srcu_read_unlock() */

mutex_unlock(&sp->mutex);
}

2006-11-18 00:37:51

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Fri, Nov 17, 2006 at 02:27:15PM -0500, Alan Stern wrote:
> On Fri, 17 Nov 2006, Paul E. McKenney wrote:
>
> > > It works for me, but the overhead is still large. Before it would take
> > > 8-12 jiffies for a synchronize_srcu() to complete without there actually
> > > being any reader locks active, now it takes 2-3 jiffies. So it's
> > > definitely faster, and as suspected the loss of two of three
> > > synchronize_sched() cut down the overhead to a third.
> >
> > Good to hear, thank you for trying it out!
> >
> > > It's still too heavy for me, by far the most calls I do to
> > > synchronize_srcu() doesn't have any reader locks pending. I'm still a
> > > big advocate of the fastpath srcu_readers_active() check. I can
> > > understand the reluctance to make it the default, but for my case it's
> > > "safe enough", so if we could either export srcu_readers_active() or
> > > export a synchronize_srcu_fast() (or something like that), then SRCU
> > > would be a good fit for barrier vs plug rework.
> >
> > OK, will export the interface. Do your queues have associated locking?
> >
> > > > Attached is a patch that compiles, but probably goes down in flames
> > > > otherwise.
> > >
> > > Works here :-)
> >
> > I have at least a couple bugs that would show up under low-memory
> > situations, will fix and post an update.
>
> Perhaps a better approach to the initialization problem would be to assume
> that either:
>
> 1. The srcu_struct will be initialized before it is used, or
>
> 2. When it is used before initialization, the system is running
> only one thread.

Are these assumptions valid? If so, they would indeed simplify things
a bit.

> In other words, statically allocated SRCU strucures that get used during
> system startup must be initialized before the system starts multitasking.
> That seems like a reasonable requirement.
>
> This eliminates worries about readers holding mutexes. It doesn't
> solve the issues surrounding your hardluckref, but maybe it makes them
> easier to think about.

For the moment, I cheaped out and used a mutex_trylock. If this can block,
I will need to add a separate spinlock to guard per_cpu_ref allocation.

Hmmm... How to test this? Time for the wrapper around alloc_percpu()
that randomly fails, I guess. ;-)
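
Something like the following test-only wrapper would do (illustrative;
the every-fourth-call failure policy is arbitrary):

	/*
	 * Test-only stand-in for alloc_srcu_struct_percpu(): fail every
	 * fourth allocation so the per_cpu_ref == NULL paths actually
	 * get exercised.
	 */
	static struct srcu_struct_array *fail_alloc_srcu_struct_percpu(void)
	{
		static int calls;

		if ((++calls % 4) == 0)
			return NULL;
		return alloc_percpu(struct srcu_struct_array);
	}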

Thanx, Paul

2006-11-18 04:33:47

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Fri, 17 Nov 2006, Paul E. McKenney wrote:

> > Perhaps a better approach to the initialization problem would be to assume
> > that either:
> >
> > 1. The srcu_struct will be initialized before it is used, or
> >
> > 2. When it is used before initialization, the system is running
> > only one thread.
>
> Are these assumptions valid? If so, they would indeed simplify things
> a bit.

I don't know. Maybe Andrew can tell us -- is it true that the kernel runs
only one thread up through the time the core_initcalls are finished?

If not, can we create another initcall level that is guaranteed to run
before any threads are spawned?

> For the moment, I cheaped out and used a mutex_trylock. If this can block,
> I will need to add a separate spinlock to guard per_cpu_ref allocation.

I haven't looked at your revised patch yet... But it's important to keep
things as simple as possible.

> Hmmm... How to test this? Time for the wrapper around alloc_percpu()
> that randomly fails, I guess. ;-)

Do you really want things to continue in a highly degraded mode when
percpu allocation fails? Maybe it would be better just to pass the
failure back to the caller.

Alan Stern

2006-11-18 04:53:29

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Fri, 17 Nov 2006 23:33:45 -0500 (EST)
Alan Stern <[email protected]> wrote:

> On Fri, 17 Nov 2006, Paul E. McKenney wrote:
>
> > > Perhaps a better approach to the initialization problem would be to assume
> > > that either:
> > >
> > > 1. The srcu_struct will be initialized before it is used, or
> > >
> > > 2. When it is used before initialization, the system is running
> > > only one thread.
> >
> > Are these assumptions valid? If so, they would indeed simplify things
> > a bit.
>
> I don't know. Maybe Andrew can tell us -- is it true that the kernel runs
> only one thread up through the time the core_initcalls are finished?

I don't see why - a core_initcall could go off and do the
multithreaded-pci-probing thing, or it could call kernel_thread() or
anything. I doubt if any core_initcall functions _do_ do that, but there
are a lot of them.

> If not, can we create another initcall level that is guaranteed to run
> before any threads are spawned?

It's a simple and cheap matter to create a precore_initcall() - one would
need to document it carefully to be able to preserve whatever guarantees it
needs.

However, by the time the initcalls get run, various things are already
happening: SMP is up, the keventd threads are running, the CPU scheduler
migration threads are running, ksoftirqd, softlockup-detector, etc.
keventd is the problematic one.

So I guess you'd need a new linker section and a call from
do_pre_smp_initcalls() or thereabouts.

2006-11-18 06:00:10

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Fri, Nov 17, 2006 at 08:51:03PM -0800, Andrew Morton wrote:
> On Fri, 17 Nov 2006 23:33:45 -0500 (EST)
> Alan Stern <[email protected]> wrote:
>
> > On Fri, 17 Nov 2006, Paul E. McKenney wrote:
> >
> > > > Perhaps a better approach to the initialization problem would be to assume
> > > > that either:
> > > >
> > > > 1. The srcu_struct will be initialized before it is used, or
> > > >
> > > > 2. When it is used before initialization, the system is running
> > > > only one thread.
> > >
> > > Are these assumptions valid? If so, they would indeed simplify things
> > > a bit.
> >
> > I don't know. Maybe Andrew can tell us -- is it true that the kernel runs
> > only one thread up through the time the core_initcalls are finished?
>
> I don't see why - a core_initcall could go off and do the
> multithreaded-pci-probing thing, or it could call kernel_thread() or
> anything. I doubt if any core_initcall functions _do_ do that, but there
> are a lot of them.
>
> > If not, can we create another initcall level that is guaranteed to run
> > before any threads are spawned?
>
> It's a simple and cheap matter to create a precore_initcall() - one would
> need to document it carefully to be able to preserve whatever guarantees it
> needs.
>
> However by the time the initcalls get run, various thing are already
> happening: SMP is up, the keventd threads are running, the CPU scheduler
> migration threads are running, ksoftirqd, softlockup-detector, etc.
> keventd is the problematic one.
>
> So I guess you'd need a new linker section and a call from
> do_pre_smp_initcalls() or thereabouts.

Hmmm... OK then, for the moment, I will stick with the current checks
in the primitives. Not that I particularly like the "bulking up" of
srcu_read_lock() and srcu_read_unlock() -- but if the super-fast
version is needed, it can easily be provided either within the
confines of the subsystem that needs it, or as yet another set of
RCU-like primitives. Hopefully this latter option can be avoided!

BTW, the reason for the hardluckref is that I don't want to inflict a
failure return from srcu_read_lock() on you guys. The non-blocking
synchronization community has repeatedly made that sort of mistake,
and I have no intention of letting it propagate any further. ;-)
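
To make that concrete, a hypothetical caller of a fallible
srcu_read_lock() (not code from this thread; my_srcu and my_datap are
placeholders) would be stuck with something like:

	idx = srcu_read_lock(&my_srcu);
	if (idx < 0) {
		/* No SRCU protection available: every caller would need
		 * its own fallback -- a heavier lock, a retry loop, or
		 * an error return like this one. */
		return -ENOMEM;
	}
	p = rcu_dereference(my_datap);
	/* ... read-side critical section using p ... */
	srcu_read_unlock(&my_srcu, idx);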

Thanx, Paul

2006-11-18 16:15:30

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

There are a few things I don't like about this patch.

On Fri, 17 Nov 2006, Paul E. McKenney wrote:

> diff -urpNa -X dontdiff linux-2.6.19-rc5/kernel/srcu.c linux-2.6.19-rc5-dsrcu/kernel/srcu.c
> --- linux-2.6.19-rc5/kernel/srcu.c 2006-11-17 13:54:17.000000000 -0800
> +++ linux-2.6.19-rc5-dsrcu/kernel/srcu.c 2006-11-17 14:15:06.000000000 -0800
> @@ -34,6 +34,18 @@
> #include <linux/smp.h>
> #include <linux/srcu.h>
>
> +/*
> + * Initialize the per-CPU array, returning the pointer.
> + */
> +static inline struct srcu_struct_array *alloc_srcu_struct_percpu(void)
> +{
> + struct srcu_struct_array *sap;
> +
> + sap = alloc_percpu(struct srcu_struct_array);
> + smp_wmb();
> + return (sap);

Style: Don't use () here.

> +}
> +
> /**
> * init_srcu_struct - initialize a sleep-RCU structure
> * @sp: structure to initialize.

> @@ -94,7 +112,8 @@ void cleanup_srcu_struct(struct srcu_str
> WARN_ON(sum); /* Leakage unless caller handles error. */
> if (sum != 0)
> return;
> - free_percpu(sp->per_cpu_ref);
> + if (sp->per_cpu_ref != NULL)
> + free_percpu(sp->per_cpu_ref);

Now that Andrew has accepted the "allow free_percpu(NULL)" change, you can
remove the test here.

> sp->per_cpu_ref = NULL;
> }
>
> @@ -105,18 +124,39 @@ void cleanup_srcu_struct(struct srcu_str
> * Counts the new reader in the appropriate per-CPU element of the
> * srcu_struct. Must be called from process context.
> * Returns an index that must be passed to the matching srcu_read_unlock().
> + * The index is -1 if the srcu_struct is not and cannot be initialized.
> */
> int srcu_read_lock(struct srcu_struct *sp)
> {
> int idx;
> + struct srcu_struct_array *sap;
>
> preempt_disable();
> idx = sp->completed & 0x1;
> - barrier(); /* ensure compiler looks -once- at sp->completed. */
> - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> + sap = rcu_dereference(sp->per_cpu_ref);
> + if (likely(sap != NULL)) {
> + barrier(); /* ensure compiler looks -once- at sp->completed. */

Put this barrier() back where the old one was (outside the "if").

> + per_cpu_ptr(rcu_dereference(sap),

You don't need the rcu_dereference here, you already have it above.

> + smp_processor_id())->c[idx]++;
> + smp_mb();
> + preempt_enable();
> + return idx;
> + }
> + if (mutex_trylock(&sp->mutex)) {
> + preempt_enable();

Move the preempt_enable() before the "if", then get rid of the
preempt_enable() after the "if" block.

> + if (sp->per_cpu_ref == NULL)
> + sp->per_cpu_ref = alloc_srcu_struct_percpu();

It would be cleaner to put the mutex_unlock() and closing '}' right here.

> + if (sp->per_cpu_ref == NULL) {
> + atomic_inc(&sp->hardluckref);
> + mutex_unlock(&sp->mutex);
> + return -1;
> + }
> + mutex_unlock(&sp->mutex);
> + return srcu_read_lock(sp);
> + }
> preempt_enable();
> - return idx;
> + atomic_inc(&sp->hardluckref);
> + return -1;
> }
>
> /**
> @@ -131,10 +171,17 @@ int srcu_read_lock(struct srcu_struct *s
> */
> void srcu_read_unlock(struct srcu_struct *sp, int idx)
> {
> - preempt_disable();
> - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
> - preempt_enable();
> + if (likely(idx != -1)) {
> + preempt_disable();
> + smp_mb();
> + per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
> + smp_processor_id())->c[idx]--;
> + preempt_enable();
> + return;
> + }
> + mutex_lock(&sp->mutex);
> + atomic_dec(&sp->hardluckref);
> + mutex_unlock(&sp->mutex);

You don't need the mutex to protect an atomic_dec.

> }
>
> /**
> @@ -158,6 +205,11 @@ void synchronize_srcu(struct srcu_struct
> idx = sp->completed;
> mutex_lock(&sp->mutex);
>
> + /* Initialize if not already initialized. */
> +
> + if (sp->per_cpu_ref == NULL)
> + sp->per_cpu_ref = alloc_srcu_struct_percpu();

What happens if a prior reader failed to allocate the memory but this call
succeeds? You need to check hardluckref before doing this. The same is
true in srcu_read_lock().

> +
> /*
> * Check to see if someone else did the work for us while we were
> * waiting to acquire the lock. We need -two- advances of
> @@ -173,65 +225,25 @@ void synchronize_srcu(struct srcu_struct
> return;
> }
>
> - synchronize_sched(); /* Force memory barrier on all CPUs. */
> -
> - /*
> - * The preceding synchronize_sched() ensures that any CPU that
> - * sees the new value of sp->completed will also see any preceding
> - * changes to data structures made by this CPU. This prevents
> - * some other CPU from reordering the accesses in its SRCU
> - * read-side critical section to precede the corresponding
> - * srcu_read_lock() -- ensuring that such references will in
> - * fact be protected.
> - *
> - * So it is now safe to do the flip.
> - */
> -
> + smp_mb(); /* ensure srcu_read_lock() sees prior change first! */
> idx = sp->completed & 0x1;
> sp->completed++;
>
> - synchronize_sched(); /* Force memory barrier on all CPUs. */
> + synchronize_sched();
>
> /*
> * At this point, because of the preceding synchronize_sched(),
> * all srcu_read_lock() calls using the old counters have completed.
> * Their corresponding critical sections might well be still
> * executing, but the srcu_read_lock() primitives themselves
> - * will have finished executing.
> + * will have finished executing. The "old" rank of counters
> + * can therefore only decrease, never increase in value.
> */
>
> while (srcu_readers_active_idx(sp, idx))
> schedule_timeout_interruptible(1);
>
> - synchronize_sched(); /* Force memory barrier on all CPUs. */
> -
> - /*
> - * The preceding synchronize_sched() forces all srcu_read_unlock()
> - * primitives that were executing concurrently with the preceding
> - * for_each_possible_cpu() loop to have completed by this point.
> - * More importantly, it also forces the corresponding SRCU read-side
> - * critical sections to have also completed, and the corresponding
> - * references to SRCU-protected data items to be dropped.
> - *
> - * Note:
> - *
> - * Despite what you might think at first glance, the
> - * preceding synchronize_sched() -must- be within the
> - * critical section ended by the following mutex_unlock().
> - * Otherwise, a task taking the early exit can race
> - * with a srcu_read_unlock(), which might have executed
> - * just before the preceding srcu_readers_active() check,
> - * and whose CPU might have reordered the srcu_read_unlock()
> - * with the preceding critical section. In this case, there
> - * is nothing preventing the synchronize_sched() task that is
> - * taking the early exit from freeing a data structure that
> - * is still being referenced (out of order) by the task
> - * doing the srcu_read_unlock().
> - *
> - * Alternatively, the comparison with "2" on the early exit
> - * could be changed to "3", but this increases synchronize_srcu()
> - * latency for bulk loads. So the current code is preferred.
> - */
> + smp_mb(); /* must see critical section prior to srcu_read_unlock() */
>
> mutex_unlock(&sp->mutex);
> }
>

Alan Stern

2006-11-18 17:17:46

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sat, Nov 18, 2006 at 11:15:27AM -0500, Alan Stern wrote:
> There are a few things I don't like about this patch.
>
> On Fri, 17 Nov 2006, Paul E. McKenney wrote:
>
> > diff -urpNa -X dontdiff linux-2.6.19-rc5/kernel/srcu.c linux-2.6.19-rc5-dsrcu/kernel/srcu.c
> > --- linux-2.6.19-rc5/kernel/srcu.c 2006-11-17 13:54:17.000000000 -0800
> > +++ linux-2.6.19-rc5-dsrcu/kernel/srcu.c 2006-11-17 14:15:06.000000000 -0800
> > @@ -34,6 +34,18 @@
> > #include <linux/smp.h>
> > #include <linux/srcu.h>
> >
> > +/*
> > + * Initialize the per-CPU array, returning the pointer.
> > + */
> > +static inline struct srcu_struct_array *alloc_srcu_struct_percpu(void)
> > +{
> > + struct srcu_struct_array *sap;
> > +
> > + sap = alloc_percpu(struct srcu_struct_array);
> > + smp_wmb();
> > + return (sap);
>
> Style: Don't use () here.

Touche!!!

> > +}
> > +
> > /**
> > * init_srcu_struct - initialize a sleep-RCU structure
> > * @sp: structure to initialize.
>
> > @@ -94,7 +112,8 @@ void cleanup_srcu_struct(struct srcu_str
> > WARN_ON(sum); /* Leakage unless caller handles error. */
> > if (sum != 0)
> > return;
> > - free_percpu(sp->per_cpu_ref);
> > + if (sp->per_cpu_ref != NULL)
> > + free_percpu(sp->per_cpu_ref);
>
> Now that Andrew has accepted the "allow free_percpu(NULL)" change, you can
> remove the test here.

OK. I thought that there was some sort of error-checking involved,
but if not, will fix.

> > sp->per_cpu_ref = NULL;
> > }
> >
> > @@ -105,18 +124,39 @@ void cleanup_srcu_struct(struct srcu_str
> > * Counts the new reader in the appropriate per-CPU element of the
> > * srcu_struct. Must be called from process context.
> > * Returns an index that must be passed to the matching srcu_read_unlock().
> > + * The index is -1 if the srcu_struct is not and cannot be initialized.
> > */
> > int srcu_read_lock(struct srcu_struct *sp)
> > {
> > int idx;
> > + struct srcu_struct_array *sap;
> >
> > preempt_disable();
> > idx = sp->completed & 0x1;
> > - barrier(); /* ensure compiler looks -once- at sp->completed. */
> > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > + sap = rcu_dereference(sp->per_cpu_ref);
> > + if (likely(sap != NULL)) {
> > + barrier(); /* ensure compiler looks -once- at sp->completed. */
>
> Put this barrier() back where the old one was (outside the "if").

Why? Outside this "if", I don't use "sap".

> > + per_cpu_ptr(rcu_dereference(sap),
>
> You don't need the rcu_dereference here, you already have it above.

Good point!!!

> > + smp_processor_id())->c[idx]++;
> > + smp_mb();
> > + preempt_enable();
> > + return idx;
> > + }
> > + if (mutex_trylock(&sp->mutex)) {
> > + preempt_enable();
>
> Move the preempt_enable() before the "if", then get rid of the
> preempt_enable() after the "if" block.

No can do. The preempt_enable() must follow the increment and
the memory barrier, otherwise the synchronize_sched() inside
synchronize_srcu() can't do its job.

> > + if (sp->per_cpu_ref == NULL)
> > + sp->per_cpu_ref = alloc_srcu_struct_percpu();
>
> It would be cleaner to put the mutex_unlock() and closing '}' right here.

I can move the mutex_unlock() to this point, but I cannot otherwise
merge the two following pieces of code -- at least not without doing
an otherwise-gratuitous preempt_disable(). Which I suppose I could
do, but seems like it would be more confusing than would the
separate code. I will play with this a bit and see if I can eliminate
the duplication.

> > + if (sp->per_cpu_ref == NULL) {
> > + atomic_inc(&sp->hardluckref);
> > + mutex_unlock(&sp->mutex);
> > + return -1;
> > + }
> > + mutex_unlock(&sp->mutex);
> > + return srcu_read_lock(sp);
> > + }

OK, I suppose I could put the preempt_enable() in an "else" clause,
then maybe be able to merge things. Would that help?

> > preempt_enable();
> > - return idx;
> > + atomic_inc(&sp->hardluckref);
> > + return -1;
> > }
> >
> > /**
> > @@ -131,10 +171,17 @@ int srcu_read_lock(struct srcu_struct *s
> > */
> > void srcu_read_unlock(struct srcu_struct *sp, int idx)
> > {
> > - preempt_disable();
> > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
> > - preempt_enable();
> > + if (likely(idx != -1)) {
> > + preempt_disable();
> > + smp_mb();
> > + per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
> > + smp_processor_id())->c[idx]--;
> > + preempt_enable();
> > + return;
> > + }
> > + mutex_lock(&sp->mutex);
> > + atomic_dec(&sp->hardluckref);
> > + mutex_unlock(&sp->mutex);
>
> You don't need the mutex to protect an atomic_dec.

Good point!!!

> > }
> >
> > /**
> > @@ -158,6 +205,11 @@ void synchronize_srcu(struct srcu_struct
> > idx = sp->completed;
> > mutex_lock(&sp->mutex);
> >
> > + /* Initialize if not already initialized. */
> > +
> > + if (sp->per_cpu_ref == NULL)
> > + sp->per_cpu_ref = alloc_srcu_struct_percpu();
>
> What happens if a prior reader failed to allocate the memory but this call
> succeeds? You need to check hardluckref before doing this. The same is
> true in srcu_read_lock().

All accounted for by the fact that hardluckref is unconditionally
added in by srcu_readers_active(). Right?

> > +
> > /*
> > * Check to see if someone else did the work for us while we were
> > * waiting to acquire the lock. We need -two- advances of
> > @@ -173,65 +225,25 @@ void synchronize_srcu(struct srcu_struct
> > return;
> > }
> >
> > - synchronize_sched(); /* Force memory barrier on all CPUs. */
> > -
> > - /*
> > - * The preceding synchronize_sched() ensures that any CPU that
> > - * sees the new value of sp->completed will also see any preceding
> > - * changes to data structures made by this CPU. This prevents
> > - * some other CPU from reordering the accesses in its SRCU
> > - * read-side critical section to precede the corresponding
> > - * srcu_read_lock() -- ensuring that such references will in
> > - * fact be protected.
> > - *
> > - * So it is now safe to do the flip.
> > - */
> > -
> > + smp_mb(); /* ensure srcu_read_lock() sees prior change first! */
> > idx = sp->completed & 0x1;
> > sp->completed++;
> >
> > - synchronize_sched(); /* Force memory barrier on all CPUs. */
> > + synchronize_sched();
> >
> > /*
> > * At this point, because of the preceding synchronize_sched(),
> > * all srcu_read_lock() calls using the old counters have completed.
> > * Their corresponding critical sections might well be still
> > * executing, but the srcu_read_lock() primitives themselves
> > - * will have finished executing.
> > + * will have finished executing. The "old" rank of counters
> > + * can therefore only decrease, never increase in value.
> > */
> >
> > while (srcu_readers_active_idx(sp, idx))
> > schedule_timeout_interruptible(1);
> >
> > - synchronize_sched(); /* Force memory barrier on all CPUs. */
> > -
> > - /*
> > - * The preceding synchronize_sched() forces all srcu_read_unlock()
> > - * primitives that were executing concurrently with the preceding
> > - * for_each_possible_cpu() loop to have completed by this point.
> > - * More importantly, it also forces the corresponding SRCU read-side
> > - * critical sections to have also completed, and the corresponding
> > - * references to SRCU-protected data items to be dropped.
> > - *
> > - * Note:
> > - *
> > - * Despite what you might think at first glance, the
> > - * preceding synchronize_sched() -must- be within the
> > - * critical section ended by the following mutex_unlock().
> > - * Otherwise, a task taking the early exit can race
> > - * with a srcu_read_unlock(), which might have executed
> > - * just before the preceding srcu_readers_active() check,
> > - * and whose CPU might have reordered the srcu_read_unlock()
> > - * with the preceding critical section. In this case, there
> > - * is nothing preventing the synchronize_sched() task that is
> > - * taking the early exit from freeing a data structure that
> > - * is still being referenced (out of order) by the task
> > - * doing the srcu_read_unlock().
> > - *
> > - * Alternatively, the comparison with "2" on the early exit
> > - * could be changed to "3", but this increases synchronize_srcu()
> > - * latency for bulk loads. So the current code is preferred.
> > - */
> > + smp_mb(); /* must see critical section prior to srcu_read_unlock() */
> >
> > mutex_unlock(&sp->mutex);
> > }

Will spin a new patch...

Thanx, Paul

2006-11-18 18:47:26

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/17, Paul E. McKenney wrote:
>
> Oleg, any thoughts about Jens's optimization? He would code something
> like:
>
> if (srcu_readers_active(&my_srcu))
> synchronize_srcu();
> else
> smp_mb();

Well, this is clearly racy, no? I am not sure, but maybe we can do

	smp_mb();
	if (srcu_readers_active(&my_srcu))
		synchronize_srcu();

in this case we also need to add 'smp_mb()' into srcu_read_lock() after
'atomic_inc(&sp->hardluckref)'.

> However, he is doing ordered I/O requests rather than protecting data
> structures.

Probably this makes a difference, but I don't understand this.

Oleg.

2006-11-18 19:02:41

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/17, Paul E. McKenney wrote:
>
> int srcu_read_lock(struct srcu_struct *sp)
> {
> int idx;
> + struct srcu_struct_array *sap;
>
> preempt_disable();
> idx = sp->completed & 0x1;
> - barrier(); /* ensure compiler looks -once- at sp->completed. */
> - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> + sap = rcu_dereference(sp->per_cpu_ref);
> + if (likely(sap != NULL)) {
> + barrier(); /* ensure compiler looks -once- at sp->completed. */
> + per_cpu_ptr(rcu_dereference(sap),
> + smp_processor_id())->c[idx]++;
> + smp_mb();
> + preempt_enable();
> + return idx;
> + }
> + if (mutex_trylock(&sp->mutex)) {
> + preempt_enable();
> + if (sp->per_cpu_ref == NULL)
> + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> + if (sp->per_cpu_ref == NULL) {
> + atomic_inc(&sp->hardluckref);
> + mutex_unlock(&sp->mutex);
> + return -1;
> + }
> + mutex_unlock(&sp->mutex);
> + return srcu_read_lock(sp);
> + }
> preempt_enable();
> - return idx;
> + atomic_inc(&sp->hardluckref);
> + return -1;
> }

This is a real nitpick, but in theory we have a possibility of livelock.

Suppose that synchronize_srcu() takes sp->mutex and fails to allocate
sp->per_cpu_ref. If we have a flow of srcu_read_lock/srcu_read_unlock,
this loop in synchronize_srcu()

while (srcu_readers_active_idx(sp, idx))
schedule_timeout_interruptible(1);

may spin unpredictably long, because we use the same sp->hardluckref for
accounting.

Oleg.

2006-11-18 19:34:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/18, Paul E. McKenney wrote:
>
> On Sat, Nov 18, 2006 at 11:15:27AM -0500, Alan Stern wrote:
> > > + smp_processor_id())->c[idx]++;
> > > + smp_mb();
> > > + preempt_enable();
> > > + return idx;
> > > + }
> > > + if (mutex_trylock(&sp->mutex)) {
> > > + preempt_enable();
> >
> > Move the preempt_enable() before the "if", then get rid of the
> > preempt_enable() after the "if" block.
>
> No can do. The preempt_enable() must follow the increment and
> the memory barrier, otherwise the synchronize_sched() inside
> synchronize_srcu() can't do its job.

Given that srcu_read_lock() does smp_mb() after ->c[idx]++, what
is the purpose of synchronize_sched()? It seems to me it could be
replaced by smp_mb().

synchronize_srcu:

	sp->completed++;

	mb();

	// if the reader did any memory access _after_
	// srcu_read_lock()->mb() we must see the changes.
	while (srcu_readers_active_idx(sp, idx))
		sleep();

No?

Oleg.

2006-11-18 21:00:37

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sat, 18 Nov 2006, Paul E. McKenney wrote:

> > > @@ -94,7 +112,8 @@ void cleanup_srcu_struct(struct srcu_str
> > > WARN_ON(sum); /* Leakage unless caller handles error. */
> > > if (sum != 0)
> > > return;
> > > - free_percpu(sp->per_cpu_ref);
> > > + if (sp->per_cpu_ref != NULL)
> > > + free_percpu(sp->per_cpu_ref);
> >
> > Now that Andrew has accepted the "allow free_percpu(NULL)" change, you can
> > remove the test here.
>
> OK. I thought that there was some sort of error-checking involved,
> but if not, will fix.

Just make sure that _you_ have the free_percpu(NULL) patch installed on
your machine before testing this -- otherwise you'll get a nice hard
crash!

> > > preempt_disable();
> > > idx = sp->completed & 0x1;
> > > - barrier(); /* ensure compiler looks -once- at sp->completed. */
> > > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> > > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > > + sap = rcu_dereference(sp->per_cpu_ref);
> > > + if (likely(sap != NULL)) {
> > > + barrier(); /* ensure compiler looks -once- at sp->completed. */
> >
> > Put this barrier() back where the old one was (outside the "if").
>
> Why? Outside this "if", I don't use "sap".

Because it looks funny to see the comment here talking about sp->completed
when sp->completed hasn't been used for several lines. (Maybe it looks
less funny in the patched source than in the patch itself.) The best
place to prevent extra accesses of sp->completed is immediately after
the required access.

> > > + smp_processor_id())->c[idx]++;
> > > + smp_mb();
> > > + preempt_enable();
> > > + return idx;
> > > + }
> > > + if (mutex_trylock(&sp->mutex)) {
> > > + preempt_enable();
> >
> > Move the preempt_enable() before the "if", then get rid of the
> > preempt_enable() after the "if" block.
>
> No can do. The preempt_enable() must follow the increment and
> the memory barrier, otherwise the synchronize_sched() inside
> synchronize_srcu() can't do its job.

You misunderstood -- I was talking about the preempt_enable() in the last
line quoted above (not the one in the third line) and the "if
(mutex_trylock" (not the earlier "if (likely").

> > > + if (sp->per_cpu_ref == NULL)
> > > + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> >
> > It would be cleaner to put the mutex_unlock() and closing '}' right here.
>
> I can move the mutex_unlock() to this point, but I cannot otherwise
> merge the two following pieces of code -- at least not without doing
> an otherwise-gratuitous preempt_disable(). Which I suppose I could
> do, but seems like it would be more confusing than would the
> separate code. I will play with this a bit and see if I can eliminate
> the duplication.

If you follow the advice above then you won't need to add a gratuitous
preempt_disable(). Try it and see how it comes out; the idea is that
you can use the same code for testing sp->per_cpu_ref regardless of
whether the mutex_trylock() or the call to alloc_srcu_struct_percpu()
succeeded.


> > What happens if a prior reader failed to allocate the memory but this call
> > succeeds? You need to check hardluckref before doing this. The same is
> > true in srcu_read_lock().
>
> All accounted for by the fact that hardluckref is unconditionally
> added in by srcu_readers_active(). Right?

Yes, you're right.

> Will spin a new patch...

Good -- it's getting pretty messy to look at this one!

By the way, I think the fastpath for synchronize_srcu() should be safe,
now that you have added the memory barriers into srcu_read_lock() and
srcu_read_unlock(). You might as well try putting it in.

Although now that I look at it again, you have forgotten to put smp_mb()
after the atomic_inc() call and before the atomic_dec(). In
srcu_read_unlock() you could just move the existing smp_mb() back before
the test of idx.
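
For reference, a sketch of srcu_read_unlock() with both suggestions
applied (the smp_mb() hoisted above the idx test, and the mutex around
the atomic_dec() dropped as noted earlier):

	void srcu_read_unlock(struct srcu_struct *sp, int idx)
	{
		smp_mb();	/* order the critical section before either
				 * form of the decrement */
		if (likely(idx != -1)) {
			preempt_disable();
			per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
				    smp_processor_id())->c[idx]--;
			preempt_enable();
			return;
		}
		atomic_dec(&sp->hardluckref);
	}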

Alan Stern

2006-11-18 21:25:51

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/18, Alan Stern wrote:
>
> By the way, I think the fastpath for synchronize_srcu() should be safe,
> now that you have added the memory barriers into srcu_read_lock() and
> srcu_read_unlock(). You might as well try putting it in.

I still think the fastpath should do mb() unconditionally to be correct.

> Although now that I look at it again, you have forgotten to put smp_mb()
> after the atomic_inc() call and before the atomic_dec().

As I see it, currently we don't need this barrier because synchronize_srcu()
does synchronize_sched() before reading ->hardluckref.

But if we add the fastpath into synchronize_srcu() then yes, we need mb()
after atomic_inc().

Unless I totally confused :)

Oleg.

2006-11-18 22:13:18

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sun, 19 Nov 2006, Oleg Nesterov wrote:

> On 11/18, Alan Stern wrote:
> >
> > By the way, I think the fastpath for synchronize_srcu() should be safe,
> > now that you have added the memory barriers into srcu_read_lock() and
> > srcu_read_unlock(). You might as well try putting it in.
>
> I still think the fastpath should do mb() unconditionally to be correct.

Yes, it definitely should.

> > Although now that I look at it again, you have forgotten to put smp_mb()
> > after the atomic_inc() call and before the atomic_dec().
>
> As I see it, currently we don't need this barrier because synchronize_srcu()
> does synchronize_sched() before reading ->hardluckref.
>
> But if we add the fastpath into synchronize_srcu() then yes, we need mb()
> after atomic_inc().
>
> Unless I totally confused :)

Put it this way: If the missing memory barrier in srcu_read_lock() after
the atomic_inc call isn't needed, then neither is the existing memory
barrier after the per-cpu counter gets incremented. Likewise, if a memory
barrier isn't needed before the atomic_dec in srcu_read_unlock(), then
neither is the memory barrier before the per-cpu counter gets decremented.

What you're ignoring is the synchronize_sched() call at the end of
synchronize_srcu(), which has been replaced with smp_mb(). The smp_mb()
needs to pair against a memory barrier on the read side, and that memory
barrier has to occur after srcu_read_lock() has incremented the counter
and before the read-side critical section begins. Otherwise code in the
critical section might leak out to before the counter is incremented.

Alan Stern

2006-11-18 22:46:44

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/18, Alan Stern wrote:
>
> On Sun, 19 Nov 2006, Oleg Nesterov wrote:
>
> > On 11/18, Alan Stern wrote:
> > >
> > > By the way, I think the fastpath for synchronize_srcu() should be safe,
> > > now that you have added the memory barriers into srcu_read_lock() and
> > > srcu_read_unlock(). You might as well try putting it in.
> >
> > I still think the fastpath should do mb() unconditionally to be correct.
>
> Yes, it definitely should.
>
> > > Although now that I look at it again, you have forgotten to put smp_mb()
> > > after the atomic_inc() call and before the atomic_dec().
> >
> > As I see it, currently we don't need this barrier because synchronize_srcu()
> > does synchronize_sched() before reading ->hardluckref.
> >
> > But if we add the fastpath into synchronize_srcu() then yes, we need mb()
> > after atomic_inc().
> >
> > Unless I totally confused :)
>
> Put it this way: If the missing memory barrier in srcu_read_lock() after
> the atomic_inc call isn't needed, then neither is the existing memory
> barrier after the per-cpu counter gets incremented.

I disagree. There is another reason for mb() after the per-cpu counter gets
incremented. Without this barrier we can read the updated value of ->completed
(incremented by synchronize_srcu()), but then read stale data from the
RCU-protected memory.

> Likewise, if a memory
> barrier isn't needed before the atomic_dec in srcu_read_unlock(), then
> neither is the memory barrier before the per-cpu counter gets decremented.

Yes, you are right, I forgot about unlock(), it definitely needs mb().

> What you're ignoring is the synchronize_sched() call at the end of
> synchronize_srcu(), which has been replaced with smp_mb(). The smp_mb()
> needs to pair against a memory barrier on the read side, and that memory
> barrier has to occur after srcu_read_lock() has incremented the counter
> and before the read-side critical section begins. Otherwise code in the
> critical section might leak out to before the counter is incremented.

Still I am not sure you are right. It is ok (I think) if the code in the
critical section leaks out to before the atomic_inc(). In fact this doesn't
differ from the case when srcu_read_lock() happens before synchronize_srcu()
starts. In that case synchronize_srcu() will wait until the critical section
is closed via srcu_read_unlock(). Because of synchronize_sched() synchronize_srcu()
can't miss the fact that the critical section is in progress, so it doesn't
matter if it leaks _before_.

Oleg.

2006-11-19 17:58:00

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/17, Jens Axboe wrote:
>
> It works for me, but the overhead is still large. Before it would take
> 8-12 jiffies for a synchronize_srcu() to complete without there actually
> being any reader locks active, now it takes 2-3 jiffies. So it's
> definitely faster, and as suspected the loss of two of three
> synchronize_sched() cut down the overhead to a third.
>
> It's still too heavy for me, by far the most calls I do to
> synchronize_srcu() doesn't have any reader locks pending. I'm still a
> big advocate of the fastpath srcu_readers_active() check. I can
> understand the reluctance to make it the default, but for my case it's
> "safe enough", so if we could either export srcu_readers_active() or
> export a synchronize_srcu_fast() (or something like that), then SRCU
> would be a good fit for barrier vs plug rework.

Just an idea. How about another variant of srcu which is more optimized
for writers?

struct xxx_struct {
	int completed;
	atomic_t ctr[2];
	struct mutex mutex;
	wait_queue_head_t wq;
};

void init_xxx_struct(struct xxx_struct *sp)
{
	sp->completed = 0;
	atomic_set(sp->ctr + 0, 1);
	atomic_set(sp->ctr + 1, 1);
	mutex_init(&sp->mutex);
	init_waitqueue_head(&sp->wq);
}

int xxx_read_lock(struct xxx_struct *sp)
{
	int idx;

	idx = sp->completed & 0x1;
	atomic_inc(sp->ctr + idx);
	smp_mb__after_atomic_inc();

	return idx;
}

void xxx_read_unlock(struct xxx_struct *sp, int idx)
{
	if (atomic_dec_and_test(sp->ctr + idx))
		wake_up(&sp->wq);
}

void synchronize_xxx(struct xxx_struct *sp)
{
	wait_queue_t wait;
	int idx;

	init_wait(&wait);
	mutex_lock(&sp->mutex);

	idx = sp->completed++ & 0x1;

	for (;;) {
		prepare_to_wait(&sp->wq, &wait, TASK_UNINTERRUPTIBLE);

		if (!atomic_add_unless(sp->ctr + idx, -1, 1))
			break;

		schedule();
		atomic_inc(sp->ctr + idx);
	}
	finish_wait(&sp->wq, &wait);

	mutex_unlock(&sp->mutex);
}

Very simple. Note that synchronize_xxx() is O(1), doesn't poll, and could
be optimized further.
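
Usage would mirror SRCU (sketch only; my_xxx, my_datap, old_p and new_p
are placeholders):

	/* Reader */
	idx = xxx_read_lock(&my_xxx);
	p = rcu_dereference(my_datap);
	/* ... use p ... */
	xxx_read_unlock(&my_xxx, idx);

	/* Updater */
	rcu_assign_pointer(my_datap, new_p);
	synchronize_xxx(&my_xxx);	/* wait for readers of old_p */
	kfree(old_p);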

Oleg.

2006-11-19 20:12:18

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sun, 19 Nov 2006, Oleg Nesterov wrote:

> > Put it this way: If the missing memory barrier in srcu_read_lock() after
> > the atomic_inc call isn't needed, then neither is the existing memory
> > barrier after the per-cpu counter gets incremented.
>
> I disagree. There is another reason for mb() after the per-cpu counter gets
> incremented. Without this barrier we can read the updated value of ->completed
> (incremented by synchronize_srcu()), but then read a stale data of the rcu
> protected memory.

You are right.

> > What you're ignoring is the synchronize_sched() call at the end of
> > synchronize_srcu(), which has been replaced with smp_mb(). The smp_mb()
> > needs to pair against a memory barrier on the read side, and that memory
> > barrier has to occur after srcu_read_lock() has incremented the counter
> > and before the read-side critical section begins. Otherwise code in the
> > critical section might leak out to before the counter is incremented.
>
> Still I am not sure you are right. It is ok (I think) if the code in the
> critical section leaks out to before the atomic_inc(). In fact this doesn't
> differ from the case when srcu_read_lock() happens before synchronize_srcu()
> starts. In that case synchronize_srcu() will wait until the critical section
> is closed via srcu_read_unlock(). Because of synchronize_sched() synchronize_srcu()
> can't miss the fact that the critical section is in progress, so it doesn't
> matter if it leaks _before_.

That's right. I was forgetting an important difference between the c[idx]
and the hardluckref paths: With c[idx] there has to be 2-way
communication (writer increments completed, then reader increments
c[idx]). With hardluckref there is only 1-way communication (reader
increments hardluckref). The synchronize_sched call takes care of the
reader->writer message; the memory barrier is needed for the
writer->reader message. Hence it isn't necessary after the atomic_inc.

But of course it _is_ needed for the fastpath to work. In fact, it might
not be good enough, depending on the architecture. Here's what the
fastpath ends up looking like (using c[idx] is essentially the same as
using hardluckref):

	WRITER                            READER
	------                            ------
	dataptr = &(new data)             atomic_inc(&hardluckref)
	mb                                mb
	while (hardluckref > 0) ;         access *dataptr

Notice the pattern: Each CPU does store-mb-load. It is known that on
some architectures each CPU can end up loading the old value (the value
from before the other CPU's store). This would mean the writer would see
hardluckref == 0 right away and the reader would see the old dataptr.

On architectures where the store-mb-load pattern works, the fastpath would
be safe. But on others it would not be.

Heh, Paul, this highlights the usefulness of our long discussion about
memory barrier semantics. :-)

Alan

2006-11-19 20:21:42

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sun, 19 Nov 2006, Oleg Nesterov wrote:

> On 11/17, Jens Axboe wrote:
> >
> > It works for me, but the overhead is still large. Before it would take
> > 8-12 jiffies for a synchronize_srcu() to complete without there actually
> > being any reader locks active, now it takes 2-3 jiffies. So it's
> > definitely faster, and as suspected the loss of two of three
> > synchronize_sched() cut down the overhead to a third.
> >
> > It's still too heavy for me, by far the most calls I do to
> > synchronize_srcu() doesn't have any reader locks pending. I'm still a
> > big advocate of the fastpath srcu_readers_active() check. I can
> > understand the reluctance to make it the default, but for my case it's
> > "safe enough", so if we could either export srcu_readers_active() or
> > export a synchronize_srcu_fast() (or something like that), then SRCU
> > would be a good fit for barrier vs plug rework.
>
> Just an idea. How about another variant of srcu which is more optimized
> for writers?
>
> struct xxx_struct {
> int completed;
> atomic_t ctr[2];
> struct mutex mutex;
> wait_queue_head_t wq;
> };
>
> void init_xxx_struct(struct xxx_struct *sp)
> {
> sp->completed = 0;
> atomic_set(sp->ctr + 0, 1);
> atomic_set(sp->ctr + 1, 1);
> mutex_init(&sp->mutex);
> init_waitqueue_head(&sp->wq);
> }
>
> int xxx_read_lock(struct xxx_struct *sp)
> {
> int idx;
>
> idx = sp->completed & 0x1;
> atomic_inc(sp->ctr + idx);
> smp_mb__after_atomic_inc();
>
> return idx;
> }
>
> void xxx_read_unlock(struct xxx_struct *sp, int idx)
> {
> if (atomic_dec_and_test(sp->ctr + idx))
> wake_up(&sp->wq);
> }
>
> void synchronize_xxx(struct xxx_struct *sp)
> {
> wait_queue_t wait;
> int idx;
>
> init_wait(&wait);
> mutex_lock(&sp->mutex);
>
> idx = sp->completed++ & 0x1;
>
> for (;;) {
> prepare_to_wait(&sp->wq, &wait, TASK_UNINTERRUPTIBLE);
>
> if (!atomic_add_unless(sp->ctr + idx, -1, 1))
> break;
>
> schedule();
> atomic_inc(sp->ctr + idx);
> }
> finish_wait(&sp->wq, &wait);
>
> mutex_unlock(&sp->mutex);
> }
>
> Very simple. Note that synchronize_xxx() is O(1), doesn't poll, and could
> be optimized further.

What happens if synchronize_xxx manages to execute inbetween
xxx_read_lock's

	idx = sp->completed & 0x1;
	atomic_inc(sp->ctr + idx);

statements? You see, there's no way around using synchronize_sched().

Alan Stern

2006-11-19 20:55:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/19, Alan Stern wrote:
>
> On Sun, 19 Nov 2006, Oleg Nesterov wrote:
>
> > int xxx_read_lock(struct xxx_struct *sp)
> > {
> > int idx;
> >
> > idx = sp->completed & 0x1;
> > atomic_inc(sp->ctr + idx);
> > smp_mb__after_atomic_inc();
> >
> > return idx;
> > }
> >
> > void xxx_read_unlock(struct xxx_struct *sp, int idx)
> > {
> > if (atomic_dec_and_test(sp->ctr + idx))
> > wake_up(&sp->wq);
> > }
> >
> > void synchronize_xxx(struct xxx_struct *sp)
> > {
> > wait_queue_t wait;
> > int idx;
> >
> > init_wait(&wait);
> > mutex_lock(&sp->mutex);
> >
> > idx = sp->completed++ & 0x1;
> >
> > for (;;) {
> > prepare_to_wait(&sp->wq, &wait, TASK_UNINTERRUPTIBLE);
> >
> > if (!atomic_add_unless(sp->ctr + idx, -1, 1))
> > break;
> >
> > schedule();
> > atomic_inc(sp->ctr + idx);
> > }
> > finish_wait(&sp->wq, &wait);
> >
> > mutex_unlock(&sp->mutex);
> > }
> >
> > Very simple. Note that synchronize_xxx() is O(1), doesn't poll, and could
> > be optimized further.
>
> What happens if synchronize_xxx manages to execute inbetween
> xxx_read_lock's
>
> idx = sp->completed & 0x1;
> atomic_inc(sp->ctr + idx);
>
> statements?

Oops. I forgot about explicit mb() before sp->completed++ in synchronize_xxx().

So synchronize_xxx() should do

	smp_mb();
	idx = sp->completed++ & 0x1;

	for (;;) { ... }

> You see, there's no way around using synchronize_sched().

With this change I think we are safe.

If synchronize_xxx() increments ->completed in between, the caller of
xxx_read_lock() will see all memory ops (started before synchronize_xxx())
completed. It is ok that synchronize_xxx() returns immediately.

Thanks!

Oleg.

2006-11-19 21:09:19

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sun, 19 Nov 2006, Oleg Nesterov wrote:

> > What happens if synchronize_xxx manages to execute inbetween
> > xxx_read_lock's
> >
> > idx = sp->completed & 0x1;
> > atomic_inc(sp->ctr + idx);
> >
> > statements?
>
> Oops. I forgot about explicit mb() before sp->completed++ in synchronize_xxx().
>
> So synchronize_xxx() should do
>
> smp_mb();
> idx = sp->completed++ & 0x1;
>
> for (;;) { ... }
>
> > You see, there's no way around using synchronize_sched().
>
> With this change I think we are safe.
>
> If synchronize_xxx() increments ->completed in between, the caller of
> xxx_read_lock() will see all memory ops (started before synchronize_xxx())
> completed. It is ok that synchronize_xxx() returns immediately.

Yes, the reader will see a consistent picture, but it will have
incremented the wrong element of sp->ctr[]. What happens if another
synchronize_xxx() occurs while the reader is still running?

Alan

2006-11-19 21:11:20

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sat, Nov 18, 2006 at 09:46:24PM +0300, Oleg Nesterov wrote:
> On 11/17, Paul E. McKenney wrote:
> >
> > Oleg, any thoughts about Jens's optimization? He would code something
> > like:
> >
> > if (srcu_readers_active(&my_srcu))
> > synchronize_srcu();
> > else
> > smp_mb();
>
> Well, this is clearly racy, no? I am not sure, but may be we can do
>
> smp_mb();
> if (srcu_readers_active(&my_srcu))
> synchronize_srcu();
>
> in this case we also need to add 'smp_mb()' into srcu_read_lock() after
> 'atomic_inc(&sp->hardluckref)'.
>
> > However, he is doing ordered I/O requests rather than protecting data
> > structures.
>
> Probably this makes a difference, but I don't understand this.

OK, one hypothesis here...

The I/Os must be somehow explicitly ordered to qualify
for I/O-barrier separation. If two independent processes
issue I/Os concurrently with a third process doing an
I/O barrier, the I/O barrier is free to separate the
two concurrent I/Os or not, on its whim.

Jens, is the above correct? If so, what would the two processes
need to do in order to ensure that their I/O was considered to be
ordered with respect to the I/O barrier? Here are some possibilities:

1. I/O barriers apply only to preceding and following I/Os from
the process issuing the I/O barrier.

2. As for #1 above, but restricted to task rather than process.

3. I/O system calls that have completed are ordered by the
barrier to precede I/O system calls that have not yet
started, but I/O system calls still in flight could legally
land on either side of the concurrently executing I/O
barrier.

4. Something else entirely?

Given some restriction like one of the above, it is entirely possible
that we don't even need the memory barrier...

Thanx, Paul

2006-11-19 21:17:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/19, Alan Stern wrote:
> On Sun, 19 Nov 2006, Oleg Nesterov wrote:
>
> > > What happens if synchronize_xxx manages to execute inbetween
> > > xxx_read_lock's
> > >
> > > idx = sp->completed & 0x1;
> > > atomic_inc(sp->ctr + idx);
> > >
> > > statements?
> >
> > Oops. I forgot about explicit mb() before sp->completed++ in synchronize_xxx().
> >
> > So synchronize_xxx() should do
> >
> > smp_mb();
> > idx = sp->completed++ & 0x1;
> >
> > for (;;) { ... }
> >
> > > You see, there's no way around using synchronize_sched().
> >
> > With this change I think we are safe.
> >
> > If synchronize_xxx() increments ->completed in between, the caller of
> > xxx_read_lock() will see all memory ops (started before synchronize_xxx())
> > completed. It is ok that synchronize_xxx() returns immediately.
>
> Yes, the reader will see a consistent picture, but it will have
> incremented the wrong element of sp->ctr[]. What happens if another
> synchronize_xxx() occurs while the reader is still running?

It will wait for xxx_read_unlock() on the reader's side. And for this reason
this idx is in fact not exactly wrong :)

Oleg.

2006-11-19 21:24:32

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sun, Nov 19, 2006 at 11:55:16PM +0300, Oleg Nesterov wrote:
> On 11/19, Alan Stern wrote:
> >
> > On Sun, 19 Nov 2006, Oleg Nesterov wrote:
> >
> > > int xxx_read_lock(struct xxx_struct *sp)
> > > {
> > > int idx;
> > >
> > > idx = sp->completed & 0x1;
> > > atomic_inc(sp->ctr + idx);
> > > smp_mb__after_atomic_inc();
> > >
> > > return idx;
> > > }
> > >
> > > void xxx_read_unlock(struct xxx_struct *sp, int idx)
> > > {
> > > if (atomic_dec_and_test(sp->ctr + idx))
> > > wake_up(&sp->wq);
> > > }
> > >
> > > void synchronize_xxx(struct xxx_struct *sp)
> > > {
> > > wait_queue_t wait;
> > > int idx;
> > >
> > > init_wait(&wait);
> > > mutex_lock(&sp->mutex);
> > >
> > > idx = sp->completed++ & 0x1;
> > >
> > > for (;;) {
> > > prepare_to_wait(&sp->wq, &wait, TASK_UNINTERRUPTIBLE);
> > >
> > > if (!atomic_add_unless(sp->ctr + idx, -1, 1))
> > > break;
> > >
> > > schedule();
> > > atomic_inc(sp->ctr + idx);
> > > }
> > > finish_wait(&sp->wq, &wait);
> > >
> > > mutex_unlock(&sp->mutex);
> > > }
> > >
> > > Very simple. Note that synchronize_xxx() is O(1), doesn't poll, and could
> > > be optimized further.
> >
> > What happens if synchronize_xxx manages to execute inbetween
> > xxx_read_lock's
> >
> > idx = sp->completed & 0x1;
> > atomic_inc(sp->ctr + idx);
> >
> > statements?
>
> Oops. I forgot about explicit mb() before sp->completed++ in synchronize_xxx().
>
> So synchronize_xxx() should do
>
> smp_mb();
> idx = sp->completed++ & 0x1;
>
> for (;;) { ... }
>
> > You see, there's no way around using synchronize_sched().
>
> With this change I think we are safe.
>
> If synchronize_xxx() increments ->completed in between, the caller of
> xxx_read_lock() will see all memory ops (started before synchronize_xxx())
> completed. It is ok that synchronize_xxx() returns immediately.

Let me take Alan's example one step further:

o CPU 0 starts executing xxx_read_lock(), but is interrupted
(or whatever) just before the atomic_inc().

o CPU 1 executes synchronize_xxx() to completion, which it
can because CPU 0 has not yet incremented the counter.

o CPU 0 returns from interrupt and completes xxx_read_lock(),
but has incremented the wrong counter.

o CPU 0 continues into its critical section, picking up a
pointer to an xxx-protected data structure (or, in Jens's
case starting an xxx-protected I/O).

o CPU 1 executes another synchronize_xxx(). This completes
	immediately because CPU 0 has incremented the wrong counter.

o CPU 1 continues, either freeing a data structure while
CPU 0 is still referencing it, or, in Jens's case, completing
an I/O barrier while there is still outstanding I/O.

I agree with Alan -- unless I am missing something, we need a
synchronize_sched() in synchronize_xxx(). One thing missing in
the I/O-barrier case might be the possible restrictions I call
out in my earlier email.

Thanx, Paul
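
[For readers following the scenario above: the window sits between the two
statements of Oleg's xxx_read_lock() sketch from earlier in the thread.
Here is that sketch again with the window annotated -- pseudocode from the
thread, not kernel code:

	int xxx_read_lock(struct xxx_struct *sp)
	{
		int idx;

		idx = sp->completed & 0x1;
		/*
		 * Window: if synchronize_xxx() runs to completion here, it
		 * increments sp->completed and waits only on ctr[idx],
		 * which this reader has not yet incremented.  The increment
		 * below then lands on the rank that the -next-
		 * synchronize_xxx() will not wait for, so that next grace
		 * period can end while this reader is still inside its
		 * critical section.
		 */
		atomic_inc(sp->ctr + idx);
		smp_mb__after_atomic_inc();

		return idx;
	}
]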

2006-11-19 21:26:51

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sat, Nov 18, 2006 at 10:02:17PM +0300, Oleg Nesterov wrote:
> On 11/17, Paul E. McKenney wrote:
> >
> > int srcu_read_lock(struct srcu_struct *sp)
> > {
> > int idx;
> > + struct srcu_struct_array *sap;
> >
> > preempt_disable();
> > idx = sp->completed & 0x1;
> > - barrier(); /* ensure compiler looks -once- at sp->completed. */
> > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > + sap = rcu_dereference(sp->per_cpu_ref);
> > + if (likely(sap != NULL)) {
> > + barrier(); /* ensure compiler looks -once- at sp->completed. */
> > + per_cpu_ptr(rcu_dereference(sap),
> > + smp_processor_id())->c[idx]++;
> > + smp_mb();
> > + preempt_enable();
> > + return idx;
> > + }
> > + if (mutex_trylock(&sp->mutex)) {
> > + preempt_enable();
> > + if (sp->per_cpu_ref == NULL)
> > + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> > + if (sp->per_cpu_ref == NULL) {
> > + atomic_inc(&sp->hardluckref);
> > + mutex_unlock(&sp->mutex);
> > + return -1;
> > + }
> > + mutex_unlock(&sp->mutex);
> > + return srcu_read_lock(sp);
> > + }
> > preempt_enable();
> > - return idx;
> > + atomic_inc(&sp->hardluckref);
> > + return -1;
> > }
>
> This is a real nitpick, but in theory we have a possibility for the livelock.
>
> Suppose that synchronize_srcu() takes sp->mutex and fails to allocate
> sp->per_cpu_ref. If we have a flow of srcu_read_lock/srcu_read_unlock,
> this loop in synchronize_srcu()
>
> while (srcu_readers_active_idx(sp, idx))
> schedule_timeout_interruptible(1);
>
> may spin unpredictably long, because we use the same sp->hardluckref for
> accounting.

Excellent point -- hardluckref also needs to be a two-element array.

Thanx, Paul
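
[Concretely, the change Paul is agreeing to here is the one that shows up
in the patch further down the thread: one hardluck counter per rank,
indexed the same way as the per-CPU c[] array. Sketch of the resulting
structure:

	struct srcu_struct {
		int completed;
		struct srcu_struct_array *per_cpu_ref;
		struct mutex mutex;
		atomic_t hardluckref[2];	/* was: atomic_t hardluckref; */
	};
]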

2006-11-19 21:29:54

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sat, Nov 18, 2006 at 10:34:26PM +0300, Oleg Nesterov wrote:
> On 11/18, Paul E. McKenney wrote:
> >
> > On Sat, Nov 18, 2006 at 11:15:27AM -0500, Alan Stern wrote:
> > > > + smp_processor_id())->c[idx]++;
> > > > + smp_mb();
> > > > + preempt_enable();
> > > > + return idx;
> > > > + }
> > > > + if (mutex_trylock(&sp->mutex)) {
> > > > + preempt_enable();
> > >
> > > Move the preempt_enable() before the "if", then get rid of the
> > > preempt_enable() after the "if" block.
> >
> > No can do. The preempt_enable() must follow the increment and
> > the memory barrier, otherwise the synchronize_sched() inside
> > synchronize_srcu() can't do its job.
>
> Given that srcu_read_lock() does smp_mb() after ->c[idx]++, what
> is the purpose of synchronize_srcu() ? It seems to me it could be
> replaced by smp_mb().
>
> synchronize_srcu:
>
> sp->completed++;
>
> mb();
>
> // if the reader did any memory access _after_
> // srcu_read_lock()->mb() we must see the changes.
> while (srcu_readers_active_idx(sp, idx))
> sleep();
>
> No?

I believe that this could run afoul of the example I sent out earlier
(based on Alan's example). In my mind, the key difference between
this and Jens's suggestion is that in Jens's case, we check for -all-
the counters being zero, not just the old ones. (But I still don't
trust Jens's optimization -- I just have not yet come up with an example
showing breakage, possibly because there isn't one, but...)

Thanx, Paul

2006-11-19 21:46:50

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sat, Nov 18, 2006 at 04:00:35PM -0500, Alan Stern wrote:
> On Sat, 18 Nov 2006, Paul E. McKenney wrote:
>
> > > > @@ -94,7 +112,8 @@ void cleanup_srcu_struct(struct srcu_str
> > > > WARN_ON(sum); /* Leakage unless caller handles error. */
> > > > if (sum != 0)
> > > > return;
> > > > - free_percpu(sp->per_cpu_ref);
> > > > + if (sp->per_cpu_ref != NULL)
> > > > + free_percpu(sp->per_cpu_ref);
> > >
> > > Now that Andrew has accepted the "allow free_percpu(NULL)" change, you can
> > > remove the test here.
> >
> > OK. I thought that there was some sort of error-checking involved,
> > but if not, will fix.
>
> Just make sure that _you_ have the free_percpu(NULL) patch installed on
> your machine before testing this -- otherwise you'll get a nice hard
> crash!

'Nuff said -- will leave this fixup till later. ;-)

> > > > preempt_disable();
> > > > idx = sp->completed & 0x1;
> > > > - barrier(); /* ensure compiler looks -once- at sp->completed. */
> > > > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> > > > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > > > + sap = rcu_dereference(sp->per_cpu_ref);
> > > > + if (likely(sap != NULL)) {
> > > > + barrier(); /* ensure compiler looks -once- at sp->completed. */
> > >
> > > Put this barrier() back where the old one was (outside the "if").
> >
> > Why? Outside this "if", I don't use "sap".
>
> Because it looks funny to see the comment here talking about sp->completed
> when sp->completed hasn't been used for several lines. (Maybe it looks
> less funny in the patched source than in the patch itself.) The best
> place to prevent extra accesses of sp->completed is immediately after
> the required access.

Good point -- and with the addition of the second element of hardluckref,
it has to be hoisted out of the "if" in any case.

> > > > + smp_processor_id())->c[idx]++;
> > > > + smp_mb();
> > > > + preempt_enable();
> > > > + return idx;
> > > > + }
> > > > + if (mutex_trylock(&sp->mutex)) {
> > > > + preempt_enable();
> > >
> > > Move the preempt_enable() before the "if", then get rid of the
> > > preempt_enable() after the "if" block.
> >
> > No can do. The preempt_enable() must follow the increment and
> > the memory barrier, otherwise the synchronize_sched() inside
> > synchronize_srcu() can't do its job.
>
> You misunderstood -- I was talking about the preempt_enable() in the last
> line quoted above (not the one in the third line) and the "if
> (mutex_trylock" (not the earlier "if (likely").

OK, I see your point -- but this has changed thoroughly with the
addition of the second element of hardluckref.

> > > > + if (sp->per_cpu_ref == NULL)
> > > > + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> > >
> > > It would be cleaner to put the mutex_unlock() and closing '}' right here.
> >
> > I can move the mutex_unlock() to this point, but I cannot otherwise
> > merge the two following pieces of code -- at least not without doing
> > an otherwise-gratuitous preempt_disable(). Which I suppose I could
> > do, but seems like it would be more confusing than would the
> > separate code. I will play with this a bit and see if I can eliminate
> > the duplication.
>
> If you follow the advice above then you won't need to add a gratuitous
> preempt_disable(). Try it and see how it comes out; the idea is that
> you can use the same code for testing sp->per_cpu_ref regardless of
> whether the mutex_trylock() or the call to alloc_srcu_struct_percpu()
> succeeded.

Understood, finally -- but the two-element hardluckref now requires
greater preempt_disable() coverage.

> > > What happens if a prior reader failed to allocate the memory but this call
> > > succeeds? You need to check hardluckref before doing this. The same is
> > > true in srcu_read_lock().
> >
> > All accounted for by the fact that hardluckref is unconditionally
> > added in by srcu_readers_active(). Right?
>
> Yes, you're right.
>
> > Will spin a new patch...
>
> Good -- it's getting pretty messy to look at this one!
>
> By the way, I think the fastpath for synchronize_srcu() should be safe,
> now that you have added the memory barriers into srcu_read_lock() and
> srcu_read_unlock(). You might as well try putting it in.
>
> Although now that I look at it again, you have forgotten to put smp_mb()
> after the atomic_inc() call and before the atomic_dec(). In
> srcu_read_unlock() you could just move the existing smp_mb() back before
> the test of idx.

Good catch again -- added smp_mb__before_atomic_dec() and
smp_mb__after_atomic_inc(). The reason for avoiding moving the smp_mb()
is that atomic_dec() implies a memory barrier on some architectures,
such as x86. In these cases, smp_mb__before_atomic_dec() is a no-op.

Signed-off-by: Paul E. McKenney <[email protected]> (was @us.ibm.com)
---

include/linux/srcu.h | 8 ---
kernel/srcu.c | 126 +++++++++++++++++++++++++++------------------------
2 files changed, 71 insertions(+), 63 deletions(-)

diff -urpNa -X dontdiff linux-2.6.19-rc5/include/linux/srcu.h linux-2.6.19-rc5-dsrcu/include/linux/srcu.h
--- linux-2.6.19-rc5/include/linux/srcu.h 2006-11-17 15:44:40.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/include/linux/srcu.h 2006-11-19 13:33:35.000000000 -0800
@@ -35,19 +35,15 @@ struct srcu_struct {
int completed;
struct srcu_struct_array *per_cpu_ref;
struct mutex mutex;
+ atomic_t hardluckref[2];
};

-#ifndef CONFIG_PREEMPT
-#define srcu_barrier() barrier()
-#else /* #ifndef CONFIG_PREEMPT */
-#define srcu_barrier()
-#endif /* #else #ifndef CONFIG_PREEMPT */
-
int init_srcu_struct(struct srcu_struct *sp);
void cleanup_srcu_struct(struct srcu_struct *sp);
int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
void synchronize_srcu(struct srcu_struct *sp);
long srcu_batches_completed(struct srcu_struct *sp);
+int srcu_readers_active(struct srcu_struct *sp);

#endif
diff -urpNa -X dontdiff linux-2.6.19-rc5/kernel/srcu.c linux-2.6.19-rc5-dsrcu/kernel/srcu.c
--- linux-2.6.19-rc5/kernel/srcu.c 2006-11-17 15:44:40.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/kernel/srcu.c 2006-11-19 13:40:33.000000000 -0800
@@ -34,6 +34,18 @@
#include <linux/smp.h>
#include <linux/srcu.h>

+/*
+ * Initialize the per-CPU array, returning the pointer.
+ */
+static inline struct srcu_struct_array *alloc_srcu_struct_percpu(void)
+{
+ struct srcu_struct_array *sap;
+
+ sap = alloc_percpu(struct srcu_struct_array);
+ smp_wmb();
+ return (sap);
+}
+
/**
* init_srcu_struct - initialize a sleep-RCU structure
* @sp: structure to initialize.
@@ -46,7 +58,9 @@ int init_srcu_struct(struct srcu_struct
{
sp->completed = 0;
mutex_init(&sp->mutex);
- sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+ atomic_set(&sp->hardluckref[0], 0);
+ atomic_set(&sp->hardluckref[1], 0);
return (sp->per_cpu_ref ? 0 : -ENOMEM);
}

@@ -58,12 +72,15 @@ int init_srcu_struct(struct srcu_struct
static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
{
int cpu;
+ struct srcu_struct_array *sap;
int sum;

sum = 0;
- for_each_possible_cpu(cpu)
- sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
- return sum;
+ sap = rcu_dereference(sp->per_cpu_ref);
+ if (likely(sap != NULL))
+ for_each_possible_cpu(cpu)
+ sum += per_cpu_ptr(sap, cpu)->c[idx];
+ return sum + atomic_read(&sp->hardluckref[idx]);
}

/**
@@ -94,7 +111,8 @@ void cleanup_srcu_struct(struct srcu_str
WARN_ON(sum); /* Leakage unless caller handles error. */
if (sum != 0)
return;
- free_percpu(sp->per_cpu_ref);
+ if (sp->per_cpu_ref != NULL)
+ free_percpu(sp->per_cpu_ref);
sp->per_cpu_ref = NULL;
}

@@ -105,18 +123,41 @@ void cleanup_srcu_struct(struct srcu_str
* Counts the new reader in the appropriate per-CPU element of the
* srcu_struct. Must be called from process context.
* Returns an index that must be passed to the matching srcu_read_unlock().
+ * The index is mapped to negative numbers if the srcu_struct is not and
+ * cannot be initialized.
*/
int srcu_read_lock(struct srcu_struct *sp)
{
int idx;
+ struct srcu_struct_array *sap;

preempt_disable();
idx = sp->completed & 0x1;
barrier(); /* ensure compiler looks -once- at sp->completed. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
- srcu_barrier(); /* ensure compiler won't misorder critical section. */
+ sap = rcu_dereference(sp->per_cpu_ref);
+ if (likely(sap != NULL)) {
+ per_cpu_ptr(sap, smp_processor_id())->c[idx]++;
+ smp_mb();
+ preempt_enable();
+ return idx;
+ }
+ if (mutex_trylock(&sp->mutex)) {
+ preempt_enable();
+ if (sp->per_cpu_ref == NULL)
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+ if (sp->per_cpu_ref == NULL) {
+ mutex_unlock(&sp->mutex);
+ preempt_disable();
+ idx = sp->completed & 0x1;
+ } else {
+ mutex_unlock(&sp->mutex);
+ return srcu_read_lock(sp);
+ }
+ }
+ atomic_inc(&sp->hardluckref[idx]);
+ smp_mb__after_atomic_inc();
preempt_enable();
- return idx;
+ return -1 - idx;
}

/**
@@ -131,10 +172,16 @@ int srcu_read_lock(struct srcu_struct *s
*/
void srcu_read_unlock(struct srcu_struct *sp, int idx)
{
- preempt_disable();
- srcu_barrier(); /* ensure compiler won't misorder critical section. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
- preempt_enable();
+ if (likely(idx <= 0)) {
+ preempt_disable();
+ smp_mb();
+ per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
+ smp_processor_id())->c[idx]--;
+ preempt_enable();
+ return;
+ }
+ smp_mb__before_atomic_dec();
+ atomic_dec(&sp->hardluckref[-1 - idx]);
}

/**
@@ -158,6 +205,11 @@ void synchronize_srcu(struct srcu_struct
idx = sp->completed;
mutex_lock(&sp->mutex);

+ /* Initialize if not already initialized. */
+
+ if (sp->per_cpu_ref == NULL)
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+
/*
* Check to see if someone else did the work for us while we were
* waiting to acquire the lock. We need -two- advances of
@@ -173,65 +225,25 @@ void synchronize_srcu(struct srcu_struct
return;
}

- synchronize_sched(); /* Force memory barrier on all CPUs. */
-
- /*
- * The preceding synchronize_sched() ensures that any CPU that
- * sees the new value of sp->completed will also see any preceding
- * changes to data structures made by this CPU. This prevents
- * some other CPU from reordering the accesses in its SRCU
- * read-side critical section to precede the corresponding
- * srcu_read_lock() -- ensuring that such references will in
- * fact be protected.
- *
- * So it is now safe to do the flip.
- */
-
+ smp_mb(); /* ensure srcu_read_lock() sees prior change first! */
idx = sp->completed & 0x1;
sp->completed++;

- synchronize_sched(); /* Force memory barrier on all CPUs. */
+ synchronize_sched();

/*
* At this point, because of the preceding synchronize_sched(),
* all srcu_read_lock() calls using the old counters have completed.
* Their corresponding critical sections might well be still
* executing, but the srcu_read_lock() primitives themselves
- * will have finished executing.
+ * will have finished executing. The "old" rank of counters
+ * can therefore only decrease, never increase in value.
*/

while (srcu_readers_active_idx(sp, idx))
schedule_timeout_interruptible(1);

- synchronize_sched(); /* Force memory barrier on all CPUs. */
-
- /*
- * The preceding synchronize_sched() forces all srcu_read_unlock()
- * primitives that were executing concurrently with the preceding
- * for_each_possible_cpu() loop to have completed by this point.
- * More importantly, it also forces the corresponding SRCU read-side
- * critical sections to have also completed, and the corresponding
- * references to SRCU-protected data items to be dropped.
- *
- * Note:
- *
- * Despite what you might think at first glance, the
- * preceding synchronize_sched() -must- be within the
- * critical section ended by the following mutex_unlock().
- * Otherwise, a task taking the early exit can race
- * with a srcu_read_unlock(), which might have executed
- * just before the preceding srcu_readers_active() check,
- * and whose CPU might have reordered the srcu_read_unlock()
- * with the preceding critical section. In this case, there
- * is nothing preventing the synchronize_sched() task that is
- * taking the early exit from freeing a data structure that
- * is still being referenced (out of order) by the task
- * doing the srcu_read_unlock().
- *
- * Alternatively, the comparison with "2" on the early exit
- * could be changed to "3", but this increases synchronize_srcu()
- * latency for bulk loads. So the current code is preferred.
- */
+ smp_mb(); /* must see critical section prior to srcu_read_unlock() */

mutex_unlock(&sp->mutex);
}
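
[A note on the calling convention in this version of the patch:
srcu_read_lock() may now return a negative index (the hardluck path), but
callers do not need to care -- they simply pass whatever index they received
back to srcu_read_unlock(). A minimal call-site sketch, with the srcu_struct
name made up for illustration:

	int idx;

	idx = srcu_read_lock(&my_srcu);		/* may be negative */
	/* ... SRCU read-side critical section ... */
	srcu_read_unlock(&my_srcu, idx);	/* pass the index back as-is */
]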

2006-11-19 21:51:04

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/19, Paul E. McKenney wrote:
>
> On Sun, Nov 19, 2006 at 11:55:16PM +0300, Oleg Nesterov wrote:
> > So synchronize_xxx() should do
> >
> > smp_mb();
> > idx = sp->completed++ & 0x1;
> >
> > for (;;) { ... }
> >
> > > You see, there's no way around using synchronize_sched().
> >
> > With this change I think we are safe.
> >
> > If synchronize_xxx() increments ->completed in between, the caller of
> > xxx_read_lock() will see all memory ops (started before synchronize_xxx())
> > completed. It is ok that synchronize_xxx() returns immediately.
>
> Let me take Alan's example one step further:
>
> o CPU 0 starts executing xxx_read_lock(), but is interrupted
> (or whatever) just before the atomic_inc().
>
> o CPU 1 executes synchronize_xxx() to completion, which it
> can because CPU 0 has not yet incremented the counter.

Let's suppose for simplicity that CPU 1 does "classical"

old = global_ptr;
global_ptr = new_value();

before synchronize_xxx(), and ->completed == 0.

Now, synchronize_xxx() sets ->completed == 1. Because of mb()
'global_ptr = new_value()' is completed.

> o CPU 0 returns from interrupt and completes xxx_read_lock(),
> but has incremented the wrong counter.

->completed == 1, it is not so wrong, see below

> o CPU 0 continues into its critical section, picking up a
> pointer to an xxx-protected data structure (or, in Jens's
> case starting an xxx-protected I/O).

it sees the new value in global_ptr, we are safe.

> o CPU 1 executes another synchronize_xxx(). This completes
> immediately because CPU 0 has incremented the wrong counter.

No, it will notice .ctr[1] != 1 and wait.

> o CPU 1 continues, either freeing a data structure while
> CPU 0 is still referencing it, or, in Jens's case, completing
> an I/O barrier while there is still outstanding I/O.

CPU 1 continues only when CPU 0 does read_unlock(/*completed*/ 1),
we are safe.

Safe?

Oleg.

2006-11-19 21:57:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20, 2006 at 12:17:31AM +0300, Oleg Nesterov wrote:
> On 11/19, Alan Stern wrote:
> > On Sun, 19 Nov 2006, Oleg Nesterov wrote:
> >
> > > > What happens if synchronize_xxx manages to execute in between
> > > > xxx_read_lock's
> > > >
> > > > idx = sp->completed & 0x1;
> > > > atomic_inc(sp->ctr + idx);
> > > >
> > > > statements?
> > >
> > > Oops. I forgot about explicit mb() before sp->completed++ in synchronize_xxx().
> > >
> > > So synchronize_xxx() should do
> > >
> > > smp_mb();
> > > idx = sp->completed++ & 0x1;
> > >
> > > for (;;) { ... }
> > >
> > > > You see, there's no way around using synchronize_sched().
> > >
> > > With this change I think we are safe.
> > >
> > > If synchronize_xxx() increments ->completed in between, the caller of
> > > xxx_read_lock() will see all memory ops (started before synchronize_xxx())
> > > completed. It is ok that synchronize_xxx() returns immediately.
> >
> > Yes, the reader will see a consistent picture, but it will have
> > incremented the wrong element of sp->ctr[]. What happens if another
> > synchronize_xxx() occurs while the reader is still running?
>
> It will wait for xxx_read_unlock() on reader's side. And for this reason
> this idx in fact is not exactly wrong :)

I am not seeing this.

Let's assume sp->completed starts out zero.

o CPU 0 starts executing xxx_read_lock(), but is interrupted
(or whatever) just before the atomic_inc(). Upon return,
it will increment sp->ctr[0].

o CPU 1 executes synchronize_xxx() to completion, which it
can because CPU 0 has not yet incremented the counter.
It waited on sp->ctr[0], and incremented sp->completed to 1.

o CPU 0 returns from interrupt and completes xxx_read_lock(),
but has incremented sp->ctr[0].

o CPU 0 continues into its critical section, picking up a
pointer to an xxx-protected data structure (or, in Jens's
case starting an xxx-protected I/O).

o CPU 1 executes another synchronize_xxx(). This completes
immediately because it is waiting for sp->ctr[1] to go
to zero, but CPU 0 incremented sp->ctr[0]. (Right?)

o CPU 1 continues, either freeing a data structure while
CPU 0 is still referencing it, or, in Jens's case, completing
an I/O barrier while there is still outstanding I/O.

Or am I missing something?

Thanx, Paul

2006-11-19 22:08:19

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20, 2006 at 12:50:53AM +0300, Oleg Nesterov wrote:
> On 11/19, Paul E. McKenney wrote:
> >
> > On Sun, Nov 19, 2006 at 11:55:16PM +0300, Oleg Nesterov wrote:
> > > So synchronize_xxx() should do
> > >
> > > smp_mb();
> > > idx = sp->completed++ & 0x1;
> > >
> > > for (;;) { ... }
> > >
> > > > You see, there's no way around using synchronize_sched().
> > >
> > > With this change I think we are safe.
> > >
> > > If synchronize_xxx() increments ->completed in between, the caller of
> > > xxx_read_lock() will see all memory ops (started before synchronize_xxx())
> > > completed. It is ok that synchronize_xxx() returns immediately.
> >
> > Let me take Alan's example one step further:
> >
> > o CPU 0 starts executing xxx_read_lock(), but is interrupted
> > (or whatever) just before the atomic_inc().
> >
> > o CPU 1 executes synchronize_xxx() to completion, which it
> > can because CPU 0 has not yet incremented the counter.
>
> Let's suppose for simplicity that CPU 1 does "classical"
>
> old = global_ptr;
> global_ptr = new_value();
>
> before synchronize_xxx(), and ->completed == 0.

OK. But there are two of these in this example -- one such update
per execution of synchronize_xxx(), right?

> Now, synchronize_xxx() sets ->completed == 1. Because of mb()
> 'global_ptr = new_value()' is completed.
>
> > o CPU 0 returns from interrupt and completes xxx_read_lock(),
> > but has incremented the wrong counter.
>
> ->completed == 1, it is not so wrong, see below

But CPU 0 kept idx==0 in xxx_read_lock() in the earlier steps, right?
Therefore, CPU 0 increments sp->ctr[0] rather than sp->ctr[1].

> > o CPU 0 continues into its critical section, picking up a
> > pointer to an xxx-protected data structure (or, in Jens's
> > case starting an xxx-protected I/O).
>
> it sees the new value in global_ptr, we are safe.

It -does- see the new value corresponding to the -first- call to
synchronize_xxx(), but gets in trouble due to the change just
before the -second- call to synchronize_xxx().

> > o CPU 1 executes another synchronize_xxx(). This completes
> > immediately because CPU 0 has incremented the wrong counter.
>
> No, it will notice .ctr[1] != 1 and wait.

Unless I am missing something, we have incremented .ctr[0] rather
than .ctr[1], so I do not believe that it will wait.

> > o CPU 1 continues, either freeing a data structure while
> > CPU 0 is still referencing it, or, in Jens's case, completing
> > an I/O barrier while there is still outstanding I/O.
>
> CPU 1 continues only when CPU 0 does read_unlock(/*completed*/ 1),
> we are safe.
>
> Safe?

I have my doubts...

Thanx, Paul

2006-11-19 22:29:05

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/19, Paul E. McKenney wrote:
>
> On Mon, Nov 20, 2006 at 12:17:31AM +0300, Oleg Nesterov wrote:
> >
> > It will wait for xxx_read_unlock() on reader's side. And for this reason
> > this idx in fact is not exactly wrong :)
>
> I am not seeing this.
>
> Let's assume sp->completed starts out zero.
>
> o CPU 0 starts executing xxx_read_lock(), but is interrupted
> (or whatever) just before the atomic_inc(). Upon return,
> it will increment sp->ctr[0].

Right.

> o CPU 1 executes synchronize_xxx() to completion, which it
> can because CPU 0 has not yet incremented the counter.
> It waited on sp->ctr[0], and incremented sp->completed to 1.
>
> o CPU 0 returns from interrupt and completes xxx_read_lock(),
> but has incremented sp->ctr[0].
>
> o CPU 0 continues into its critical section, picking up a
> pointer to an xxx-protected data structure (or, in Jens's
> case starting an xxx-protected I/O).
>
> o CPU 1 executes another synchronize_xxx(). This completes
> immediately because it is waiting for sp->ctr[1] to go
> to zero, but CPU 0 incremented sp->ctr[0]. (Right?)

Right!

> o CPU 1 continues, either freeing a data structure while
> CPU 0 is still referencing it, or, in Jens's case, completing
> an I/O barrier while there is still outstanding I/O.
>
> Or am I missing something?

No, it is me.

Alan, Paul, thanks a lot for your patience!

Oleg.

2006-11-20 05:51:13

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20, 2006 at 01:28:47AM +0300, Oleg Nesterov wrote:
> On 11/19, Paul E. McKenney wrote:
> >
> > On Mon, Nov 20, 2006 at 12:17:31AM +0300, Oleg Nesterov wrote:
> > >
> > > It will wait for xxx_read_unlock() on reader's side. And for this reason
> > > this idx in fact is not exactly wrong :)
> >
> > I am not seeing this.
> >
> > Let's assume sp->completed starts out zero.
> >
> > o CPU 0 starts executing xxx_read_lock(), but is interrupted
> > (or whatever) just before the atomic_inc(). Upon return,
> > it will increment sp->ctr[0].
>
> Right.
>
> > o CPU 1 executes synchronize_xxx() to completion, which it
> > can because CPU 0 has not yet incremented the counter.
> > It waited on sp->ctr[0], and incremented sp->completed to 1.
> >
> > o CPU 0 returns from interrupt and completes xxx_read_lock(),
> > but has incremented sp->ctr[0].
> >
> > o CPU 0 continues into its critical section, picking up a
> > pointer to an xxx-protected data structure (or, in Jens's
> > case starting an xxx-protected I/O).
> >
> > o CPU 1 executes another synchronize_xxx(). This completes
> > immediately because it is waiting for sp->ctr[1] to go
> > to zero, but CPU 0 incremented sp->ctr[0]. (Right?)
>
> Right!
>
> > o CPU 1 continues, either freeing a data structure while
> > CPU 0 is still referencing it, or, in Jens's case, completing
> > an I/O barrier while there is still outstanding I/O.
> >
> > Or am I missing something?
>
> No, it is me.
>
> Alan, Paul, thanks a lot for your patience!

No problem -- now we just need to work out if Jens's optimization is
safe, either in his situation or in general. And I need to chase down
any remaining bugs in the patch...

Thanx, Paul

2006-11-20 07:15:32

by Jens Axboe

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sun, Nov 19 2006, Paul E. McKenney wrote:
> On Sat, Nov 18, 2006 at 09:46:24PM +0300, Oleg Nesterov wrote:
> > On 11/17, Paul E. McKenney wrote:
> > >
> > > Oleg, any thoughts about Jens's optimization? He would code something
> > > like:
> > >
> > > if (srcu_readers_active(&my_srcu))
> > > synchronize_srcu();
> > > else
> > > smp_mb();
> >
> > Well, this is clearly racy, no? I am not sure, but may be we can do
> >
> > smp_mb();
> > if (srcu_readers_active(&my_srcu))
> > synchronize_srcu();
> >
> > in this case we also need to add 'smp_mb()' into srcu_read_lock() after
> > 'atomic_inc(&sp->hardluckref)'.
> >
> > > However, he is doing ordered I/O requests rather than protecting data
> > > structures.
> >
> > Probably this makes a difference, but I don't understand this.
>
> OK, one hypothesis here...
>
> The I/Os must be somehow explicitly ordered to qualify
> for I/O-barrier separation. If two independent processes
> issue I/Os concurrently with a third process doing an
> I/O barrier, the I/O barrier is free to separate the
> two concurrent I/Os or not, on its whim.
>
> Jens, is the above correct? If so, what would the two processes

That's completely correct, hence my somewhat relaxed approach with SRCU.

> need to do in order to ensure that their I/O was considered to be
> ordered with respect to the I/O barrier? Here are some possibilities:

If we consider the barrier a barrier in a certain stream of requests,
it is the responsibility of the issuer of that barrier to ensure that
the queueing is ordered. So if two "unrelated" streams of requests with
barriers hit __make_request() at the same time, we don't go to great
lengths to ensure who gets there first.

> 1. I/O barriers apply only to preceding and following I/Os from
> the process issuing the I/O barrier.
>
> 2. As for #1 above, but restricted to task rather than process.
>
> 3. I/O system calls that have completed are ordered by the
> barrier to precede I/O system calls that have not yet
> started, but I/O system calls still in flight could legally
> land on either side of the concurrently executing I/O
> barrier.
>
> 4. Something else entirely?
>
> Given some restriction like one of the above, it is entirely possible
> that we don't even need the memory barrier...

3 is the closest. The request queue doesn't really know the scope of the
barrier, it has to rely on the issuer getting it right. If you have two
competing processes issuing io and process A relies on process B issuing
a barrier, they have to synchronize that between them. Normally that is
not a problem, since that's how the file systems always did io before
barriers on items that need to be on disk (it was a serialization point
anyway, it's just a stronger one now).

That said, I think the

smp_mb();
if (srcu_readers_active(sp))
synchronize_srcu();

makes the most sense.

--
Jens Axboe

2006-11-20 16:58:12

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20, 2006 at 08:15:14AM +0100, Jens Axboe wrote:
> On Sun, Nov 19 2006, Paul E. McKenney wrote:
> > On Sat, Nov 18, 2006 at 09:46:24PM +0300, Oleg Nesterov wrote:
> > > On 11/17, Paul E. McKenney wrote:
> > > >
> > > > Oleg, any thoughts about Jens's optimization? He would code something
> > > > like:
> > > >
> > > > if (srcu_readers_active(&my_srcu))
> > > > synchronize_srcu();
> > > > else
> > > > smp_mb();
> > >
> > > Well, this is clearly racy, no? I am not sure, but may be we can do
> > >
> > > smp_mb();
> > > if (srcu_readers_active(&my_srcu))
> > > synchronize_srcu();
> > >
> > > in this case we also need to add 'smp_mb()' into srcu_read_lock() after
> > > 'atomic_inc(&sp->hardluckref)'.
> > >
> > > > However, he is doing ordered I/O requests rather than protecting data
> > > > structures.
> > >
> > > Probably this makes a difference, but I don't understand this.
> >
> > OK, one hypothesis here...
> >
> > The I/Os must be somehow explicitly ordered to qualify
> > for I/O-barrier separation. If two independent processes
> > issue I/Os concurrently with a third process doing an
> > I/O barrier, the I/O barrier is free to separate the
> > two concurrent I/Os or not, on its whim.
> >
> > Jens, is the above correct? If so, what would the two processes
>
> That's completely correct, hence my somewhat relaxed approach with SRCU.

OK, less scary in that case. ;-)

> > need to do in order to ensure that their I/O was considered to be
> > ordered with respect to the I/O barrier? Here are some possibilities:
>
> If we consider the barrier a barrier in a certain stream of requests,
> it is the responsibility of the issuer of that barrier to ensure that
> the queueing is ordered. So if two "unrelated" streams of requests with
> barriers hit __make_request() at the same time, we don't go to great
> lengths to ensure who gets there first.

So the "preceding" requests have to have completed their I/O system
calls? If this is the case, does this include normal (non-direct/raw)
writes and asynchronous reads? My guess is that it would include
asynchronous I/O, but not buffered writes.

> > 1. I/O barriers apply only to preceding and following I/Os from
> > the process issuing the I/O barrier.
> >
> > 2. As for #1 above, but restricted to task rather than process.
> >
> > 3. I/O system calls that have completed are ordered by the
> > barrier to precede I/O system calls that have not yet
> > started, but I/O system calls still in flight could legally
> > land on either side of the concurrently executing I/O
> > barrier.
> >
> > 4. Something else entirely?
> >
> > Given some restriction like one of the above, it is entirely possible
> > that we don't even need the memory barrier...
>
> 3 is the closest. The request queue doesn't really know the scope of the
> barrier, it has to rely on the issuer getting it right. If you have two
> competing processes issuing io and process A relies on process B issuing
> a barrier, they have to synchronize that between them. Normally that is
> not a problem, since that's how the file systems always did io before
> barriers on items that need to be on disk (it was a serialization point
> anyway, it's just a stronger one now).

So something like a user-level mutex or atomic instructions must be used
by the tasks doing the pre-barrier I/Os to announce that these I/Os have
been started in the kernel.

> That said, I think the
>
> smp_mb();
> if (srcu_readers_active(sp))
> synchronize_srcu();
>
> makes the most sense.

If the user-level tasks/threads/processes must explicitly synchronize,
and if the pre-barrier I/O-initiation syscalls have to have completed,
then I am not sure that the smp_mb() is needed. Seems like the queuing
mechanisms in the syscall and the user-level synchronization would have
supplied the needed memory barriers. Or are you using some extremely
lightweight user-level synchronization?

Thanx, Paul

2006-11-20 17:19:22

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

Paul:

Here's my version of your patch from yesterday. It's basically the same,
but I cleaned up the code in a few places and fixed a bug (the sign of idx
in srcu_read_unlock). Also I changed the init routine back to void, since
it's no longer an error if the per-cpu allocation fails.

More importantly, I added a static initializer and included the fast-path
in synchronize_srcu. It's protected by the new symbol
SMP__STORE_MB_LOAD_WORKS, which should be defined in arch-specific headers
for those architectures where the store-mb-load pattern is safe.

Alan

Signed-off-by: Alan Stern <[email protected]>

---

Index: usb-2.6/include/linux/srcu.h
===================================================================
--- usb-2.6.orig/include/linux/srcu.h
+++ usb-2.6/include/linux/srcu.h
@@ -35,19 +35,19 @@ struct srcu_struct {
int completed;
struct srcu_struct_array *per_cpu_ref;
struct mutex mutex;
+ atomic_t hardluckref[2];
};

-#ifndef CONFIG_PREEMPT
-#define srcu_barrier() barrier()
-#else /* #ifndef CONFIG_PREEMPT */
-#define srcu_barrier()
-#endif /* #else #ifndef CONFIG_PREEMPT */
+#define SRCU_INITIALIZER(srcu_name) { \
+ .mutex = __MUTEX_INITIALIZER(srcu_name.mutex), \
+ .hardluckref = {ATOMIC_INIT(0), ATOMIC_INIT(0)} }

-int init_srcu_struct(struct srcu_struct *sp);
-void cleanup_srcu_struct(struct srcu_struct *sp);
-int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
-void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
-void synchronize_srcu(struct srcu_struct *sp);
-long srcu_batches_completed(struct srcu_struct *sp);
+extern void init_srcu_struct(struct srcu_struct *sp);
+extern void cleanup_srcu_struct(struct srcu_struct *sp);
+extern int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
+extern void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
+extern void synchronize_srcu(struct srcu_struct *sp);
+extern long srcu_batches_completed(struct srcu_struct *sp);
+extern int srcu_readers_active(struct srcu_struct *sp);

#endif
Index: usb-2.6/kernel/srcu.c
===================================================================
--- usb-2.6.orig/kernel/srcu.c
+++ usb-2.6/kernel/srcu.c
@@ -34,6 +34,18 @@
#include <linux/smp.h>
#include <linux/srcu.h>

+/*
+ * Initialize the per-CPU array, returning the pointer.
+ */
+static inline struct srcu_struct_array *alloc_srcu_struct_percpu(void)
+{
+ struct srcu_struct_array *sap;
+
+ sap = alloc_percpu(struct srcu_struct_array);
+ smp_wmb();
+ return sap;
+}
+
/**
* init_srcu_struct - initialize a sleep-RCU structure
* @sp: structure to initialize.
@@ -42,12 +54,13 @@
* to any other function. Each srcu_struct represents a separate domain
* of SRCU protection.
*/
-int init_srcu_struct(struct srcu_struct *sp)
+void init_srcu_struct(struct srcu_struct *sp)
{
sp->completed = 0;
mutex_init(&sp->mutex);
- sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
- return (sp->per_cpu_ref ? 0 : -ENOMEM);
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+ atomic_set(&sp->hardluckref[0], 0);
+ atomic_set(&sp->hardluckref[1], 0);
}

/*
@@ -58,11 +71,14 @@ int init_srcu_struct(struct srcu_struct
static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
{
int cpu;
+ struct srcu_struct_array *sap;
int sum;

- sum = 0;
- for_each_possible_cpu(cpu)
- sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
+ sum = atomic_read(&sp->hardluckref[idx]);
+ sap = rcu_dereference(sp->per_cpu_ref);
+ if (likely(sap != NULL))
+ for_each_possible_cpu(cpu)
+ sum += per_cpu_ptr(sap, cpu)->c[idx];
return sum;
}

@@ -94,7 +110,8 @@ void cleanup_srcu_struct(struct srcu_str
WARN_ON(sum); /* Leakage unless caller handles error. */
if (sum != 0)
return;
- free_percpu(sp->per_cpu_ref);
+ if (sp->per_cpu_ref != NULL)
+ free_percpu(sp->per_cpu_ref);
sp->per_cpu_ref = NULL;
}

@@ -102,19 +119,37 @@ void cleanup_srcu_struct(struct srcu_str
* srcu_read_lock - register a new reader for an SRCU-protected structure.
* @sp: srcu_struct in which to register the new reader.
*
- * Counts the new reader in the appropriate per-CPU element of the
- * srcu_struct. Must be called from process context.
+ * Counts the new reader in the appropriate per-CPU element of @sp.
+ * Must be called from process context.
+ *
* Returns an index that must be passed to the matching srcu_read_unlock().
+ * The index is mapped to negative numbers if the per-cpu array in @sp
+ * is not and cannot be allocated.
*/
int srcu_read_lock(struct srcu_struct *sp)
{
int idx;
+ struct srcu_struct_array *sap;
+
+ if (unlikely(sp->per_cpu_ref == NULL &&
+ mutex_trylock(&sp->mutex))) {
+ if (sp->per_cpu_ref == NULL)
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+ mutex_unlock(&sp->mutex);
+ }

preempt_disable();
idx = sp->completed & 0x1;
barrier(); /* ensure compiler looks -once- at sp->completed. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
- srcu_barrier(); /* ensure compiler won't misorder critical section. */
+ sap = rcu_dereference(sp->per_cpu_ref);
+ if (likely(sap != NULL)) {
+ per_cpu_ptr(sap, smp_processor_id())->c[idx]++;
+ smp_mb();
+ } else {
+ atomic_inc(&sp->hardluckref[idx]);
+ smp_mb__after_atomic_inc();
+ idx = -1 - idx;
+ }
preempt_enable();
return idx;
}
@@ -131,10 +166,16 @@ int srcu_read_lock(struct srcu_struct *s
*/
void srcu_read_unlock(struct srcu_struct *sp, int idx)
{
- preempt_disable();
- srcu_barrier(); /* ensure compiler won't misorder critical section. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
- preempt_enable();
+ if (likely(idx >= 0)) {
+ smp_mb();
+ preempt_disable();
+ per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
+ smp_processor_id())->c[idx]--;
+ preempt_enable();
+ } else {
+ smp_mb__before_atomic_dec();
+ atomic_dec(&sp->hardluckref[-1 - idx]);
+ }
}

/**
@@ -158,6 +199,11 @@ void synchronize_srcu(struct srcu_struct
idx = sp->completed;
mutex_lock(&sp->mutex);

+ /* Initialize if not already initialized. */
+
+ if (sp->per_cpu_ref == NULL)
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+
/*
* Check to see if someone else did the work for us while we were
* waiting to acquire the lock. We need -two- advances of
@@ -168,71 +214,35 @@ void synchronize_srcu(struct srcu_struct
* either (1) wait for two or (2) supply the second ourselves.
*/

- if ((sp->completed - idx) >= 2) {
- mutex_unlock(&sp->mutex);
- return;
- }
+ if ((sp->completed - idx) >= 2)
+ goto done;

- synchronize_sched(); /* Force memory barrier on all CPUs. */
+ smp_mb(); /* ensure srcu_read_lock() sees prior change first! */
+ idx = sp->completed & 0x1;

- /*
- * The preceding synchronize_sched() ensures that any CPU that
- * sees the new value of sp->completed will also see any preceding
- * changes to data structures made by this CPU. This prevents
- * some other CPU from reordering the accesses in its SRCU
- * read-side critical section to precede the corresponding
- * srcu_read_lock() -- ensuring that such references will in
- * fact be protected.
- *
- * So it is now safe to do the flip.
- */
+#ifdef SMP__STORE_MB_LOAD_WORKS /* The fast path */
+ if (srcu_readers_active_idx(sp, idx) == 0)
+ goto done;
+#endif

- idx = sp->completed & 0x1;
sp->completed++;
-
- synchronize_sched(); /* Force memory barrier on all CPUs. */
+ synchronize_sched();

/*
* At this point, because of the preceding synchronize_sched(),
* all srcu_read_lock() calls using the old counters have completed.
* Their corresponding critical sections might well be still
* executing, but the srcu_read_lock() primitives themselves
- * will have finished executing.
+ * will have finished executing. The "old" rank of counters
+ * can therefore only decrease, never increase in value.
*/

while (srcu_readers_active_idx(sp, idx))
schedule_timeout_interruptible(1);

- synchronize_sched(); /* Force memory barrier on all CPUs. */
-
- /*
- * The preceding synchronize_sched() forces all srcu_read_unlock()
- * primitives that were executing concurrently with the preceding
- * for_each_possible_cpu() loop to have completed by this point.
- * More importantly, it also forces the corresponding SRCU read-side
- * critical sections to have also completed, and the corresponding
- * references to SRCU-protected data items to be dropped.
- *
- * Note:
- *
- * Despite what you might think at first glance, the
- * preceding synchronize_sched() -must- be within the
- * critical section ended by the following mutex_unlock().
- * Otherwise, a task taking the early exit can race
- * with a srcu_read_unlock(), which might have executed
- * just before the preceding srcu_readers_active() check,
- * and whose CPU might have reordered the srcu_read_unlock()
- * with the preceding critical section. In this case, there
- * is nothing preventing the synchronize_sched() task that is
- * taking the early exit from freeing a data structure that
- * is still being referenced (out of order) by the task
- * doing the srcu_read_unlock().
- *
- * Alternatively, the comparison with "2" on the early exit
- * could be changed to "3", but this increases synchronize_srcu()
- * latency for bulk loads. So the current code is preferred.
- */
+ smp_mb(); /* must see critical section prior to srcu_read_unlock() */

+done:
mutex_unlock(&sp->mutex);
}
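
[For context, the static initializer added above is used like this at a
definition site; the variable name is made up for illustration:

	/* Compile-time initialization: no init_srcu_struct() call needed;
	 * the per-CPU array is allocated lazily on first use. */
	static struct srcu_struct my_srcu = SRCU_INITIALIZER(my_srcu);
]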


2006-11-20 17:56:12

by Jens Axboe

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20 2006, Paul E. McKenney wrote:
> On Mon, Nov 20, 2006 at 08:15:14AM +0100, Jens Axboe wrote:
> > On Sun, Nov 19 2006, Paul E. McKenney wrote:
> > > On Sat, Nov 18, 2006 at 09:46:24PM +0300, Oleg Nesterov wrote:
> > > > On 11/17, Paul E. McKenney wrote:
> > > > >
> > > > > Oleg, any thoughts about Jens's optimization? He would code something
> > > > > like:
> > > > >
> > > > > if (srcu_readers_active(&my_srcu))
> > > > > synchronize_srcu();
> > > > > else
> > > > > smp_mb();
> > > >
> > > > Well, this is clearly racy, no? I am not sure, but may be we can do
> > > >
> > > > smp_mb();
> > > > if (srcu_readers_active(&my_srcu))
> > > > synchronize_srcu();
> > > >
> > > > in this case we also need to add 'smp_mb()' into srcu_read_lock() after
> > > > 'atomic_inc(&sp->hardluckref)'.
> > > >
> > > > > However, he is doing ordered I/O requests rather than protecting data
> > > > > structures.
> > > >
> > > > Probably this makes a difference, but I don't understand this.
> > >
> > > OK, one hypothesis here...
> > >
> > > The I/Os must be somehow explicitly ordered to qualify
> > > for I/O-barrier separation. If two independent processes
> > > issue I/Os concurrently with a third process doing an
> > > I/O barrier, the I/O barrier is free to separate the
> > > two concurrent I/Os or not, on its whim.
> > >
> > > Jens, is the above correct? If so, what would the two processes
> >
> > That's completely correct, hence my somewhat relaxed approach with SRCU.
>
> OK, less scary in that case. ;-)

Yep, it's really not scary in any ordering sense!

> > > need to do in order to ensure that their I/O was considered to be
> > > ordered with respect to the I/O barrier? Here are some possibilities:
> >
> > If we consider the barrier a barrier in a certain stream of requests,
> > it is the responsibility of the issuer of that barrier to ensure that
> > the queueing is ordered. So if two "unrelated" streams of requests with
> > barriers hit __make_request() at the same time, we don't go to great
> > lengths to ensure who gets there firt.
>
> So the "preceding" requests have to have completed their I/O system
> calls? If this is the case, does this include normal (non-direct/raw)
> writes and asynchronous reads? My guess is that it would include
> asynchronous I/O, but not buffered writes.

They need not have completed, but they must have been queued at the
block layer level. IOW, the io scheduler must know about them. Since
it's a block layer device property, we really don't care about system
calls, since any of them could amount to one or many individual io
requests.

But now we have taken a detour from the original problem. As I wrote
above, the io scheduler must know about the requests. When the plug list
ends up in the private process context, the io scheduler doesn't know
about it yet. When a barrier is queued, the block layer does not care
about io that hasn't been issued yet (dirty data in the page cache
perhaps), since if it hasn't been seen, it's by definition not
interesting. But if some of the requests reside in a different process's
private request list, then that is a violation of this rule, since it
should technically belong to the block layer / io scheduler at that
point. This is where I wanted to use SRCU.
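
[To make that pattern concrete, here is a schematic of what Jens describes;
every name below is invented for illustration and is not the real
block-layer code:

	#include <linux/srcu.h>

	struct example_queue {			/* hypothetical */
		struct srcu_struct plug_srcu;
		/* ... */
	};

	/* Per-process side: touch the process-private plug list inside an
	 * SRCU read-side section, so a concurrent barrier can notice it. */
	void example_plug_work(struct example_queue *q)
	{
		int idx = srcu_read_lock(&q->plug_srcu);

		/* ... add requests to the private plug list ... */
		srcu_read_unlock(&q->plug_srcu, idx);
	}

	/* Barrier side: the smp_mb() pairs with the barrier after the
	 * counter increment in srcu_read_lock(), so a reader that slipped
	 * in just before the barrier is seen and waited for. */
	void example_queue_barrier(struct example_queue *q)
	{
		smp_mb();
		if (srcu_readers_active(&q->plug_srcu))
			synchronize_srcu(&q->plug_srcu);
		/* ... now hand the barrier request to the io scheduler ... */
	}
]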

> > > 1. I/O barriers apply only to preceding and following I/Os from
> > > the process issuing the I/O barrier.
> > >
> > > 2. As for #1 above, but restricted to task rather than process.
> > >
> > > 3. I/O system calls that have completed are ordered by the
> > > barrier to precede I/O system calls that have not yet
> > > started, but I/O system calls still in flight could legally
> > > land on either side of the concurrently executing I/O
> > > barrier.
> > >
> > > 4. Something else entirely?
> > >
> > > Given some restriction like one of the above, it is entirely possible
> > > that we don't even need the memory barrier...
> >
> > 3 is the closest. The request queue doesn't really know the scope of the
> > barrier, it has to rely on the issuer getting it right. If you have two
> > competing processes issuing io and process A relies on process B issuing
> > a barrier, they have to synchronize that between them. Normally that is
> > not a problem, since that's how the file systems always did io before
> > barriers on items that need to be on disk (it was a serialization point
> > anyway, it's just a stronger one now).
>
> So something like a user-level mutex or atomic instructions must be used
> by the tasks doing the pre-barrier I/Os to announce that these I/Os have
> been started in the kernel.

We don't do barriers from user space; it's purely a feature available to
file systems to ensure ordering of writes even at the disk platter
level.

> > That said, I think the
> >
> > smp_mb();
> > if (srcu_readers_active(sp))
> > synchronize_srcu();
> >
> > makes the most sense.
>
> If the user-level tasks/threads/processes must explicitly synchronize,
> and if the pre-barrier I/O-initiation syscalls have to have completed,
> then I am not sure that the smp_mb() is needed. Seems like the queuing
> mechanisms in the syscall and the user-level synchronization would have
> supplied the needed memory barriers. Or are you using some extremely
> lightweight user-level synchronization?

Once a process holds a queue plug, any write issued to that plug list
will do an srcu_read_lock(). So as far as I can tell, the smp_mb() is
needed to ensure that an immediately following synchronize_srcu() from a
barrier write queued on a different CPU will see that srcu_read_lock().

There is no syscall or user-level synchronization.

--
Jens Axboe

2006-11-20 17:59:07

by Jens Axboe

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20 2006, Alan Stern wrote:
> Paul:
>
> Here's my version of your patch from yesterday. It's basically the same,
> but I cleaned up the code in a few places and fixed a bug (the sign of idx
> in srcu_read_unlock). Also I changed the init routine back to void, since
> it's no longer an error if the per-cpu allocation fails.
>
> More importantly, I added a static initializer and included the fast-path
> in synchronize_srcu. It's protected by the new symbol
> SMP__STORE_MB_LOAD_WORKS, which should be defined in arch-specific headers
> for those architectures where the store-mb-load pattern is safe.

Must we introduce memory allocations in srcu_read_lock()? It makes it
much harder and nastier for me to use. I'd much prefer a failing
init_srcu(); that seems like a much better API.

--
Jens Axboe
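
[For reference, the failing-init style Jens is asking for is what Paul's
version of the patch keeps: init_srcu_struct() returns -ENOMEM when the
per-CPU allocation fails. A call-site sketch, with the names made up here:

	static struct srcu_struct my_srcu;

	static int __init my_subsys_init(void)
	{
		if (init_srcu_struct(&my_srcu))
			return -ENOMEM;	/* per-CPU allocation failed */
		/* ... */
		return 0;
	}
]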

2006-11-20 18:57:22

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/20, Alan Stern wrote:
>
> @@ -158,6 +199,11 @@ void synchronize_srcu(struct srcu_struct
>
> [... snip ...]
>
> +#ifdef SMP__STORE_MB_LOAD_WORKS /* The fast path */
> + if (srcu_readers_active_idx(sp, idx) == 0)
> + goto done;
> +#endif

I guess this is connected to another message from you,

> But of course it _is_ needed for the fastpath to work. In fact, it might
> not be good enough, depending on the architecture. Here's what the
> fastpath ends up looking like (using c[idx] is essentially the same as
> using hardluckref):
>
> WRITER READER
> ------ ------
> dataptr = &(new data) atomic_inc(&hardluckref)
> mb mb
> while (hardluckref > 0) ; access *dataptr
>
> Notice the pattern: Each CPU does store-mb-load. It is known that on
> some architectures each CPU can end up loading the old value (the value
> from before the other CPU's store). This would mean the writer would see
> hardluckref == 0 right away and the reader would see the old dataptr.

So, if we have global A == B == 0,

CPU_0 CPU_1

A = 1; B = 2;
mb(); mb();
b = B; a = A;

It could happen that a == b == 0, yes? Doesn't this contradict the definition
of mb()?

By definition, when CPU_0 issues 'b = B', 'A = 1' should be visible to other
CPUs, yes? Now, b == 0 means that CPU_1 did not read 'a = A' yet, otherwise
'B = 2' should be visible to all CPUs (by definition again).

Could you please clarify this?

Btw, this is funny, but I was going to suggest _exactly_ the same cleanup for
srcu_read_lock :)

Oleg.

2006-11-20 19:16:13

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20, 2006 at 12:19:19PM -0500, Alan Stern wrote:
> Paul:
>
> Here's my version of your patch from yesterday. It's basically the same,
> but I cleaned up the code in a few places and fixed a bug (the sign of idx
> in srcu_read_unlock). Also I changed the init routine back to void, since
> it's no longer an error if the per-cpu allocation fails.

I don't see any changes in the sign of idx.

> More importantly, I added a static initializer and included the fast-path
> in synchronize_srcu. It's protected by the new symbol
> SMP__STORE_MB_LOAD_WORKS, which should be defined in arch-specific headers
> for those architectures where the store-mb-load pattern is safe.

I did take most of your changes. Comments interspersed below...

I will send out an updated patch after a bit of testing.

> Alan
>
> Signed-off-by: Alan Stern <[email protected]>
>
> ---
>
> Index: usb-2.6/include/linux/srcu.h
> ===================================================================
> --- usb-2.6.orig/include/linux/srcu.h
> +++ usb-2.6/include/linux/srcu.h
> @@ -35,19 +35,19 @@ struct srcu_struct {
> int completed;
> struct srcu_struct_array *per_cpu_ref;
> struct mutex mutex;
> + atomic_t hardluckref[2];
> };
>
> -#ifndef CONFIG_PREEMPT
> -#define srcu_barrier() barrier()
> -#else /* #ifndef CONFIG_PREEMPT */
> -#define srcu_barrier()
> -#endif /* #else #ifndef CONFIG_PREEMPT */
> +#define SRCU_INITIALIZER(srcu_name) { \
> + .mutex = __MUTEX_INITIALIZER(srcu_name.mutex), \
> + .hardluckref = {ATOMIC_INIT(0), ATOMIC_INIT(0)} }

Took this, very good.

> -int init_srcu_struct(struct srcu_struct *sp);
> -void cleanup_srcu_struct(struct srcu_struct *sp);
> -int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
> -void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
> -void synchronize_srcu(struct srcu_struct *sp);
> -long srcu_batches_completed(struct srcu_struct *sp);
> +extern void init_srcu_struct(struct srcu_struct *sp);
> +extern void cleanup_srcu_struct(struct srcu_struct *sp);
> +extern int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
> +extern void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
> +extern void synchronize_srcu(struct srcu_struct *sp);
> +extern long srcu_batches_completed(struct srcu_struct *sp);
> +extern int srcu_readers_active(struct srcu_struct *sp);

Took this as well.

> #endif
> Index: usb-2.6/kernel/srcu.c
> ===================================================================
> --- usb-2.6.orig/kernel/srcu.c
> +++ usb-2.6/kernel/srcu.c
> @@ -34,6 +34,18 @@
> #include <linux/smp.h>
> #include <linux/srcu.h>
>
> +/*
> + * Initialize the per-CPU array, returning the pointer.
> + */
> +static inline struct srcu_struct_array *alloc_srcu_struct_percpu(void)
> +{
> + struct srcu_struct_array *sap;
> +
> + sap = alloc_percpu(struct srcu_struct_array);
> + smp_wmb();
> + return sap;
> +}
> +
> /**
> * init_srcu_struct - initialize a sleep-RCU structure
> * @sp: structure to initialize.
> @@ -42,12 +54,13 @@
> * to any other function. Each srcu_struct represents a separate domain
> * of SRCU protection.
> */
> -int init_srcu_struct(struct srcu_struct *sp)
> +void init_srcu_struct(struct srcu_struct *sp)
> {
> sp->completed = 0;
> mutex_init(&sp->mutex);
> - sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
> - return (sp->per_cpu_ref ? 0 : -ENOMEM);
> + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> + atomic_set(&sp->hardluckref[0], 0);
> + atomic_set(&sp->hardluckref[1], 0);
> }

Nack -- the caller is free to ignore the error return, but should be
able to detect it in case the caller is unable to tolerate the overhead
of running in hardluckref mode, perhaps instead choosing to fail a
user-level request in order to try to reduce memory pressure.

I did update the comment to say that you can use SRCU_INITIALIZER()
instead.

> /*
> @@ -58,11 +71,14 @@ int init_srcu_struct(struct srcu_struct
> static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
> {
> int cpu;
> + struct srcu_struct_array *sap;
> int sum;
>
> - sum = 0;
> - for_each_possible_cpu(cpu)
> - sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
> + sum = atomic_read(&sp->hardluckref[idx]);
> + sap = rcu_dereference(sp->per_cpu_ref);
> + if (likely(sap != NULL))
> + for_each_possible_cpu(cpu)
> + sum += per_cpu_ptr(sap, cpu)->c[idx];
> return sum;
> }

Took the initialization based on hardluckref rather than adding it in
at the end, good!

> @@ -94,7 +110,8 @@ void cleanup_srcu_struct(struct srcu_str
> WARN_ON(sum); /* Leakage unless caller handles error. */
> if (sum != 0)
> return;
> - free_percpu(sp->per_cpu_ref);
> + if (sp->per_cpu_ref != NULL)
> + free_percpu(sp->per_cpu_ref);
> sp->per_cpu_ref = NULL;
> }
>
> @@ -102,19 +119,37 @@ void cleanup_srcu_struct(struct srcu_str
> * srcu_read_lock - register a new reader for an SRCU-protected structure.
> * @sp: srcu_struct in which to register the new reader.
> *
> - * Counts the new reader in the appropriate per-CPU element of the
> - * srcu_struct. Must be called from process context.
> + * Counts the new reader in the appropriate per-CPU element of @sp.
> + * Must be called from process context.

Good -- took the "@sp".

> * Returns an index that must be passed to the matching srcu_read_unlock().
> + * The index is mapped to negative numbers if the per-cpu array in @sp
> + * is not and cannot be allocated.
> */
> int srcu_read_lock(struct srcu_struct *sp)
> {
> int idx;
> + struct srcu_struct_array *sap;
> +
> + if (unlikely(sp->per_cpu_ref == NULL &&
> + mutex_trylock(&sp->mutex))) {
> + if (sp->per_cpu_ref == NULL)
> + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> + mutex_unlock(&sp->mutex);
> + }
>
> preempt_disable();
> idx = sp->completed & 0x1;
> barrier(); /* ensure compiler looks -once- at sp->completed. */
> - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> + sap = rcu_dereference(sp->per_cpu_ref);
> + if (likely(sap != NULL)) {
> + per_cpu_ptr(sap, smp_processor_id())->c[idx]++;
> + smp_mb();
> + } else {
> + atomic_inc(&sp->hardluckref[idx]);
> + smp_mb__after_atomic_inc();
> + idx = -1 - idx;
> + }
> preempt_enable();
> return idx;
> }

Good restructuring -- took this, though I restricted the unlikely() to
cover only the comparison to NULL, since the mutex_trylock() has a
reasonable chance of success assuming that the read path has substantial
overhead from other sources.
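
(In other words, the restricted form would look roughly like this -- a sketch of
what I mean, not the exact queued diff:

	if (unlikely(sp->per_cpu_ref == NULL) && mutex_trylock(&sp->mutex)) {
		if (sp->per_cpu_ref == NULL)
			sp->per_cpu_ref = alloc_srcu_struct_percpu();
		mutex_unlock(&sp->mutex);
	}
)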

> @@ -131,10 +166,16 @@ int srcu_read_lock(struct srcu_struct *s
> */
> void srcu_read_unlock(struct srcu_struct *sp, int idx)
> {
> - preempt_disable();
> - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
> - preempt_enable();
> + if (likely(idx >= 0)) {
> + smp_mb();
> + preempt_disable();
> + per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
> + smp_processor_id())->c[idx]--;
> + preempt_enable();
> + } else {
> + smp_mb__before_atomic_dec();
> + atomic_dec(&sp->hardluckref[-1 - idx]);
> + }
> }

I took the moving smp_mb() out from under preempt_disable().
I left the "return" -- same number of lines either way.
I don't see any changes in sign compared to what I had, FWIW.

> /**
> @@ -158,6 +199,11 @@ void synchronize_srcu(struct srcu_struct
> idx = sp->completed;
> mutex_lock(&sp->mutex);
>
> + /* Initialize if not already initialized. */
> +
> + if (sp->per_cpu_ref == NULL)
> + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> +
> /*
> * Check to see if someone else did the work for us while we were
> * waiting to acquire the lock. We need -two- advances of
> @@ -168,71 +214,35 @@ void synchronize_srcu(struct srcu_struct
> * either (1) wait for two or (2) supply the second ourselves.
> */
>
> - if ((sp->completed - idx) >= 2) {
> - mutex_unlock(&sp->mutex);
> - return;
> - }
> + if ((sp->completed - idx) >= 2)
> + goto done;

I don't see the benefit of gathering all the mutex_unlock()s together.
If the unwinding was more complicated, I would take this change, however.

> - synchronize_sched(); /* Force memory barrier on all CPUs. */
> + smp_mb(); /* ensure srcu_read_lock() sees prior change first! */
> + idx = sp->completed & 0x1;
>
> - /*
> - * The preceding synchronize_sched() ensures that any CPU that
> - * sees the new value of sp->completed will also see any preceding
> - * changes to data structures made by this CPU. This prevents
> - * some other CPU from reordering the accesses in its SRCU
> - * read-side critical section to precede the corresponding
> - * srcu_read_lock() -- ensuring that such references will in
> - * fact be protected.
> - *
> - * So it is now safe to do the flip.
> - */
> +#ifdef SMP__STORE_MB_LOAD_WORKS /* The fast path */
> + if (srcu_readers_active_idx(sp, idx) == 0)
> + goto done;
> +#endif

I still don't trust this one. I would trust it a bit more if it were
srcu_readers_active() rather than srcu_readers_active_idx(), but even
then I suspect that external synchronization is required. Or is there
something that I am missing that makes it safe in face of the sequence
of events that you, Oleg, and I were discussing?

For the moment, this optimization should be done by the caller, and
should be prominently commented.
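
Something like the following at the call site, along the lines of Oleg's earlier
suggestion in this thread (sketch only; "my_srcu" stands in for the caller's
srcu_struct):

	/* Caller-side fast path -- must be prominently commented! */
	smp_mb();
	if (srcu_readers_active(&my_srcu))
		synchronize_srcu(&my_srcu);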

Thanx, Paul

> - idx = sp->completed & 0x1;
> sp->completed++;
> -
> - synchronize_sched(); /* Force memory barrier on all CPUs. */
> + synchronize_sched();
>
> /*
> * At this point, because of the preceding synchronize_sched(),
> * all srcu_read_lock() calls using the old counters have completed.
> * Their corresponding critical sections might well be still
> * executing, but the srcu_read_lock() primitives themselves
> - * will have finished executing.
> + * will have finished executing. The "old" rank of counters
> + * can therefore only decrease, never increase in value.
> */
>
> while (srcu_readers_active_idx(sp, idx))
> schedule_timeout_interruptible(1);
>
> - synchronize_sched(); /* Force memory barrier on all CPUs. */
> -
> - /*
> - * The preceding synchronize_sched() forces all srcu_read_unlock()
> - * primitives that were executing concurrently with the preceding
> - * for_each_possible_cpu() loop to have completed by this point.
> - * More importantly, it also forces the corresponding SRCU read-side
> - * critical sections to have also completed, and the corresponding
> - * references to SRCU-protected data items to be dropped.
> - *
> - * Note:
> - *
> - * Despite what you might think at first glance, the
> - * preceding synchronize_sched() -must- be within the
> - * critical section ended by the following mutex_unlock().
> - * Otherwise, a task taking the early exit can race
> - * with a srcu_read_unlock(), which might have executed
> - * just before the preceding srcu_readers_active() check,
> - * and whose CPU might have reordered the srcu_read_unlock()
> - * with the preceding critical section. In this case, there
> - * is nothing preventing the synchronize_sched() task that is
> - * taking the early exit from freeing a data structure that
> - * is still being referenced (out of order) by the task
> - * doing the srcu_read_unlock().
> - *
> - * Alternatively, the comparison with "2" on the early exit
> - * could be changed to "3", but this increases synchronize_srcu()
> - * latency for bulk loads. So the current code is preferred.
> - */
> + smp_mb(); /* must see critical section prior to srcu_read_unlock() */
>
> +done:
> mutex_unlock(&sp->mutex);
> }
>
>

2006-11-20 19:39:31

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, 20 Nov 2006, Jens Axboe wrote:

> On Mon, Nov 20 2006, Alan Stern wrote:
> > Paul:
> >
> > Here's my version of your patch from yesterday. It's basically the same,
> > but I cleaned up the code in a few places and fixed a bug (the sign of idx
> > in srcu_read_unlock). Also I changed the init routine back to void, since
> > it's no longer an error if the per-cpu allocation fails.
> >
> > More importantly, I added a static initializer and included the fast-path
> > in synchronize_srcu. It's protected by the new symbol
> > SMP__STORE_MB_LOAD_WORKS, which should be defined in arch-specific headers
> > for those architectures where the store-mb-load pattern is safe.
>
> Must we introduce memory allocations in srcu_read_lock()? It makes it
> much harder and nastier for me to use. I'd much prefer a failing
> init_srcu(), seems like a much better API.

Paul agrees with you that allocation failures in init_srcu() should be
passed back to the caller, and I certainly don't mind doing so.

However we can't remove the memory allocation in srcu_read_lock(). That
was the point which started this whole thread: the per-cpu allocation
cannot be done statically, and some users of a static SRCU structure can't
easily call init_srcu() early enough.

Once the allocation succeeds, the overhead in srcu_read_lock() is minimal.

Alan

2006-11-20 19:59:14

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync



I decided to go with this instead, for now.

Please holler if you see problems with it.

Linus

---
commit b3438f8266cb1f5010085ac47d7ad6a36a212164
Author: Linus Torvalds <[email protected]>
Date: Mon Nov 20 11:47:18 2006 -0800

Add "pure_initcall" for static variable initialization

This is a quick hack to overcome the fact that SRCU currently does not
allow static initializers, and we need to sometimes initialize those
things before any other initializers (even "core" ones) can do so.

Currently we don't allow this at all for modules, and the only user that
needs is right now is cpufreq. As reported by Thomas Gleixner:

"Commit b4dfdbb3c707474a2254c5b4d7e62be31a4b7da9 ("[PATCH] cpufreq:
make the transition_notifier chain use SRCU breaks cpu frequency
notification users, which register the callback > on core_init
level."

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Andrew Morton <[email protected]>,
Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 86e69b7..dd0c262 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -59,7 +59,7 @@ static int __init init_cpufreq_transitio
srcu_init_notifier_head(&cpufreq_transition_notifier_list);
return 0;
}
-core_initcall(init_cpufreq_transition_notifier_list);
+pure_initcall(init_cpufreq_transition_notifier_list);

static LIST_HEAD(cpufreq_governor_list);
static DEFINE_MUTEX (cpufreq_governor_mutex);
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 9d87316..e60d6f2 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -215,6 +215,8 @@
.notes : { *(.note.*) } :note

#define INITCALLS \
+ *(.initcall0.init) \
+ *(.initcall0s.init) \
*(.initcall1.init) \
*(.initcall1s.init) \
*(.initcall2.init) \
diff --git a/include/linux/init.h b/include/linux/init.h
index ff40ea1..5eb5d24 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -93,6 +93,14 @@ extern void setup_arch(char **);
static initcall_t __initcall_##fn##id __attribute_used__ \
__attribute__((__section__(".initcall" level ".init"))) = fn

+/*
+ * A "pure" initcall has no dependencies on anything else, and purely
+ * initializes variables that couldn't be statically initialized.
+ *
+ * This only exists for built-in code, not for modules.
+ */
+#define pure_initcall(fn) __define_initcall("0",fn,1)
+
#define core_initcall(fn) __define_initcall("1",fn,1)
#define core_initcall_sync(fn) __define_initcall("1s",fn,1s)
#define postcore_initcall(fn) __define_initcall("2",fn,2)

2006-11-20 20:02:03

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, 20 Nov 2006, Oleg Nesterov wrote:

> On 11/20, Alan Stern wrote:
> >
> > @@ -158,6 +199,11 @@ void synchronize_srcu(struct srcu_struct
> >
> > [... snip ...]
> >
> > +#ifdef SMP__STORE_MB_LOAD_WORKS /* The fast path */
> > + if (srcu_readers_active_idx(sp, idx) == 0)
> > + goto done;
> > +#endif
>
> I guess this is connected to another message from you,

Yes.

> > But of course it _is_ needed for the fastpath to work. In fact, it might
> > not be good enough, depending on the architecture. Here's what the
> > fastpath ends up looking like (using c[idx] is essentially the same as
> > using hardluckref):
> >
> > 	WRITER				READER
> > 	------				------
> > 	dataptr = &(new data)		atomic_inc(&hardluckref)
> > 	mb				mb
> > 	while (hardluckref > 0) ;	access *dataptr
> >
> > Notice the pattern: Each CPU does store-mb-load. It is known that on
> > some architectures each CPU can end up loading the old value (the value
> > from before the other CPU's store). This would mean the writer would see
> > hardluckref == 0 right away and the reader would see the old dataptr.
>
> So, if we have global A == B == 0,
>
> CPU_0 CPU_1
>
> A = 1; B = 2;
> mb(); mb();
> b = B; a = A;
>
> It could happen that a == b == 0, yes?

Exactly.

> Isn't this contradicts with definition
> of mb?

One might think so, at first. But if you do a careful search, you'll find
that there _is_ no definition of mb! People state in vague terms what
it's supposed to do, but they are almost never specific enough to tell
whether the example above should work.

> By definition, when CPU_0 issues 'b = B', 'A = 1' should be visible to other
> CPUs, yes?

No. Memory barriers don't guarantee that any particular store will become
visible to other CPUs at any particular time. They guarantee only that a
certain sequence of stores will become visible in a particular order
(provided the other CPUs also use the correct memory barriers).
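
To illustrate the kind of guarantee they do make (an example I'm adding for
clarity, with A == B == 0 initially):

	CPU_0                   CPU_1

	A = 1;                  x = B;
	smp_wmb();              smp_rmb();
	B = 1;                  y = A;

If CPU_1 sees x == 1, it is guaranteed to also see y == 1; nothing is said
about when either store becomes visible.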

> Now, b == 0 means that CPU_1 did not read 'a = A' yet, otherwise
> 'B = 2' should be visible to all CPUs (by definition again).
>
> Could you please clarify this?

Here's an example showing how the code can fail. (Paul can correct me if
I get this wrong.)

"A = 1" and "B = 2" are issued simultaneously. The two caches
send out Invalidate messages, and each queues the message it
receives for later processing.

Both CPUs execute their "mb" instructions. The mb forces each
cache to wait until it receives an Acknowdgement for the
Invalidate it sent.

Both caches send an Acknowledgement message to the other. The
mb instructions complete.

"b = B" and "a = A" execute. The caches return A==0 and B==0
because they haven't yet invalidated their cache lines.

Cache 0 invalidates its line for B and Cache 1 invalidates its
line for A. But it's already too late.

The reason the code failed is because the mb instructions didn't force
the caches to wait until the Invalidate messages in their queues had been
fully carried out (i.e., the lines had actually been invalidated). This
is because at the time the mb started, those messages had not been
acknowledged -- they were just sitting in the cache's invalidate queue.


> Btw, this is funny, but I was going to suggest _exactly_ same cleanup for
> srcu_read_lock :)

I guess we think alike!

Alan

2006-11-20 20:08:20

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20, 2006 at 06:55:54PM +0100, Jens Axboe wrote:
> On Mon, Nov 20 2006, Paul E. McKenney wrote:
> > On Mon, Nov 20, 2006 at 08:15:14AM +0100, Jens Axboe wrote:
> > > On Sun, Nov 19 2006, Paul E. McKenney wrote:
> > > > On Sat, Nov 18, 2006 at 09:46:24PM +0300, Oleg Nesterov wrote:
> > > > > On 11/17, Paul E. McKenney wrote:
> > > > > >
> > > > > > Oleg, any thoughts about Jens's optimization? He would code something
> > > > > > like:
> > > > > >
> > > > > > if (srcu_readers_active(&my_srcu))
> > > > > > synchronize_srcu();
> > > > > > else
> > > > > > smp_mb();
> > > > >
> > > > > Well, this is clearly racy, no? I am not sure, but may be we can do
> > > > >
> > > > > smp_mb();
> > > > > if (srcu_readers_active(&my_srcu))
> > > > > synchronize_srcu();
> > > > >
> > > > > in this case we also need to add 'smp_mb()' into srcu_read_lock() after
> > > > > 'atomic_inc(&sp->hardluckref)'.
> > > > >
> > > > > > However, he is doing ordered I/O requests rather than protecting data
> > > > > > structures.
> > > > >
> > > > > Probably this makes a difference, but I don't understand this.
> > > >
> > > > OK, one hypothesis here...
> > > >
> > > > The I/Os must be somehow explicitly ordered to qualify
> > > > for I/O-barrier separation. If two independent processes
> > > > issue I/Os concurrently with a third process doing an
> > > > I/O barrier, the I/O barrier is free to separate the
> > > > two concurrent I/Os or not, on its whim.
> > > >
> > > > Jens, is the above correct? If so, what would the two processes
> > >
> > > That's completely correct, hence my somewhat relaxed approach with SRCU.
> >
> > OK, less scary in that case. ;-)
>
> Yep, it's really not scary in any ordering sense!
>
> > > > need to do in order to ensure that their I/O was considered to be
> > > > ordered with respect to the I/O barrier? Here are some possibilities:
> > >
> > > If we consider the barrier a barrier in a certain stream of requests,
> > > it is the responsibility of the issuer of that barrier to ensure that
> > > the queueing is ordered. So if two "unrelated" streams of requests with
> > > barriers hit __make_request() at the same time, we don't go to great
> > lengths to ensure who gets there first.
> >
> > So the "preceding" requests have to have completed their I/O system
> > calls? If this is the case, does this include normal (non-direct/raw)
> > writes and asynchronous reads? My guess is that it would include
> > asynchronous I/O, but not buffered writes.
>
> They need not have completed, but they must have been queued at the
> block layer level. IOW, the io scheduler must know about them. Since
> it's a block layer device property, we really don't care about system
> calls since any of them could amount to 1 or lots more individual io
> requests.
>
> But now we have taken a detour from the original problem. As I wrote
> above, the io scheduler must know about the requests. When the plug list
> ends up in the private process context, the io scheduler doesn't know
> about it yet. When a barrier is queued, the block layer does not care
> about io that hasn't been issued yet (dirty data in the page cache
> perhaps), since if it hasn't been seen, it's by definition not
> interesting. But if some of the requests reside in a different process
> private request list, then that is a violation of this rule since it
> should technically belong to the block layer / io scheduler at that
> point. This is where I wanted to use SRCU.

OK. Beyond a certain point, I would need to see the code using SRCU.

> > > > 1. I/O barriers apply only to preceding and following I/Os from
> > > > the process issuing the I/O barrier.
> > > >
> > > > 2. As for #1 above, but restricted to task rather than process.
> > > >
> > > > 3. I/O system calls that have completed are ordered by the
> > > > barrier to precede I/O system calls that have not yet
> > > > started, but I/O system calls still in flight could legally
> > > > land on either side of the concurrently executing I/O
> > > > barrier.
> > > >
> > > > 4. Something else entirely?
> > > >
> > > > Given some restriction like one of the above, it is entirely possible
> > > > that we don't even need the memory barrier...
> > >
> > > 3 is the closest. The request queue doesn't really know the scope of the
> > > barrier, it has to rely on the issuer getting it right. If you have two
> > > competing processes issuing io and process A relies on process B issuing
> > > a barrier, they have to synchronize that between them. Normally that is
> > > not a problem, since that's how the file systems always did io before
> > > barriers on items that need to be on disk (it was a serialization point
> > > anyway, it's just a stronger one now).
> >
> > So something like a user-level mutex or atomic instructions must be used
> > by the tasks doing the pre-barrier I/Os to announce that these I/Os have
> > been started in the kernel.
>
> We don't do barriers from user space, it's purely a feature available to
> file systems to ensure ordering of writes even at the disk platter
> level.

Got it. The question then becomes whether the barriers involved in
locking/whatever for the queues enter into the picture.

> > > That said, I think the
> > >
> > > smp_mb();
> > > if (srcu_readers_active(sp))
> > > synchronize_srcu();
> > >
> > > makes the most sense.
> >
> > If the user-level tasks/threads/processes must explicitly synchronize,
> > and if the pre-barrier I/O-initation syscalls have to have completed,
> > then I am not sure that the smp_mb() is needed. Seems like the queuing
> > mechanisms in the syscall and the user-level synchronization would have
> > supplied the needed memory barriers. Or are you using some extremely
> > lightweight user-level synchronization?
>
> Once a process holds a queue plug, any write issued to that plug list
> will do an srcu_read_lock(). So as far as I can tell, the smp_mb() is
> needed to ensure that an immediately following synchronize_srcu() from a
> barrier write queued on a different CPU will see that srcu_read_lock().

Is the srcu_read_lock() invoked before actually queueing the I/O?
Is there any interaction with the queues before calling synchronize_srcu()?
If yes to both, it might be possible to remove the memory barrier.

That said, the overhead of the smp_mb() is so small compared to that of
the I/O path that it might not be worth it.

> There are no syscall or user-level synchronization.

Got it, thank you for the explanation!

Thanx, Paul

2006-11-20 20:13:52

by Jens Axboe

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20 2006, Alan Stern wrote:
> On Mon, 20 Nov 2006, Jens Axboe wrote:
>
> > On Mon, Nov 20 2006, Alan Stern wrote:
> > > Paul:
> > >
> > > Here's my version of your patch from yesterday. It's basically the same,
> > > but I cleaned up the code in a few places and fixed a bug (the sign of idx
> > > in srcu_read_unlock). Also I changed the init routine back to void, since
> > > it's no longer an error if the per-cpu allocation fails.
> > >
> > > More importantly, I added a static initializer and included the fast-path
> > > in synchronize_srcu. It's protected by the new symbol
> > > SMP__STORE_MB_LOAD_WORKS, which should be defined in arch-specific headers
> > > for those architectures where the store-mb-load pattern is safe.
> >
> > Must we introduce memory allocations in srcu_read_lock()? It makes it
> > much harder and nastier for me to use. I'd much prefer a failing
> > init_srcu(), seems like a much better API.
>
> Paul agrees with you that allocation failures in init_srcu() should be
> passed back to the caller, and I certainly don't mind doing so.
>
> However we can't remove the memory allocation in srcu_read_lock(). That
> was the point which started this whole thread: the per-cpu allocation
> cannot be done statically, and some users of a static SRCU structure can't
> easily call init_srcu() early enough.
>
> Once the allocation succeeds, the overhead in srcu_read_lock() is minimal.

It's not about the overhead, it's about a potentially problematic
allocation.

--
Jens Axboe

2006-11-20 20:21:33

by Jens Axboe

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20 2006, Paul E. McKenney wrote:
> > > So the "preceding" requests have to have completed their I/O system
> > > calls? If this is the case, does this include normal (non-direct/raw)
> > > writes and asynchronous reads? My guess is that it would include
> > > asynchronous I/O, but not buffered writes.
> >
> > They need not have completed, but they must have been queued at the
> > block layer level. IOW, the io scheduler must know about them. Since
> > it's a block layer device property, we really don't care about system
> > calls since any of them could amount to 1 or lots more individual io
> > requests.
> >
> > But now we have taken a detour from the original problem. As I wrote
> > above, the io scheduler must know about the requests. When the plug list
> > ends up in the private process context, the io scheduler doesn't know
> > about it yet. When a barrier is queued, the block layer does not care
> > about io that hasn't been issued yet (dirty data in the page cache
> > perhaps), since if it hasn't been seen, it's by definition not
> > interesting. But if some of the requests reside in a different process
> > private request list, then that is a violation of this rule since it
> > should technically belong to the block layer / io scheduler at that
> > point. This is where I wanted to use SRCU.
>
> OK. Beyond a certain point, I would need to see the code using SRCU.

Yeah, of course. It's actually in the 'plug' branch of the block repo
(and has been for some weeks), so it's public. I'll post the updated
patch soon again.

> > > If the user-level tasks/threads/processes must explicitly synchronize,
> > > and if the pre-barrier I/O-initation syscalls have to have completed,
> > > then I am not sure that the smp_mb() is needed. Seems like the queuing
> > > mechanisms in the syscall and the user-level synchronization would have
> > > supplied the needed memory barriers. Or are you using some extremely
> > > lightweight user-level synchronization?
> >
> > Once a process holds a queue plug, any write issued to that plug list
> > will do an srcu_read_lock(). So as far as I can tell, the smp_mb() is
> > needed to ensure that an immediately following synchronize_srcu() from a
> > barrier write queued on a different CPU will see that srcu_read_lock().
>
> Is the srcu_read_lock() invoked before actually queueing the I/O?

Yes,

> Is there any interaction with the queues before calling synchronize_srcu()?
> If yes to both, it might be possible to remove the memory barrier.

There might, there might not be. Currently it replugs the queue to
prevent a livelock if the same process currently holds a read_lock on
the queue. And I guess that will stay, so that's a yes here as well.

> That said, the overhead of the smp_mb() is so small compared to that of
> the I/O path that it might not be worth it.

I could not convince myself that it wasn't always needed, so I'd agree.

--
Jens Axboe

2006-11-20 20:23:00

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, 20 Nov 2006, Paul E. McKenney wrote:

> On Mon, Nov 20, 2006 at 12:19:19PM -0500, Alan Stern wrote:
> > Paul:
> >
> > Here's my version of your patch from yesterday. It's basically the same,
> > but I cleaned up the code in a few places and fixed a bug (the sign of idx
> > in srcu_read_unlock). Also I changed the init routine back to void, since
> > it's no longer an error if the per-cpu allocation fails.
>
> I don't see any changes in the sign of idx.

Your earlier patch had srcu_read_unlock doing:

+ if (likely(idx <= 0)) {
+ preempt_disable();
+ smp_mb();
+ per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
+ smp_processor_id())->c[idx]--;
+ preempt_enable();
+ return;
+ }

Obviously you meant the test to be "idx >= 0".

> > -int init_srcu_struct(struct srcu_struct *sp)
> > +void init_srcu_struct(struct srcu_struct *sp)
> > {
> > sp->completed = 0;
> > mutex_init(&sp->mutex);
> > - sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
> > - return (sp->per_cpu_ref ? 0 : -ENOMEM);
> > + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> > + atomic_set(&sp->hardluckref[0], 0);
> > + atomic_set(&sp->hardluckref[1], 0);
> > }
>
> Nack -- the caller is free to ignore the error return, but should be
> able to detect it in case the caller is unable to tolerate the overhead
> of running in hardluckref mode, perhaps instead choosing to fail a
> user-level request in order to try to reduce memory pressure.

Okay. Remember to change back the declaration in srcu.h as well.

> I did update the comment to say that you can use SRCU_INITIALIZER()
> instead.

Good.

> > int srcu_read_lock(struct srcu_struct *sp)
> > {
> > int idx;
> > + struct srcu_struct_array *sap;
> > +
> > + if (unlikely(sp->per_cpu_ref == NULL &&
> > + mutex_trylock(&sp->mutex))) {
> > + if (sp->per_cpu_ref == NULL)
> > + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> > + mutex_unlock(&sp->mutex);
> > + }
> >
> > preempt_disable();
> > idx = sp->completed & 0x1;
> > barrier(); /* ensure compiler looks -once- at sp->completed. */
> > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > + sap = rcu_dereference(sp->per_cpu_ref);
> > + if (likely(sap != NULL)) {
> > + per_cpu_ptr(sap, smp_processor_id())->c[idx]++;
> > + smp_mb();
> > + } else {
> > + atomic_inc(&sp->hardluckref[idx]);
> > + smp_mb__after_atomic_inc();
> > + idx = -1 - idx;
> > + }
> > preempt_enable();
> > return idx;
> > }
>
> Good restructuring -- took this, though I restricted the unlikely() to
> cover only the comparison to NULL, since the mutex_trylock() has a
> reasonable chance of success assuming that the read path has substantial
> overhead from other sources.

I vacillated over that. It's not clear what difference it will make to
the compiler, but I guess you would make it a little clearer to a reader
that the unlikely part is the test for NULL.

> > @@ -131,10 +166,16 @@ int srcu_read_lock(struct srcu_struct *s
> > */
> > void srcu_read_unlock(struct srcu_struct *sp, int idx)
> > {
> > - preempt_disable();
> > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
> > - preempt_enable();
> > + if (likely(idx >= 0)) {
> > + smp_mb();
> > + preempt_disable();
> > + per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
> > + smp_processor_id())->c[idx]--;
> > + preempt_enable();
> > + } else {
> > + smp_mb__before_atomic_dec();
> > + atomic_dec(&sp->hardluckref[-1 - idx]);
> > + }
> > }
>
> I took the moving smp_mb() out from under preempt_disable().
> I left the "return" -- same number of lines either way.
> I don't see any changes in sign compared to what I had, FWIW.

Compare your "if" statement to mine.

> > @@ -168,71 +214,35 @@ void synchronize_srcu(struct srcu_struct
> > * either (1) wait for two or (2) supply the second ourselves.
> > */
> >
> > - if ((sp->completed - idx) >= 2) {
> > - mutex_unlock(&sp->mutex);
> > - return;
> > - }
> > + if ((sp->completed - idx) >= 2)
> > + goto done;
>
> I don't see the benefit of gathering all the mutex_unlock()s together.
> If the unwinding was more complicated, I would take this change, however.

It's not a big deal. A couple of extra function calls won't add much
code.

> > +#ifdef SMP__STORE_MB_LOAD_WORKS /* The fast path */
> > + if (srcu_readers_active_idx(sp, idx) == 0)
> > + goto done;
> > +#endif
>
> I still don't trust this one. I would trust it a bit more if it were
> srcu_readers_active() rather than srcu_readers_active_idx(), but even
> then I suspect that external synchronization is required.

There should be some sort of assertion at the start of synchronize_srcu
(just after the mutex is acquired) that srcu_readers_active_idx() yields 0
for the inactive idx, and a similar assertion at the end (just before the
mutex is released). Hence there's no need to check both indices.
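
For concreteness, something along these lines (a rough sketch, not tested):

	/* With sp->mutex held, the inactive rank of counters must be idle. */
	WARN_ON(srcu_readers_active_idx(sp, (sp->completed & 0x1) ^ 1) != 0);

with the same check repeated just before the mutex_unlock().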

> Or is there
> something that I am missing that makes it safe in face of the sequence
> of events that you, Oleg, and I were discussing?

Consider the sequence of events we were discussing.

srcu_read_lock and synchronize_srcu are invoked at the same
time. The writer sees that the count is 0 and takes the fast
path, leaving sp->completed unchanged. The reader sees the
new value of the data pointer and starts using it.

While the reader is still running, synchronize_srcu is called
again. This time it sees that the count is > 0, so it doesn't
take the fast path. Everything follows according to your
original code.

The underlying problem with Oleg's code was that the reader could end up
incrementing the wrong counter. That can't happen with the fast path,
because sp->completed doesn't change.

> For the moment, this optimization should be done by the caller, and
> should be prominently commented.

Prominent commenting is certainly a good idea. However I don't see how
the caller can make this optimization without getting into the guts of the
SRCU implementation. After all, a major part of the optimization is
to avoid calling synchronize_sched().

Alan

2006-11-20 20:37:55

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20, 2006 at 09:57:12PM +0300, Oleg Nesterov wrote:
> On 11/20, Alan Stern wrote:
> >
> > @@ -158,6 +199,11 @@ void synchronize_srcu(struct srcu_struct
> >
> > [... snip ...]
> >
> > +#ifdef SMP__STORE_MB_LOAD_WORKS /* The fast path */
> > + if (srcu_readers_active_idx(sp, idx) == 0)
> > + goto done;
> > +#endif
>
> I guess this is connected to another message from you,
>
> > But of course it _is_ needed for the fastpath to work. In fact, it might
> > not be good enough, depending on the architecture. Here's what the
> > fastpath ends up looking like (using c[idx] is essentially the same as
> > using hardluckref):
> >
> > 	WRITER				READER
> > 	------				------
> > 	dataptr = &(new data)		atomic_inc(&hardluckref)
> > 	mb				mb
> > 	while (hardluckref > 0) ;	access *dataptr
> >
> > Notice the pattern: Each CPU does store-mb-load. It is known that on
> > some architectures each CPU can end up loading the old value (the value
> > from before the other CPU's store). This would mean the writer would see
> > hardluckref == 0 right away and the reader would see the old dataptr.
>
> So, if we have global A == B == 0,
>
> CPU_0 CPU_1
>
> A = 1; B = 2;
> mb(); mb();
> b = B; a = A;
>
> It could happen that a == b == 0, yes? Isn't this contradicts with definition
> of mb?

It can and does happen. -Which- definition of mb()? ;-)

To see how this can happen, think of the SMP system as a message-passing
system, and consider the following sequence of events:

o The cache line for A is initially in CPU 1's cache, and the
cache line for B is initially in CPU 0's cache (backwards of
what you would want knowing about the upcoming writes).

o CPU 0 stores to A, but because A is not in cache, places it in
CPU 0's store queue. It also puts out a request for ownership
of the cache line containing A.

o CPU 1 stores to B, with the same situation as for CPU 0's store
to A.

o Both CPUs execute an mb(), which ensures that any subsequent writes
follow the writes to A and B, respectively. Since neither CPU
has yet received the other CPU's request for ownership, there are
no ordering effects on subsequent reads.

o CPU 0 executes "b = B", and since B is in CPU 0's cache, it loads
the current value, which is zero.

o Ditto for CPU 1 and A.

o CPUs 0 and 1 now receive each other's requests for ownership, so
exchange the cache lines containing A and B.

o Once CPUs 0 and 1 receive ownership of the respective cache lines,
they complete their writes to A and B (moving the values from the
store buffers to the cache lines).

> By definition, when CPU_0 issues 'b = B', 'A = 1' should be visible to other
> CPUs, yes? Now, b == 0 means that CPU_1 did not read 'a = A' yet, otherwise
> 'B = 2' should be visible to all CPUs (by definition again).
>
> Could you please clarify this?

See above...

> Btw, this is funny, but I was going to suggest _exactly_ same cleanup for
> srcu_read_lock :)

;-)

Thanx, Paul

2006-11-20 20:49:50

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20, 2006 at 03:01:59PM -0500, Alan Stern wrote:
> On Mon, 20 Nov 2006, Oleg Nesterov wrote:
>
> > On 11/20, Alan Stern wrote:
> > >
> > > @@ -158,6 +199,11 @@ void synchronize_srcu(struct srcu_struct
> > >
> > > [... snip ...]
> > >
> > > +#ifdef SMP__STORE_MB_LOAD_WORKS /* The fast path */
> > > + if (srcu_readers_active_idx(sp, idx) == 0)
> > > + goto done;
> > > +#endif
> >
> > I guess this is connected to another message from you,
>
> Yes.
>
> > > But of course it _is_ needed for the fastpath to work. In fact, it might
> > > not be good enough, depending on the architecture. Here's what the
> > > fastpath ends up looking like (using c[idx] is essentially the same as
> > > using hardluckref):
> > >
> > > 	WRITER				READER
> > > 	------				------
> > > 	dataptr = &(new data)		atomic_inc(&hardluckref)
> > > 	mb				mb
> > > 	while (hardluckref > 0) ;	access *dataptr
> > >
> > > Notice the pattern: Each CPU does store-mb-load. It is known that on
> > > some architectures each CPU can end up loading the old value (the value
> > > from before the other CPU's store). This would mean the writer would see
> > > hardluckref == 0 right away and the reader would see the old dataptr.
> >
> > So, if we have global A == B == 0,
> >
> > CPU_0 CPU_1
> >
> > A = 1; B = 2;
> > mb(); mb();
> > b = B; a = A;
> >
> > It could happen that a == b == 0, yes?
>
> Exactly.
>
> > Isn't this contradicts with definition
> > of mb?
>
> One might think so, at first. But if you do a careful search, you'll find
> that there _is_ no definition of mb! People state in vague terms what
> it's supposed to do, but they are almost never specific enough to tell
> whether the example above should work.

Yep -- mb() is currently defined only for specific CPUs. :-/

Some Linux kernel code has been written by considering each SMP-capable
CPU in turn, but that does not scale with increasing numbers of SMP-capable
CPUs.

> > By definition, when CPU_0 issues 'b = B', 'A = 1' should be visible to other
> > CPUs, yes?
>
> No. Memory barriers don't guarantee that any particular store will become
> visible to other CPUs at any particular time. They guarantee only that a
> certain sequence of stores will become visible in a particular order
> (provided the other CPUs also use the correct memory barriers).
>
> > Now, b == 0 means that CPU_1 did not read 'a = A' yet, otherwise
> > 'B = 2' should be visible to all CPUs (by definition again).
> >
> > Could you please clarify this?
>
> Here's an example showing how the code can fail. (Paul can correct me if
> I get this wrong.)

Looks good to me!

Thanx, Paul

2006-11-20 21:39:48

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, 20 Nov 2006, Jens Axboe wrote:

> > > Must we introduce memory allocations in srcu_read_lock()? It makes it
> > > much harder and nastier for me to use. I'd much prefer a failing
> > > init_srcu(), seems like a much better API.
> >
> > Paul agrees with you that allocation failures in init_srcu() should be
> > passed back to the caller, and I certainly don't mind doing so.
> >
> > However we can't remove the memory allocation in srcu_read_lock(). That
> > was the point which started this whole thread: the per-cpu allocation
> > cannot be done statically, and some users of a static SRCU structure can't
> > easily call init_srcu() early enough.
> >
> > Once the allocation succeeds, the overhead in srcu_read_lock() is minimal.
>
> It's not about the overhead, it's about a potentially problematic
> allocation.

I'm not sure what you mean by "problematic allocation". If you
successfully call init_srcu_struct then the allocation will be taken care
of. Later calls to srcu_read_lock won't experience any slowdowns or
problems.

If your call to init_srcu_struct isn't successful then you have to decide
how to handle it. You can ignore the failure and live with degraded
performance (caused by cache-line contention and repeated attempts to do
the per-cpu allocation), or you can give up entirely.
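
For instance, a caller that can't tolerate hardluckref mode could do something
like this (a sketch, assuming init_srcu_struct() goes back to returning an
error as discussed above; "my_srcu" is a placeholder):

	if (init_srcu_struct(&my_srcu) != 0)
		return -ENOMEM;	/* refuse to start rather than run degraded */

A caller that can tolerate the degraded mode would simply ignore the return
value.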

Does this answer your objection? If not, can you explain in more detail
what other features you would like?

Alan Stern

2006-11-21 07:39:53

by Jens Axboe

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20 2006, Alan Stern wrote:
> On Mon, 20 Nov 2006, Jens Axboe wrote:
>
> > > > Must we introduce memory allocations in srcu_read_lock()? It makes it
> > > > much harder and nastier for me to use. I'd much prefer a failing
> > > > init_srcu(), seems like a much better API.
> > >
> > > Paul agrees with you that allocation failures in init_srcu() should be
> > > passed back to the caller, and I certainly don't mind doing so.
> > >
> > > However we can't remove the memory allocation in srcu_read_lock(). That
> > > was the point which started this whole thread: the per-cpu allocation
> > > cannot be done statically, and some users of a static SRCU structure can't
> > > easily call init_srcu() early enough.
> > >
> > > Once the allocation succeeds, the overhead in srcu_read_lock() is minimal.
> >
> > It's not about the overhead, it's about a potentially problematic
> > allocation.
>
> I'm not sure what you mean by "problematic allocation". If you
> successfully call init_srcu_struct then the allocation will be taken care
> of. Later calls to srcu_read_lock won't experience any slowdowns or
> problems.

That requires init_srcu_struct() to return the error. If it does that,
I'm fine with it.

> Does this answer your objection? If not, can you explain in more detail
> what other features you would like?

It does, if the allocation failure in init_srcu_struct() is signalled.

--
Jens Axboe

2006-11-21 16:49:45

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/20, Paul E. McKenney wrote:
>
> On Mon, Nov 20, 2006 at 09:57:12PM +0300, Oleg Nesterov wrote:
> > >
> > So, if we have global A == B == 0,
> >
> > CPU_0 CPU_1
> >
> > A = 1; B = 2;
> > mb(); mb();
> > b = B; a = A;
> >
> > It could happen that a == b == 0, yes? Isn't this contradicts with definition
> > of mb?
>
> It can and does happen. -Which- definition of mb()? ;-)

I had a somewhat similar understanding before this discussion

[PATCH] Fix RCU race in access of nohz_cpu_mask
http://marc.theaimsgroup.com/?t=113378060600003

Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]
http://marc.theaimsgroup.com/?t=113432312600001

Could you please explain me again why that fix was correct? What we have now is:

	CPU_0				CPU_1
	rcu_start_batch:		stop_hz_timer:

	rcp->cur++;		STORE	nohz_cpu_mask |= cpu

	smp_mb();			mb();	// missed actually

	->cpumask = ~nohz_cpu_mask;  LOAD	if (rcu_pending())  // reads rcp->cur
						    nohz_cpu_mask &= ~cpu

So, it is possible that CPU_0 reads an empty nohz_cpu_mask and starts a grace
period with CPU_1 included in rcp->cpumask. CPU_1 in turn reads an old value
of rcp->cur (so rcu_pending() returns 0) and becomes CPU_IDLE.

Take another patch,

Re: Oops on 2.6.18
http://marc.theaimsgroup.com/?l=linux-kernel&m=116266392016286

	switch_uid:			__sigqueue_alloc:

	STORE 'new_user' to ->user	STORE "locked" to ->siglock

	mb();				"mb()";	// sort of, wrt loads/stores above

	LOAD ->siglock			LOAD ->user

Again, it is possible that switch_uid() doesn't notice that ->siglock is locked
and frees ->user. __sigqueue_alloc() in turn reads an old (freed) value of ->user
and does get_uid() on it.

> > To see how this can happen, think of the SMP system as a message-passing
> system, and consider the following sequence of events:
>
> o The cache line for A is initially in CPU 1's cache, and the
> cache line for B is initially in CPU 0's cache (backwards of
> what you would want knowing about the upcoming writes).
>
> o CPU 0 stores to A, but because A is not in cache, places it in
> CPU 0's store queue. It also puts out a request for ownership
> of the cache line containing A.
>
> o CPU 1 stores to B, with the same situation as for CPU 0's store
> to A.
>
> o Both CPUs execute an mb(), which ensures that any subsequent writes
> follow the writes to A and B, respectively. Since neither CPU
> has yet received the other CPU's request for ownership, there is
> no ordering effects on subsequent reads.
>
> o CPU 0 executes "b = B", and since B is in CPU 0's cache, it loads
> the current value, which is zero.
>
> o Ditto for CPU 1 and A.
>
> o CPUs 0 and 1 now receive each other's requests for ownership, so
> exchange the cache lines containing A and B.
>
> o Once CPUs 0 and 1 receive ownership of the respective cache lines,
> they complete their writes to A and B (moving the values from the
> store buffers to the cache lines).

Paul, Alan, in case it was not clear: I am not arguing, just trying to
understand, and I appreciate very much your time and your explanations.

Oleg.

2006-11-21 17:54:32

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 20, 2006 at 03:22:58PM -0500, Alan Stern wrote:
> On Mon, 20 Nov 2006, Paul E. McKenney wrote:
>
> > On Mon, Nov 20, 2006 at 12:19:19PM -0500, Alan Stern wrote:
> > > Paul:
> > >
> > > Here's my version of your patch from yesterday. It's basically the same,
> > > but I cleaned up the code in a few places and fixed a bug (the sign of idx
> > > in srcu_read_unlock). Also I changed the init routine back to void, since
> > > it's no longer an error if the per-cpu allocation fails.
> >
> > I don't see any changes in the sign of idx.
>
> Your earlier patch had srcu_read_unlock doing:
>
> + if (likely(idx <= 0)) {
> + preempt_disable();
> + smp_mb();
> + per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
> + smp_processor_id())->c[idx]--;
> + preempt_enable();
> + return;
> + }
>
> Obviously you meant the test to be "idx >= 0".

That would explain the oops upon kicking off rcutorture. ;-) Good catch --
must be time for me to visit the optometrist or something...

> > > -int init_srcu_struct(struct srcu_struct *sp)
> > > +void init_srcu_struct(struct srcu_struct *sp)
> > > {
> > > sp->completed = 0;
> > > mutex_init(&sp->mutex);
> > > - sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
> > > - return (sp->per_cpu_ref ? 0 : -ENOMEM);
> > > + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> > > + atomic_set(&sp->hardluckref[0], 0);
> > > + atomic_set(&sp->hardluckref[1], 0);
> > > }
> >
> > Nack -- the caller is free to ignore the error return, but should be
> > able to detect it in case the caller is unable to tolerate the overhead
> > of running in hardluckref mode, perhaps instead choosing to fail a
> > user-level request in order to try to reduce memory pressure.
>
> Okay. Remember to change back the declaration in srcu.h as well.

I did manage to get this one right. ;-)

> > I did update the comment to say that you can use SRCU_INITIALIZER()
> > instead.
>
> Good.
>
> > > int srcu_read_lock(struct srcu_struct *sp)
> > > {
> > > int idx;
> > > + struct srcu_struct_array *sap;
> > > +
> > > + if (unlikely(sp->per_cpu_ref == NULL &&
> > > + mutex_trylock(&sp->mutex))) {
> > > + if (sp->per_cpu_ref == NULL)
> > > + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> > > + mutex_unlock(&sp->mutex);
> > > + }
> > >
> > > preempt_disable();
> > > idx = sp->completed & 0x1;
> > > barrier(); /* ensure compiler looks -once- at sp->completed. */
> > > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> > > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > > + sap = rcu_dereference(sp->per_cpu_ref);
> > > + if (likely(sap != NULL)) {
> > > + per_cpu_ptr(sap, smp_processor_id())->c[idx]++;
> > > + smp_mb();
> > > + } else {
> > > + atomic_inc(&sp->hardluckref[idx]);
> > > + smp_mb__after_atomic_inc();
> > > + idx = -1 - idx;
> > > + }
> > > preempt_enable();
> > > return idx;
> > > }
> >
> > Good restructuring -- took this, though I restricted the unlikely() to
> > cover only the comparison to NULL, since the mutex_trylock() has a
> > reasonable chance of success assuming that the read path has substantial
> > overhead from other sources.
>
> I vacillated over that. It's not clear what difference it will make to
> the compiler, but I guess you would make it a little clearer to a reader
> that the unlikely part is the test for NULL.

Agreed, probably mostly for the benefit of the human reader.

> > > @@ -131,10 +166,16 @@ int srcu_read_lock(struct srcu_struct *s
> > > */
> > > void srcu_read_unlock(struct srcu_struct *sp, int idx)
> > > {
> > > - preempt_disable();
> > > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
> > > - preempt_enable();
> > > + if (likely(idx >= 0)) {
> > > + smp_mb();
> > > + preempt_disable();
> > > + per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
> > > + smp_processor_id())->c[idx]--;
> > > + preempt_enable();
> > > + } else {
> > > + smp_mb__before_atomic_dec();
> > > + atomic_dec(&sp->hardluckref[-1 - idx]);
> > > + }
> > > }
> >
> > I took the moving smp_mb() out from under preempt_disable().
> > I left the "return" -- same number of lines either way.
> > I don't see any changes in sign compared to what I had, FWIW.
>
> Compare your "if" statement to mine.

<red face>

> > > @@ -168,71 +214,35 @@ void synchronize_srcu(struct srcu_struct
> > > * either (1) wait for two or (2) supply the second ourselves.
> > > */
> > >
> > > - if ((sp->completed - idx) >= 2) {
> > > - mutex_unlock(&sp->mutex);
> > > - return;
> > > - }
> > > + if ((sp->completed - idx) >= 2)
> > > + goto done;
> >
> > I don't see the benefit of gathering all the mutex_unlock()s together.
> > If the unwinding was more complicated, I would take this change, however.
>
> It's not a big deal. A couple of extra function calls won't add much
> code.

And some compilers do this kind of gathering automatically anyway.

> > > +#ifdef SMP__STORE_MB_LOAD_WORKS /* The fast path */
> > > + if (srcu_readers_active_idx(sp, idx) == 0)
> > > + goto done;
> > > +#endif
> >
> > I still don't trust this one. I would trust it a bit more if it were
> > srcu_readers_active() rather than srcu_readers_active_idx(), but even
> > then I suspect that external synchronization is required.
>
> There should be some sort of assertion at the start of synchronize_srcu
> (just after the mutex is acquired) that srcu_readers_active_idx() yields 0
> for the inactive idx, and a similar assertion at the end (just before the
> mutex is released). Hence there's no need to check both indices.

OK, good point -- the non-active set is indeed supposed to sum to zero
unless someone is in synchronize_srcu().

> > Or is there
> > something that I am missing that makes it safe in face of the sequence
> > of events that you, Oleg, and I were discussing?
>
> Consider the sequence of events we were discussing.
>
> srcu_read_lock and synchronize_srcu are invoked at the same
> time. The writer sees that the count is 0 and takes the fast
> path, leaving sp->completed unchanged. The reader sees the
> new value of the data pointer and starts using it.
>
> While the reader is still running, synchronize_srcu is called
> again. This time it sees that the count is > 0, so it doesn't
> take the fast path. Everything follows according to your
> original code.
>
> The underlying problem with Oleg's code was that the reader could end up
> incrementing the wrong counter. That can't happen with the fast path,
> because sp->completed doesn't change.

OK, then the remaining area of my paranoia centers around unfortunate
sequences of preemptions, increments, and decrements of the counters
while synchronize_srcu() is reading them out (and also being preempted).
Intuitively, it seems like this should always be safe, but I am worried
about it nonetheless.

> > For the moment, this optimization should be done by the caller, and
> > should be prominently commented.
>
> Prominent commenting is certainly a good idea. However I don't see how
> the caller can make this optimization without getting into the guts of the
> SRCU implementation. After all, a major part of the optimization is
> to avoid calling synchronize_sched().

In general, I agree. There may be special cases where the caller knows
it is safe due to things unknown to synchronize_srcu() -- Jens's example
might be a case in point.

Thanx, Paul

2006-11-21 17:56:24

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

Here's another potential problem with the fast path approach. It's not
very serious, but you might want to keep it in mind.

The idea is that a reader can start up on one CPU and finish on another,
and a writer might see the finish event but not the start event. For
example:

Reader A enters the critical section on CPU 0 and starts
accessing the old data area.

Writer B updates the data pointer and starts executing
srcu_readers_active_idx() to check if the fast path can be
used. It sees per_cpu_ptr(0)->c[idx] == 1 because of
Reader A.

Reader C runs srcu_read_lock() on CPU 0, setting
per_cpu_ptr(0)->c[idx] to 2.

Reader C migrates to CPU 1 and leaves the critical section;
srcu_read_unlock() sets per_cpu_ptr(1)->c[idx] to -1.

Writer B finishes the cpu loop in srcu_readers_active_idx(),
seeing per_cpu_ptr(1)->c[idx] == -1. It computes sum =
1 + -1 == 0, takes the fast path, and exits immediately
from synchronize_srcu().

Writer B deallocates the old data area while Reader A is still
using it.

This requires two context switches to take place while the cpu loop in
srcu_readers_active_idx() runs, so perhaps it isn't realistic. Is it
worth worrying about?

Alan

2006-11-21 19:12:23

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Tue, Nov 21, 2006 at 12:56:21PM -0500, Alan Stern wrote:
> Here's another potential problem with the fast path approach. It's not
> very serious, but you might want to keep it in mind.
>
> The idea is that a reader can start up on one CPU and finish on another,
> and a writer might see the finish event but not the start event. For
> example:
>
> Reader A enters the critical section on CPU 0 and starts
> accessing the old data area.
>
> Writer B updates the data pointer and starts executing
> srcu_readers_active_idx() to check if the fast path can be
> used. It sees per_cpu_ptr(0)->c[idx] == 1 because of
> Reader A.
>
> Reader C runs srcu_read_lock() on CPU 0, setting
> per_cpu_ptr[0]->c[idx] to 2.
>
> Reader C migrates to CPU 1 and leaves the critical section;
> srcu_read_unlock() sets per_cpu_ptr(1)->c[idx] to -1.
>
> Writer B finishes the cpu loop in srcu_readers_active_idx(),
> seeing per_cpu_ptr(1)->c[idx] == -1. It computes sum =
> 1 + -1 == 0, takes the fast path, and exits immediately
> from synchronize_srcu().
>
> Writer B deallocates the old data area while Reader A is still
> using it.
>
> This requires two context switches to take place while the cpu loop in
> srcu_readers_active_idx() runs, so perhaps it isn't realistic. Is it
> worth worrying about?

Thank you -very- -much- for finding the basis behind my paranoia!
I guess my intuition is still in good working order. ;-)

It might be unlikely, but that makes it even worse -- a strange memory
corruption problem that happens only under heavy load, and even then only
sometimes. No thank you!!!

I suspect that this affects Jens as well, though I don't claim to
completely understand his usage.

One approach to get around this would be for the "idx" returned from
srcu_read_lock() to keep track of the CPU as well as the index within
the CPU. This would require atomic_inc()/atomic_dec() on the fast path,
but would not add much to the overhead on x86 because the smp_mb() imposes
an atomic operation anyway. There would be little cache thrashing in the
case where there is no preemption -- but if the readers almost always sleep,
and where it is common for the srcu_read_unlock() to run on a different CPU
than the srcu_read_lock(), then the additional cache thrashing could add
significant overhead.
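
Very roughly, something like this -- a sketch only, ignoring the hardluckref
fallback and assuming the per-CPU counters become atomic_t:

	int srcu_read_lock(struct srcu_struct *sp)
	{
		int cpu, idx;

		preempt_disable();
		cpu = smp_processor_id();
		idx = sp->completed & 0x1;
		atomic_inc(&per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx]);
		smp_mb__after_atomic_inc();
		preempt_enable();
		return (cpu << 1) | idx;	/* decoded by srcu_read_unlock() */
	}

	void srcu_read_unlock(struct srcu_struct *sp, int token)
	{
		smp_mb__before_atomic_dec();
		atomic_dec(&per_cpu_ptr(sp->per_cpu_ref,
					token >> 1)->c[token & 0x1]);
	}

That way srcu_read_unlock() always decrements the counter that the matching
srcu_read_lock() incremented, even if the task migrated in between -- at the
price of a cross-CPU atomic operation when it did.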

Thoughts?

Thanx, Paul

2006-11-21 19:58:08

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Tue, Nov 21, 2006 at 07:44:20PM +0300, Oleg Nesterov wrote:
> On 11/20, Paul E. McKenney wrote:
> >
> > On Mon, Nov 20, 2006 at 09:57:12PM +0300, Oleg Nesterov wrote:
> > > >
> > > So, if we have global A == B == 0,
> > >
> > > CPU_0 CPU_1
> > >
> > > A = 1; B = 2;
> > > mb(); mb();
> > > b = B; a = A;
> > >
> > > It could happen that a == b == 0, yes? Isn't this contradicts with definition
> > > of mb?
> >
> > It can and does happen. -Which- definition of mb()? ;-)
>
> I had a somewhat similar understanding before this discussion
>
> [PATCH] Fix RCU race in access of nohz_cpu_mask
> http://marc.theaimsgroup.com/?t=113378060600003
>
> Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]
> http://marc.theaimsgroup.com/?t=113432312600001
>
> Could you please explain me again why that fix was correct? What we have now is:
>
> CPU_0 CPU_1
> rcu_start_batch: stop_hz_timer:
>
> rcp->cur++; STORE nohz_cpu_mask |= cpu
>
> smp_mb(); mb(); // missed actually
>
> ->cpumask = ~nohz_cpu_mask; LOAD if (rcu_pending()) // reads rcp->cur
> nohz_cpu_mask &= ~cpu
>
> So, it is possible that CPU_0 reads an empty nohz_cpu_mask and starts a grace
> period with CPU_1 included in rcp->cpumask. CPU_1 in turn reads an old value
> of rcp->cur (so rcu_pending() returns 0) and becomes CPU_IDLE.

At this point, I am not certain that it is in fact correct. :-/

> Take another patch,
>
> Re: Oops on 2.6.18
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116266392016286
>
> switch_uid: __sigqueue_alloc:
>
> STORE 'new_user' to ->user STORE "locked" to ->siglock
>
> mb(); "mb()"; // sort of, wrt loads/stores above
>
> LOAD ->siglock LOAD ->siglock
>
> Again, it is possible that switch_uid() doesn't notice that ->siglock is locked
> and frees ->user. __sigqueue_alloc() in turn reads an old (freed) value of ->user
> and does get_uid() on it.

Ditto.

> > To see how this can happen, think of the SMP system as a message-passing
> > system, and consider the following sequence of events:
> >
> > o The cache line for A is initially in CPU 1's cache, and the
> > cache line for B is initially in CPU 0's cache (backwards of
> > what you would want knowing about the upcoming writes).
> >
> > o CPU 0 stores to A, but because A is not in cache, places it in
> > CPU 0's store queue. It also puts out a request for ownership
> > of the cache line containing A.
> >
> > o CPU 1 stores to B, with the same situation as for CPU 0's store
> > to A.
> >
> > o Both CPUs execute an mb(), which ensures that any subsequent writes
> > follow the writes to A and B, respectively. Since neither CPU
> > has yet received the other CPU's request for ownership, there are
> > no ordering effects on subsequent reads.
> >
> > o CPU 0 executes "b = B", and since B is in CPU 0's cache, it loads
> > the current value, which is zero.
> >
> > o Ditto for CPU 1 and A.
> >
> > o CPUs 0 and 1 now receive each other's requests for ownership, so
> > exchange the cache lines containing A and B.
> >
> > o Once CPUs 0 and 1 receive ownership of the respective cache lines,
> > they complete their writes to A and B (moving the values from the
> > store buffers to the cache lines).
>
> Paul, Alan, in case it was not clear: I am not arguing, just trying to
> understand, and I appreciate very much your time and your explanations.

Either way, we clearly need better definitions of what the memory barriers
actually do! And I expect that we will need your help.

Thanx, Paul

2006-11-21 20:04:52

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/20, Alan Stern wrote:
>
> On Mon, 20 Nov 2006, Oleg Nesterov wrote:
>
> > So, if we have global A == B == 0,
> >
> > CPU_0 CPU_1
> >
> > A = 1; B = 2;
> > mb(); mb();
> > b = B; a = A;
> >
> > It could happen that a == b == 0, yes?
>
> Both CPUs execute their "mb" instructions. The mb forces each
> cache to wait until it receives an Acknowledgement for the
> Invalidate it sent.
>
> Both caches send an Acknowledgement message to the other. The
> mb instructions complete.
>
> "b = B" and "a = A" execute. The caches return A==0 and B==0
> because they haven't yet invalidated their cache lines.
>
> The reason the code failed is because the mb instructions didn't force
> the caches to wait until the Invalidate messages in their queues had been
> fully carried out (i.e., the lines had actually been invalidated).

However, from
http://marc.theaimsgroup.com/?l=linux-kernel&m=113435711112941

Paul E. McKenney wrote:
>
> 2. rmb() guarantees that any changes seen by the interconnect
> preceding the rmb() will be seen by any reads following the
> rmb().
>
> 3. mb() combines the guarantees made by rmb() and wmb().

Confused :(

Oleg.

2006-11-21 20:26:47

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Tue, 21 Nov 2006, Paul E. McKenney wrote:

> On Tue, Nov 21, 2006 at 07:44:20PM +0300, Oleg Nesterov wrote:
> > On 11/20, Paul E. McKenney wrote:
> > >
> > > On Mon, Nov 20, 2006 at 09:57:12PM +0300, Oleg Nesterov wrote:
> > > > >
> > > > So, if we have global A == B == 0,
> > > >
> > > > CPU_0 CPU_1
> > > >
> > > > A = 1; B = 2;
> > > > mb(); mb();
> > > > b = B; a = A;
> > > >
> > > > It could happen that a == b == 0, yes? Isn't this contradicts with definition
> > > > of mb?
> > >
> > > It can and does happen. -Which- definition of mb()? ;-)
> >
> > I had a somewhat similar understanding before this discussion
> >
> > [PATCH] Fix RCU race in access of nohz_cpu_mask
> > http://marc.theaimsgroup.com/?t=113378060600003
> >
> > Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]
> > http://marc.theaimsgroup.com/?t=113432312600001
> >
> > Could you please explain me again why that fix was correct? What we have now is:
> >
> > CPU_0 CPU_1
> > rcu_start_batch: stop_hz_timer:
> >
> > rcp->cur++; STORE nohz_cpu_mask |= cpu
> >
> > smp_mb(); mb(); // missed actually
> >
> > ->cpumask = ~nohz_cpu_mask; LOAD if (rcu_pending()) // reads rcp->cur
> > nohz_cpu_mask &= ~cpu
> >
> > So, it is possible that CPU_0 reads an empty nohz_cpu_mask and starts a grace
> > period with CPU_1 included in rcp->cpumask. CPU_1 in turn reads an old value
> > of rcp->cur (so rcu_pending() returns 0) and becomes CPU_IDLE.
>
> At this point, I am not certain that it is in fact correct. :-/
>
> > Take another patch,
> >
> > Re: Oops on 2.6.18
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=116266392016286
> >
> > switch_uid: __sigqueue_alloc:
> >
> > STORE 'new_user' to ->user STORE "locked" to ->siglock
> >
> > mb(); "mb()"; // sort of, wrt loads/stores above
> >
> > LOAD ->siglock LOAD ->siglock
> >
> > Again, it is possible that switch_uid() doesn't notice that ->siglock is locked
> > and frees ->user. __sigqueue_alloc() in turn reads an old (freed) value of ->user
> > and does get_uid() on it.
>
> Ditto.

> > Paul, Alan, in case it was not clear: I am not arguing, just trying to
> > understand, and I appreciate very much your time and your explanations.
>
> Either way, we clearly need better definitions of what the memory barriers
> actually do! And I expect that we will need your help.

Things may not be quite as bad as they appear. On many architectures the
store-mb-load pattern will work as expected. (In fact, I don't know which
architectures it might fail on.)

Furthermore this is a very difficult race to trigger. You couldn't force
it to happen, for example, by adding a delay somewhere.

Alan

2006-11-21 20:40:52

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Tue, 21 Nov 2006, Paul E. McKenney wrote:

> On Tue, Nov 21, 2006 at 12:56:21PM -0500, Alan Stern wrote:
> > Here's another potential problem with the fast path approach. It's not
> > very serious, but you might want to keep it in mind.
> >
> > The idea is that a reader can start up on one CPU and finish on another,
> > and a writer might see the finish event but not the start event. For
> > example:
...
> > This requires two context switches to take place while the cpu loop in
> > srcu_readers_active_idx() runs, so perhaps it isn't realistic. Is it
> > worth worrying about?
>
> Thank you -very- -much- for finding the basis behind my paranoia!
> I guess my intuition is still in good working order. ;-)

Are you sure _this_ was the basis behind your paranoia? Maybe it had
something else in mind... :-)

> It might be unlikely, but that makes it even worse -- a strange memory
> corruption problem that happens only under heavy load, and even then only
> sometimes. No thank you!!!
>
> I suspect that this affects Jens as well, though I don't claim to
> completely understand his usage.
>
> One approach to get around this would be for the "idx" returned from
> srcu_read_lock() to keep track of the CPU as well as the index within
> the CPU. This would require atomic_inc()/atomic_dec() on the fast path,
> but would not add much to the overhead on x86 because the smp_mb() imposes
> an atomic operation anyway. There would be little cache thrashing in the
> case where there is no preemption -- but if the readers almost always sleep,
> and where it is common for the srcu_read_unlock() to run on a different CPU
> than the srcu_read_lock(), then the additional cache thrashing could add
> significant overhead.
>
> Thoughts?

I don't like the thought of extra overhead from cache thrashing. Also it
seems silly to allocate per-cpu data and then write to another CPU's
element.

How about making srcu_readers_active_idx() so fast that there isn't time
for 2 context switches? Disabling interrupts ought to be good enough
(except in virtualized environments perhaps).
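
A sketch of that idea (illustration only; disabling interrupts keeps the
local scan short, it does not by itself prevent activity on other CPUs):

static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
{
	unsigned long flags;
	int cpu, sum = 0;

	/*
	 * Keep the scan as short as possible, so that two context
	 * switches are very unlikely to fit inside it.
	 */
	local_irq_save(flags);
	for_each_possible_cpu(cpu)
		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
	local_irq_restore(flags);
	return sum;
}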

Alan

2006-11-21 20:54:45

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Tue, 21 Nov 2006, Oleg Nesterov wrote:

> On 11/20, Alan Stern wrote:
> >
> > Both CPUs execute their "mb" instructions. The mb forces each
> > cache to wait until it receives an Acknowledgement for the
> > Invalidate it sent.
> >
> > Both caches send an Acknowledgement message to the other. The
> > mb instructions complete.
> >
> > "b = B" and "a = A" execute. The caches return A==0 and B==0
> > because they haven't yet invalidated their cache lines.
> >
> > The reason the code failed is because the mb instructions didn't force
> > the caches to wait until the Invalidate messages in their queues had been
> > fully carried out (i.e., the lines had actually been invalidated).
>
> However, from
> http://marc.theaimsgroup.com/?l=linux-kernel&m=113435711112941
>
> Paul E. McKenney wrote:
> >
> > 2. rmb() guarantees that any changes seen by the interconnect
> > preceding the rmb() will be seen by any reads following the
> > rmb().
> >
> > 3. mb() combines the guarantees made by rmb() and wmb().
>
> Confused :(

I'm not certain the odd behavior can occur on systems that use an
interconnect like Paul described. In the context I was describing, rmb()
guarantees only that any changes seen _and acknowledged_ by the cache
preceding the rmb() will be seen by any reads following the rmb(). It's a
weaker guarantee, but it still suffices to show that

CPU_0			CPU_1

A = 1			b = B
wmb			rmb
B = 2			a = A

will work as expected (that is, if CPU_1's read of B returns 2, then its
subsequent read of A must return 1).

Alan

2006-11-21 21:01:04

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/21, Paul E. McKenney wrote:
>
> On Tue, Nov 21, 2006 at 12:56:21PM -0500, Alan Stern wrote:
> > Here's another potential problem with the fast path approach. It's not
> > very serious, but you might want to keep it in mind.
> >
> > The idea is that a reader can start up on one CPU and finish on another,
> > and a writer might see the finish event but not the start event. For
> > example:
>
> One approach to get around this would be for the "idx" returned from
> srcu_read_lock() to keep track of the CPU as well as the index within
> the CPU. This would require atomic_inc()/atomic_dec() on the fast path,
> but would not add much to the overhead on x86 because the smp_mb() imposes
> an atomic operation anyway. There would be little cache thrashing in the
> case where there is no preemption -- but if the readers almost always sleep,
> and where it is common for the srcu_read_unlock() to run on a different CPU
> than the srcu_read_lock(), then the additional cache thrashing could add
> significant overhead.

If you are going to do this, it seems better to just forget about ->per_cpu_ref,
and use only ->hardluckref[]. This also makes it possible to avoid the polling in
synchronize_srcu().

Oleg.

2006-11-21 22:06:18

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Tue, Nov 21, 2006 at 11:04:41PM +0300, Oleg Nesterov wrote:
> On 11/20, Alan Stern wrote:
> >
> > On Mon, 20 Nov 2006, Oleg Nesterov wrote:
> >
> > > So, if we have global A == B == 0,
> > >
> > > CPU_0 CPU_1
> > >
> > > A = 1; B = 2;
> > > mb(); mb();
> > > b = B; a = A;
> > >
> > > It could happen that a == b == 0, yes?
> >
> > Both CPUs execute their "mb" instructions. The mb forces each
> > cache to wait until it receives an Acknowledgement for the
> > Invalidate it sent.
> >
> > Both caches send an Acknowledgement message to the other. The
> > mb instructions complete.
> >
> > "b = B" and "a = A" execute. The caches return A==0 and B==0
> > because they haven't yet invalidated their cache lines.
> >
> > The reason the code failed is because the mb instructions didn't force
> > the caches to wait until the Invalidate messages in their queues had been
> > fully carried out (i.e., the lines had actually been invalidated).
>
> However, from
> http://marc.theaimsgroup.com/?l=linux-kernel&m=113435711112941
>
> Paul E. McKenney wrote:
> >
> > 2. rmb() guarantees that any changes seen by the interconnect
> > preceding the rmb() will be seen by any reads following the
> > rmb().
> >
> > 3. mb() combines the guarantees made by rmb() and wmb().
>
> Confused :(

There are the weasel words "seen by the interconnect". Alan is
pointing out that the stores to A and B might not have been "seen by the
interconnect" at the time that the pair of mb() instructions execute,
since the other function of the mb() instructions is to ensure that
any stores prior to each mb() are "seen by the interconnect" before any
subsequent stores are "seen by the interconnect".

Why wouldn't the store to A be seen by the interconnect at the time of
CPU 1's mb()? Because the cacheline containing A is still residing at
CPU 1. CPU 0's store to A cannot possibly be seen by the interconnect
until after CPU 0 receives the corresponding cacheline.

Yes, it is confusing. Memory barriers work a bit more straightforwardly
on MMIO accesses, thankfully. But it would probably be good to strive
for minimal numbers of memory barriers, especially in common code. :-/

Thanx, Paul

2006-11-21 23:03:20

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Tue, Nov 21, 2006 at 03:26:44PM -0500, Alan Stern wrote:
> On Tue, 21 Nov 2006, Paul E. McKenney wrote:
>
> > On Tue, Nov 21, 2006 at 07:44:20PM +0300, Oleg Nesterov wrote:
> > > On 11/20, Paul E. McKenney wrote:
> > > >
> > > > On Mon, Nov 20, 2006 at 09:57:12PM +0300, Oleg Nesterov wrote:
> > > > > >
> > > > > So, if we have global A == B == 0,
> > > > >
> > > > > CPU_0 CPU_1
> > > > >
> > > > > A = 1; B = 2;
> > > > > mb(); mb();
> > > > > b = B; a = A;
> > > > >
> > > > > It could happen that a == b == 0, yes? Isn't this contradicts with definition
> > > > > of mb?
> > > >
> > > > It can and does happen. -Which- definition of mb()? ;-)
> > >
> > > I had a somewhat similar understanding before this discussion
> > >
> > > [PATCH] Fix RCU race in access of nohz_cpu_mask
> > > http://marc.theaimsgroup.com/?t=113378060600003
> > >
> > > Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ]
> > > http://marc.theaimsgroup.com/?t=113432312600001
> > >
> > > Could you please explain me again why that fix was correct? What we have now is:
> > >
> > > CPU_0 CPU_1
> > > rcu_start_batch: stop_hz_timer:
> > >
> > > rcp->cur++; STORE nohz_cpu_mask |= cpu
> > >
> > > smp_mb(); mb(); // missed actually
> > >
> > > ->cpumask = ~nohz_cpu_mask; LOAD if (rcu_pending()) // reads rcp->cur
> > > nohz_cpu_mask &= ~cpu
> > >
> > > So, it is possible that CPU_0 reads an empty nohz_cpu_mask and starts a grace
> > > period with CPU_1 included in rcp->cpumask. CPU_1 in turn reads an old value
> > > of rcp->cur (so rcu_pending() returns 0) and becomes CPU_IDLE.
> >
> > At this point, I am not certain that it is in fact correct. :-/
> >
> > > Take another patch,
> > >
> > > Re: Oops on 2.6.18
> > > http://marc.theaimsgroup.com/?l=linux-kernel&m=116266392016286
> > >
> > > switch_uid: __sigqueue_alloc:
> > >
> > > STORE 'new_user' to ->user STORE "locked" to ->siglock
> > >
> > > mb(); "mb()"; // sort of, wrt loads/stores above
> > >
> > > LOAD ->siglock LOAD ->siglock
> > >
> > > Again, it is possible that switch_uid() doesn't notice that ->siglock is locked
> > > and frees ->user. __sigqueue_alloc() in turn reads an old (freed) value of ->user
> > > and does get_uid() on it.
> >
> > Ditto.
>
> > > Paul, Alan, in case it was not clear: I am not arguing, just trying to
> > > understand, and I appreciate very much your time and your explanations.
> >
> > Either way, we clearly need better definitions of what the memory barriers
> > actually do! And I expect that we will need your help.
>
> Things may not be quite as bad as they appear. On many architectures the
> store-mb-load pattern will work as expected. (In fact, I don't know which
> architectures it might fail on.)

Several weak-memory-ordering CPUs. :-/

> Furthermore this is a very difficult race to trigger. You couldn't force
> it to happen, for example, by adding a delay somewhere.

I have only seen it when explicitly forcing it, and even then it is not
easy to make happen. But how would you know whether or not it happened
in a kernel or large multithreaded application?
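
For what it's worth, a minimal user-space harness for the store-mb-load
pattern being discussed might look like the sketch below (an illustration
only, not the torture test mentioned above; __sync_synchronize() stands in
for mb(), and whether the 0/0 outcome can ever be observed on a given
machine is exactly the open question):

#include <pthread.h>
#include <stdio.h>

static volatile int A, B;
static int a, b;

#define mb()	__sync_synchronize()

static void *cpu0(void *arg) { A = 1; mb(); b = B; return NULL; }
static void *cpu1(void *arg) { B = 2; mb(); a = A; return NULL; }

int main(void)
{
	int i;

	for (i = 0; i < 1000000; i++) {
		pthread_t t0, t1;

		A = B = a = b = 0;
		pthread_create(&t0, NULL, cpu0, NULL);
		pthread_create(&t1, NULL, cpu1, NULL);
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (a == 0 && b == 0)
			printf("0/0 outcome at iteration %d\n", i);
	}
	return 0;
}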

I am gaining increasing sympathy with anyone who might wish to reduce
the number of non-MMIO-related memory barriers in the Linux kernel!

Thanx, Paul

2006-11-22 00:50:07

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Wed, Nov 22, 2006 at 12:01:05AM +0300, Oleg Nesterov wrote:
> On 11/21, Paul E. McKenney wrote:
> >
> > On Tue, Nov 21, 2006 at 12:56:21PM -0500, Alan Stern wrote:
> > > Here's another potential problem with the fast path approach. It's not
> > > very serious, but you might want to keep it in mind.
> > >
> > > The idea is that a reader can start up on one CPU and finish on another,
> > > and a writer might see the finish event but not the start event. For
> > > example:
> >
> > One approach to get around this would be for the "idx" returned from
> > srcu_read_lock() to keep track of the CPU as well as the index within
> > the CPU. This would require atomic_inc()/atomic_dec() on the fast path,
> > but would not add much to the overhead on x86 because the smp_mb() imposes
> > an atomic operation anyway. There would be little cache thrashing in the
> > case where there is no preemption -- but if the readers almost always sleep,
> > and where it is common for the srcu_read_unlock() to run on a different CPU
> > than the srcu_read_lock(), then the additional cache thrashing could add
> > significant overhead.
>
> If you are going to do this, it seems better to just forget about ->per_cpu_ref,
> and use only ->hardluckref[]. This also allows to avoid the polling in
> synchronize_srcu().

If the readers are reasonably rare, that could work. If readers are
common, you get memory contention (as well as cache thrashing) on the
->hardluckref[] elements. But putting this degree of cache thrashing
into SRCU certainly does not feel right.

Thanx, Paul

2006-11-22 02:18:03

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Tue, 21 Nov 2006, Paul E. McKenney wrote:

> > Things may not be quite as bad as they appear. On many architectures the
> > store-mb-load pattern will work as expected. (In fact, I don't know which
> > architectures it might fail on.)
>
> Several weak-memory-ordering CPUs. :-/

Of the CPUs supported by Linux, do you know which ones will work with
store-mb-load and which ones won't?

Alan

2006-11-22 17:00:20

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Tue, Nov 21, 2006 at 09:17:59PM -0500, Alan Stern wrote:
> On Tue, 21 Nov 2006, Paul E. McKenney wrote:
>
> > > Things may not be quite as bad as they appear. On many architectures the
> > > store-mb-load pattern will work as expected. (In fact, I don't know which
> > > architectures it might fail on.)
> >
> > Several weak-memory-ordering CPUs. :-/
>
> Of the CPUs supported by Linux, do you know which ones will work with
> store-mb-load and which ones won't?

I have partial lists at this point. I confess to not having made
much progress porting my memory-barrier torture tests to the relevant
architectures over the past few weeks (handling the lack of synchronized
lightweight fine-grained timers being the current obstacle), but will
let people know once I have gotten the tests working on the machines
that I have access to.

I don't have access to SMP Alpha or ARM machines (or UP either, for that
matter), so won't be able to test those.

Thanx, Paul

2006-11-22 18:07:30

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Tue, Nov 21, 2006 at 03:40:50PM -0500, Alan Stern wrote:
> On Tue, 21 Nov 2006, Paul E. McKenney wrote:
>
> > On Tue, Nov 21, 2006 at 12:56:21PM -0500, Alan Stern wrote:
> > > Here's another potential problem with the fast path approach. It's not
> > > very serious, but you might want to keep it in mind.
> > >
> > > The idea is that a reader can start up on one CPU and finish on another,
> > > and a writer might see the finish event but not the start event. For
> > > example:
> ...
> > > This requires two context switches to take place while the cpu loop in
> > > srcu_readers_active_idx() runs, so perhaps it isn't realistic. Is it
> > > worth worrying about?
> >
> > Thank you -very- -much- for finding the basis behind my paranoia!
> > I guess my intuition is still in good working order. ;-)
>
> Are you sure _this_ was the basis behind your paranoia? Maybe it had
> something else in mind... :-)

OK, I stand corrected, you found -one- basis for my paranoia. There might
indeed be others. However, only -one- counter-example is required to
invalidate a proposed algorithm. ;-)

> > It might be unlikely, but that makes it even worse -- a strange memory
> > corruption problem that happens only under heavy load, and even then only
> > sometimes. No thank you!!!
> >
> > I suspect that this affects Jens as well, though I don't claim to
> > completely understand his usage.
> >
> > One approach to get around this would be for the "idx" returned from
> > srcu_read_lock() to keep track of the CPU as well as the index within
> > the CPU. This would require atomic_inc()/atomic_dec() on the fast path,
> > but would not add much to the overhead on x86 because the smp_mb() imposes
> > an atomic operation anyway. There would be little cache thrashing in the
> > case where there is no preemption -- but if the readers almost always sleep,
> > and where it is common for the srcu_read_unlock() to run on a different CPU
> > than the srcu_read_lock(), then the additional cache thrashing could add
> > significant overhead.
> >
> > Thoughts?
>
> I don't like the thought of extra overhead from cache thrashing. Also it
> seems silly to allocate per-cpu data and then write to another CPU's
> element.

I am concerned about this as well, and am beginning to suspect that I
need to make a special-purpose primitive specifically for Jens that he
can include with his code.

That said, some potential advantages of per-CPU elements that might see
cache thrashing are:

1. the cross-CPU references might be rare.

2. memory contention is reduced compared to a single variable that
all CPUs are modifying.

Unfortunately, #1 seems unlikely in Jens's case -- why would the completion
be so lucky as to show up on the same CPU as did the request most of the
time? #2 could be important in I/O heavy workloads with fast devices.

> How about making srcu_readers_active_idx() so fast that there isn't time
> for 2 context switches? Disabling interrupts ought to be good enough
> (except in virtualized environments perhaps).

NMIs? ECC errors? Cache misses? And, as you say, virtualized
environments.

Thanx, Paul

2006-11-23 14:59:14

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

(Sorry, responding to the wrong message)

Paul E. McKenney wrote:
>
> I am concerned about this as well, and am beginning to suspect that I
> need to make a special-purpose primitive specifically for Jens that he
> can include with his code.

How about this?

struct xxx_struct {
	int completed;
	atomic_t ctr[2];
	struct mutex mutex;
	wait_queue_head_t wq;
};

void init_xxx_struct(struct xxx_struct *sp)
{
	sp->completed = 0;
	atomic_set(sp->ctr + 0, 1);	// active
	atomic_set(sp->ctr + 1, 0);	// inactive
	mutex_init(&sp->mutex);
	init_waitqueue_head(&sp->wq);
}

int xxx_read_lock(struct xxx_struct *sp)
{
	for (;;) {
		int idx = sp->completed & 0x1;
		if (likely(atomic_inc_not_zero(sp->ctr + idx)))
			return idx;
	}
}

void xxx_read_unlock(struct xxx_struct *sp, int idx)
{
	if (unlikely(atomic_dec_and_test(sp->ctr + idx)))
		wake_up(&sp->wq);
}

void synchronize_xxx(struct xxx_struct *sp)
{
	int idx;

	mutex_lock(&sp->mutex);

	idx = ++sp->completed & 0x1;
	smp_mb__before_atomic_inc();
	atomic_inc(sp->ctr + idx);

	idx = !idx;
	if (!atomic_dec_and_test(sp->ctr + idx))
		__wait_event(sp->wq, !atomic_read(sp->ctr + idx));

	mutex_unlock(&sp->mutex);
}

Yes, cache thrashing... But I think this is hard to avoid if we want writer
to be fast.

I do not claim this is the best solution, but for some reason I'd like to
suggest something that doesn't need synchronize_sched(). What do you think
about correctness at least?
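
To make the intended use concrete, a hypothetical caller (all names below
are invented for illustration) would look something like:

static struct xxx_struct foo_xxx;	/* protects switches of foo_ptr */
static struct foo *foo_ptr;

void foo_reader(void)
{
	int idx = xxx_read_lock(&foo_xxx);

	use_foo(foo_ptr);		/* may sleep; foo_ptr stays valid */
	xxx_read_unlock(&foo_xxx, idx);
}

void foo_update(struct foo *new)
{
	struct foo *old = foo_ptr;

	foo_ptr = new;			/* publish the new version */
	synchronize_xxx(&foo_xxx);	/* wait for readers of 'old' */
	kfree(old);
}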

Oleg.

2006-11-23 20:43:17

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Thu, Nov 23, 2006 at 05:59:10PM +0300, Oleg Nesterov wrote:
> (Sorry, responding to the wrong message)
>
> Paul E. McKenney wrote:
> >
> > I am concerned about this as well, and am beginning to suspect that I
> > need to make a special-purpose primitive specifically for Jens that he
> > can include with his code.
>
> How about this?

For Jens, it might be OK. For general use, I believe that this has
difficulties with the sequence of events I sent out on November 20th, see:

http://marc.theaimsgroup.com/?l=linux-kernel&m=116397154808901&w=2

Might also be missing a few memory barriers, see below.

> struct xxx_struct {
> int completed;
> atomic_t ctr[2];
> struct mutex mutex;
> wait_queue_head_t wq;
> };
>
> void init_xxx_struct(struct xxx_struct *sp)
> {
> sp->completed = 0;
> atomic_set(sp->ctr + 0, 1); // active
> atomic_set(sp->ctr + 1, 0); // inactive
> mutex_init(&sp->mutex);
> init_waitqueue_head(&sp->wq);
> }
>
> int xxx_read_lock(struct xxx_struct *sp)
> {
> for (;;) {
> int idx = sp->completed & 0x1;
> if (likely(atomic_inc_not_zero(sp->ctr + idx)))

Need an after-atomic-inc memory barrier here?

> return idx;
> }
> }
>
> void xxx_read_unlock(struct xxx_struct *sp, int idx)
> {

Need a before-atomic-dec memory barrier here?

> if (unlikely(atomic_dec_and_test(sp->ctr + idx)))
> wake_up(&sp->wq);
> }
>
> void synchronize_xxx(struct xxx_struct *sp)
> {
> int idx;
>
> mutex_lock(&sp->mutex);
>
> idx = ++sp->completed & 0x1;
> smp_mb__before_atomic_inc();
> atomic_inc(&sp->ctr + idx);
>
> idx = !idx;
> if (!atomic_dec_and_test(&sp->ctr + idx))
> __wait_event(&sp->wq, !atomic_read(&sp->ctr + idx));

I don't understand why an unlucky sequence of events mightn't be able
to hang this __wait_event(). Suppose we did the atomic_dec_and_test(),
then some other CPU executed xxx_read_unlock(), finding no one to awaken,
then we execute the __wait_event()? What am I missing here?

>
> mutex_unlock(&sp->mutex);
> }
>
> Yes, cache thrashing... But I think this is hard to avoid if we want writer
> to be fast.
>
> I do not claim this is the best solution, but for some reason I'd like to
> suggest something that doesn't need synchronize_sched(). What do you think
> about correctness at least?

The general approach seems reasonable, but I do have the concerns above.

Thanx, Paul

2006-11-23 21:34:33

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/23, Paul E. McKenney wrote:
>
> On Thu, Nov 23, 2006 at 05:59:10PM +0300, Oleg Nesterov wrote:
> > (Sorry, responding to the wrong message)
> >
> > Paul E. McKenney wrote:
> > >
> > > I am concerned about this as well, and am beginning to suspect that I
> > > need to make a special-purpose primitive specifically for Jens that he
> > > can include with his code.
> >
> > How about this?
>
> For Jens, it might be OK. For general use, I believe that this has
> difficulties with the sequence of events I sent out on November 20th, see:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116397154808901&w=2

Oh. I guess I'd better sleep before answering, but I hope that this version
doesn't have those problems. Note the 'atomic_inc_not_zero()' in read_lock():
it does not seem possible for synchronize_xxx() to return while xxx_read_lock()
increments a "wrong" element.

Just in case: this interface is in no way meant to replace the current srcu code;
it is another variant, optimized for writers, in the hope that it is ok for Jens.

> Might also be missing a few memory barriers, see below.
>
> > struct xxx_struct {
> > int completed;
> > atomic_t ctr[2];
> > struct mutex mutex;
> > wait_queue_head_t wq;
> > };
> >
> > void init_xxx_struct(struct xxx_struct *sp)
> > {
> > sp->completed = 0;
> > atomic_set(sp->ctr + 0, 1); // active
> > atomic_set(sp->ctr + 1, 0); // inactive
> > mutex_init(&sp->mutex);
> > init_waitqueue_head(&sp->wq);
> > }
> >
> > int xxx_read_lock(struct xxx_struct *sp)
> > {
> > for (;;) {
> > int idx = sp->completed & 0x1;
> > if (likely(atomic_inc_not_zero(sp->ctr + idx)))
>
> Need an after-atomic-inc memory barrier here?

From Documentation/atomic_ops.txt:

"atomic_add_unless requires explicit memory barriers around the operation."

>
> > return idx;
> > }
> > }
> >
> > void xxx_read_unlock(struct xxx_struct *sp, int idx)
> > {
>
> Need a before-atomic-dec memory barrier here?

The same, Documentation/atomic_ops.txt states

"It requires explicit memory barrier semantics"

> > if (unlikely(atomic_dec_and_test(sp->ctr + idx)))
> > wake_up(&sp->wq);
> > }
> >
> > void synchronize_xxx(struct xxx_struct *sp)
> > {
> > int idx;
> >
> > mutex_lock(&sp->mutex);
> >
> > idx = ++sp->completed & 0x1;
> > smp_mb__before_atomic_inc();
> > atomic_inc(&sp->ctr + idx);
> >
> > idx = !idx;
> > if (!atomic_dec_and_test(&sp->ctr + idx))
> > __wait_event(&sp->wq, !atomic_read(&sp->ctr + idx));
>
> I don't understand why an unlucky sequence of events mightn't be able
> to hang this __wait_event(). Suppose we did the atomic_dec_and_test(),

... so atomic_read() >= 0 ...

> then some other CPU executed xxx_read_unlock(), finding no one to awaken,

... it does atomic_dec(), but sp->wq is empty, yes?

> then we execute the __wait_event()?

__wait_event() will notice !atomic_read() and return.

Note that this is just an optimization. We can do

	atomic_dec(sp->ctr + idx);
	__wait_event(sp->wq, !atomic_read(sp->ctr + idx));

instead. Also, I think synchronize_xxx() could be optimized further.

> What am I missing here?

Probably it is me again who missed something... Please say no!

> >
> > mutex_unlock(&sp->mutex);
> > }
> >
> > Yes, cache thrashing... But I think this is hard to avoid if we want writer
> > to be fast.
> >
> > I do not claim this is the best solution, but for some reason I'd like to
> > suggest something that doesn't need synchronize_sched(). What do you think
> > about correctness at least?
>
> The general approach seems reasonable, but I do have the concerns above.

Thanks!

Oleg.

2006-11-23 21:49:10

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/23, Paul E. McKenney wrote:
>
> For general use, I believe that this has
> difficulties with the sequence of events I sent out on November 20th, see:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116397154808901&w=2
>
> ...
>
> I don't understand why an unlucky sequence of events mightn't be able
> to hang this __wait_event(). Suppose we did the atomic_dec_and_test(),
> then some other CPU executed xxx_read_unlock(), finding no one to awaken,
> then we execute the __wait_event()?

Please note how ->ctr[] is initialized,

atomic_set(sp->ctr + 0, 1); <---- 1, not 0
atomic_set(sp->ctr + 1, 0);

atomic_read(sp->ctr + idx) == 0 means that this counter is inactive,
nobody use it.

Oleg.

2006-11-24 18:22:16

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

Ok, synchronize_xxx() passed 1 hour rcutorture test on dual P-III.

It behaves the same as srcu but optimized for writers. The fast path
for synchronize_xxx() is mutex_lock() + atomic_read() + mutex_unlock().
The slow path is __wait_event(), no polling. However, the reader does
atomic inc/dec on lock/unlock, and the counters are not per-cpu.

Jens, is it ok for you? Alan, Paul, what is your opinion?

Oleg.

--- 19-rc6/kernel/__rcutorture.c 2006-11-17 19:42:31.000000000 +0300
+++ 19-rc6/kernel/rcutorture.c 2006-11-24 21:00:00.000000000 +0300
@@ -464,6 +464,120 @@ static struct rcu_torture_ops srcu_ops =
.name = "srcu"
};

+//-----------------------------------------------------------------------------
+struct xxx_struct {
+ int completed;
+ atomic_t ctr[2];
+ struct mutex mutex;
+ wait_queue_head_t wq;
+};
+
+void init_xxx_struct(struct xxx_struct *sp)
+{
+ sp->completed = 0;
+ atomic_set(sp->ctr + 0, 1);
+ atomic_set(sp->ctr + 1, 0);
+ mutex_init(&sp->mutex);
+ init_waitqueue_head(&sp->wq);
+}
+
+int xxx_read_lock(struct xxx_struct *sp)
+{
+ for (;;) {
+ int idx = sp->completed & 0x1;
+ if (likely(atomic_inc_not_zero(sp->ctr + idx)))
+ return idx;
+ }
+}
+
+void xxx_read_unlock(struct xxx_struct *sp, int idx)
+{
+ if (unlikely(atomic_dec_and_test(sp->ctr + idx)))
+ wake_up(&sp->wq);
+}
+
+void synchronize_xxx(struct xxx_struct *sp)
+{
+ int idx;
+
+ mutex_lock(&sp->mutex);
+
+ idx = sp->completed & 0x1;
+ if (!atomic_add_unless(sp->ctr + idx, -1, 1))
+ goto out;
+
+ atomic_inc(sp->ctr + (idx ^ 0x1));
+ sp->completed++;
+
+ __wait_event(sp->wq, !atomic_read(sp->ctr + idx));
+out:
+ mutex_unlock(&sp->mutex);
+}
+
+//-----------------------------------------------------------------------------
+static struct xxx_struct xxx_ctl;
+
+static void xxx_torture_init(void)
+{
+ init_xxx_struct(&xxx_ctl);
+ rcu_sync_torture_init();
+}
+
+static void xxx_torture_cleanup(void)
+{
+ synchronize_xxx(&xxx_ctl);
+}
+
+static int xxx_torture_read_lock(void)
+{
+ return xxx_read_lock(&xxx_ctl);
+}
+
+static void xxx_torture_read_unlock(int idx)
+{
+ xxx_read_unlock(&xxx_ctl, idx);
+}
+
+static int xxx_torture_completed(void)
+{
+ return xxx_ctl.completed;
+}
+
+static void xxx_torture_synchronize(void)
+{
+ synchronize_xxx(&xxx_ctl);
+}
+
+static int xxx_torture_stats(char *page)
+{
+ int cnt = 0;
+ int idx = xxx_ctl.completed & 0x1;
+
+ cnt += sprintf(&page[cnt], "%s%s per-CPU(idx=%d):",
+ torture_type, TORTURE_FLAG, idx);
+
+ cnt += sprintf(&page[cnt], " (%d,%d)",
+ atomic_read(xxx_ctl.ctr + 0),
+ atomic_read(xxx_ctl.ctr + 1));
+
+ cnt += sprintf(&page[cnt], "\n");
+ return cnt;
+}
+
+static struct rcu_torture_ops xxx_ops = {
+ .init = xxx_torture_init,
+ .cleanup = xxx_torture_cleanup,
+ .readlock = xxx_torture_read_lock,
+ .readdelay = srcu_read_delay,
+ .readunlock = xxx_torture_read_unlock,
+ .completed = xxx_torture_completed,
+ .deferredfree = rcu_sync_torture_deferred_free,
+ .sync = xxx_torture_synchronize,
+ .stats = xxx_torture_stats,
+ .name = "xxx"
+};
+//-----------------------------------------------------------------------------
+
/*
* Definitions for sched torture testing.
*/
@@ -503,8 +617,8 @@ static struct rcu_torture_ops sched_ops
};

static struct rcu_torture_ops *torture_ops[] =
- { &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops, &srcu_ops,
- &sched_ops, NULL };
+ { &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops,
+ &srcu_ops, &xxx_ops, &sched_ops, NULL };

/*
* RCU torture writer kthread. Repeatedly substitutes a new structure

2006-11-24 20:04:19

by Jens Axboe

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Fri, Nov 24 2006, Oleg Nesterov wrote:
> Ok, synchronize_xxx() passed 1 hour rcutorture test on dual P-III.
>
> It behaves the same as srcu but optimized for writers. The fast path
> for synchronize_xxx() is mutex_lock() + atomic_read() + mutex_unlock().
> The slow path is __wait_event(), no polling. However, the reader does
> atomic inc/dec on lock/unlock, and the counters are not per-cpu.
>
> Jens, is it ok for you? Alan, Paul, what is your opinion?

This looks good from my end, much more appropriate than the current SRCU
code. Even if I could avoid synchronize_srcu() for most cases, when I
did have to issue it, the 3x synchronize_sched() was a performance
killer.

Thanks Oleg! And Alan and Paul for your excellent ideas.

--
Jens Axboe

2006-11-24 20:47:58

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Fri, 24 Nov 2006, Oleg Nesterov wrote:

> Ok, synchronize_xxx() passed 1 hour rcutorture test on dual P-III.
>
> It behaves the same as srcu but optimized for writers. The fast path
> for synchronize_xxx() is mutex_lock() + atomic_read() + mutex_unlock().
> The slow path is __wait_event(), no polling. However, the reader does
> atomic inc/dec on lock/unlock, and the counters are not per-cpu.
>
> Jens, is it ok for you? Alan, Paul, what is your opinion?

Given that you aren't using per-cpu data, why not just rely on a spinlock?
Then everything will be simple and easy to verify, with no need to worry
about atomic instructions or memory barriers.

Alan

2006-11-24 21:13:13

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/24, Alan Stern wrote:
>
> On Fri, 24 Nov 2006, Oleg Nesterov wrote:
>
> > Ok, synchronize_xxx() passed 1 hour rcutorture test on dual P-III.
> >
> > It behaves the same as srcu but optimized for writers. The fast path
> > for synchronize_xxx() is mutex_lock() + atomic_read() + mutex_unlock().
> > The slow path is __wait_event(), no polling. However, the reader does
> > atomic inc/dec on lock/unlock, and the counters are not per-cpu.
> >
> > Jens, is it ok for you? Alan, Paul, what is your opinion?
>
> Given that you aren't using per-cpu data, why not just rely on a spinlock?

I thought about this too, and we can re-use sp->wq.lock,

> Then everything will be simple and easy to verify,

xxx_read_lock() will be simpler, but not by much. synchronize_xxx() becomes
somewhat more complicated.

> with no need to worry
> about atomic instructions or memory barriers.

spin_lock() + spin_unlock() doesn't imply mb(), it allows subsequent loads
to move into the critical region.
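
An illustration of that one-way-barrier point (X, Y and the lock are
invented for the example, this is not code from the thread):

static DEFINE_SPINLOCK(l);
static int X, Y;

static int lock_is_not_mb(void)
{
	int r;

	X = 1;
	spin_lock(&l);		/* acquire: 'r = Y' cannot move above this,
				 * but 'X = 1' may move below it */
	spin_unlock(&l);	/* release: 'X = 1' cannot move below this,
				 * but 'r = Y' may move above it */
	r = Y;

	/*
	 * So this can execute as if it were
	 *	spin_lock(&l); r = Y; X = 1; spin_unlock(&l);
	 * i.e. the store and the load may be reordered with respect to
	 * each other, which an mb() between them is meant to forbid.
	 * Neither access, however, can escape past the far side of the
	 * locked region.
	 */
	return r;
}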

I personally prefer this way, but maybe you are right.

Oleg.

2006-11-25 03:24:35

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sat, 25 Nov 2006, Oleg Nesterov wrote:

> > Given that you aren't using per-cpu data, why not just rely on a spinlock?
>
> I thought about this too, and we can re-use sp->wq.lock,

Yes, although it would be a layering violation.

> > Then everything will be simple and easy to verify,
>
> xxx_read_lock() will be simpler, but not too much. synchronize_xxx() needs
> some complication.

Look at the (untested) example below. The code may be a little bit
longer, but it's a lot easier to understand and verify.

> spin_lock() + spin_unlock() doesn't imply mb(), it allows subsequent loads
> to move into the the critical region.

No, that's wrong. Subsequent loads are allowed to move into the region
protected by the spinlock, but not past it (into the xxx critical
section).

> I personally prefer this way, but may be you are right.

See what you think...

Alan

//-----------------------------------------------------------------------------
struct xxx_struct {
	int completed;
	int ctr[2];
	struct mutex mutex;
	spinlock_t lock;
	wait_queue_head_t wq;
};

void init_xxx_struct(struct xxx_struct *sp)
{
	sp->completed = 0;
	sp->ctr[0] = 1;
	sp->ctr[1] = 0;
	spin_lock_init(&sp->lock);
	mutex_init(&sp->mutex);
	init_waitqueue_head(&sp->wq);
}

int xxx_read_lock(struct xxx_struct *sp)
{
	int idx;

	spin_lock(&sp->lock);
	idx = sp->completed & 0x1;
	++sp->ctr[idx];
	spin_unlock(&sp->lock);
	return idx;
}

void xxx_read_unlock(struct xxx_struct *sp, int idx)
{
	spin_lock(&sp->lock);
	if (--sp->ctr[idx] == 0)
		wake_up(&sp->wq);
	spin_unlock(&sp->lock);
}

void synchronize_xxx(struct xxx_struct *sp)
{
	int idx;

	mutex_lock(&sp->mutex);

	spin_lock(&sp->lock);
	idx = sp->completed & 0x1;
	++sp->completed;
	--sp->ctr[idx];
	sp->ctr[idx ^ 1] = 1;
	spin_unlock(&sp->lock);

	wait_event(sp->wq, sp->ctr[idx] == 0);
	mutex_unlock(&sp->mutex);
}

2006-11-25 17:14:44

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/24, Alan Stern wrote:
>
> On Sat, 25 Nov 2006, Oleg Nesterov wrote:
>
> > spin_lock() + spin_unlock() doesn't imply mb(), it allows subsequent loads
> > to move into the the critical region.
>
> No, that's wrong. Subsequent loads are allowed to move into the region
> protected by the spinlock, but not past it (into the xxx critical
> section).

Yes, you are right, but see below what I meant.

> > I personally prefer this way, but may be you are right.
>
> See what you think...
>
> Alan
>
> //-----------------------------------------------------------------------------
> struct xxx_struct {
> int completed;
> int ctr[2];
> struct mutex mutex;
> spinlock_t lock;
> wait_queue_head_t wq;
> };
>
> void init_xxx_struct(struct xxx_struct *sp)
> {
> sp->completed = 0;
> sp->ctr[0] = 1;
> sp->ctr[1] = 0;
> spin_lock_init(&sp->lock);
> mutex_init(&sp->mutex);
> init_waitqueue_head(&sp->wq);
> }
>
> int xxx_read_lock(struct xxx_struct *sp)
> {
> int idx;
>
> spin_lock(&sp->lock);
> idx = sp->completed & 0x1;
> ++sp->ctr[idx];
> spin_unlock(&sp->lock);
> return idx;
> }
>
> void xxx_read_unlock(struct xxx_struct *sp, int idx)
> {
> spin_lock(&sp->lock);

It is possible that the memory ops that occur before spin_lock() are not yet
completed,

> if (--sp->ctr[idx] == 0)

suppose that synchronize_xxx() just unlocked sp->lock. It sees sp->ctr[idx] == 0
and returns.

> wake_up(&sp->wq);
> spin_unlock(&sp->lock);

This is a one-way barrier, yes. But it is too late.

Actually, synchronize_xxx() may sleep on sp->wq and we still have a race.
synchronize_xxx() can return before ->wake_up() unlocks sp->wq.lock (finish_wait()
doesn't take sp->wq.lock due to autoremove_wake_function()).

> }
>
> void synchronize_xxx(struct xxx_struct *sp)
> {
> int idx;
>
> mutex_lock(&sp->mutex);
>
> spin_lock(&sp->lock);
> idx = sp->completed & 0x1;
> ++sp->completed;
> --sp->ctr[idx];
> sp->ctr[idx ^ 1] = 1;
> spin_unlock(&sp->lock);
>
> wait_event(sp->wq, sp->ctr[idx] == 0);
> mutex_unlock(&sp->mutex);
> }

This is more or less equivalent to

void synchronize_xxx(struct xxx_struct *sp)
{
	int idx;

	mutex_lock(&sp->mutex);

	idx = sp->completed & 0x1;
	atomic_dec(sp->ctr + idx);
	smp_mb__before_atomic_inc();
	atomic_inc(sp->ctr + (idx ^ 0x1));
	sp->completed++;

	wait_event(sp->wq, !atomic_read(sp->ctr + idx));
	mutex_unlock(&sp->mutex);
}

and lacks an optimization.

void synchronize_xxx(struct xxx_struct *sp)
{
	int idx;

	mutex_lock(&sp->mutex);

	spin_lock(&sp->lock);

	idx = sp->completed & 0x1;
	if (sp->ctr[idx] == 1) {
		spin_unlock(&sp->lock);
		goto out;
	}

	++sp->completed;
	--sp->ctr[idx];
	sp->ctr[idx ^ 1] = 1;
	spin_unlock(&sp->lock);

	wait_event(sp->wq, sp->ctr[idx] == 0);
out:
	mutex_unlock(&sp->mutex);
}

Honestly, I don't see why it is better, but maybe this is just me.
In any case, a spinlock-based implementation shouldn't be faster, yes?

Jens, Paul, what do you think?

Note also that 'atomic_add_unless' in synchronize_xxx() is not strictly
necessary, it is just for "symmetry", we can do

void synchronize_xxx(struct xxx_struct *sp)
{
	int idx;

	mutex_lock(&sp->mutex);

	idx = sp->completed & 0x1;
	if (atomic_read(sp->ctr + idx) == 1)
		goto out;

	atomic_dec(sp->ctr + idx);
	atomic_inc(sp->ctr + (idx ^ 0x1));
	sp->completed++;

	wait_event(sp->wq, !atomic_read(sp->ctr + idx));
out:
	mutex_unlock(&sp->mutex);
}

instead. So the only complication I can see is the 'for' loop in
xxx_read_lock(). Is it worth adding sp->lock?

Anyway, s/xxx/WHAT ???/ ?

Oleg.

2006-11-25 22:06:07

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Sat, 25 Nov 2006, Oleg Nesterov wrote:

> > void xxx_read_unlock(struct xxx_struct *sp, int idx)
> > {
> > spin_lock(&sp->lock);
>
> It is possible that the memory ops that occur before spin_lock() is not yet
> completed,
>
> > if (--sp->ctr[idx] == 0)
>
> suppose that synchronize_xxx() just unlocked sp->lock. It sees sp->ctr[idx] == 0
> and returns.
>
> > wake_up(&sp->wq);
> > spin_unlock(&sp->lock);
>
> This is a one-way barrier, yes. But it is too late.

Yes, you are right. The corrected routine (including your little
optimization) looks like this:

void synchronize_xxx(struct xxx_struct *sp)
{
	int idx;

	mutex_lock(&sp->mutex);
	spin_lock(&sp->lock);
	idx = sp->completed & 0x1;
	if (sp->ctr[idx] == 1)
		goto done;

	++sp->completed;
	--sp->ctr[idx];
	sp->ctr[idx ^ 1] = 1;

	spin_unlock(&sp->lock);
	__wait_event(sp->wq, sp->ctr[idx] == 0);
	spin_lock(&sp->lock);

done:
	spin_unlock(&sp->lock);
	mutex_unlock(&sp->mutex);
}

> Actually, synchronize_xxx() may sleep on sp->wq and we still have a race.
> synchronize_xxx() can return before ->wake_up() unlocks sp->wq.lock (finish_wait()
> doesn't take sp->wq.lock due to autoremove_wake_function()).

The version above doesn't suffer from that race.

> This is more or less equivalent to
>
> void synchronize_xxx(struct xxx_struct *sp)
> {
> int idx;
>
> mutex_lock(&sp->mutex);
>
> idx = sp->completed & 0x1;
> atomic_dec(sp->ctr + idx);
> smp_mb__before_atomic_inc();
> atomic_inc(sp->ctr + (idx ^ 0x1));
> sp->completed++;
>
> wait_event(sp->wq, !atomic_read(sp->ctr + idx));
> mutex_unlock(&sp->mutex);
> }

It may indeed be equivalent. But _proving_ it is equivalent is certainly
not easy. The advantage of spinlocks is that they remove the necessity
for outrageous mental contortions to verify that all possible execution
paths will work correctly.

> Honestly, I don't see why it is better, but may be this is just me.
> In any case, spinlock based implementation shouldn't be faster, yes?

It will generally be somewhat slower. In addition to cache-line
contention it suffers from lock contention. The difference shouldn't be
enough to matter unless a lot of threads are trying to acquire a read lock
simultaneously.

Alan Stern

2006-11-26 21:34:29

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/25, Alan Stern wrote:
>
> Yes, you are right. The corrected routine (including your little
> optimization) looks like this:
>
> void synchronize_xxx(struct xxx_struct *sp)
> {
> int idx;
>
> mutex_lock(&sp->mutex);
> spin_lock(&sp->lock);
> idx = sp->completed & 0x1;
> if (sp->ctr[idx] == 1)
> goto done;

Actually, this optimization doesn't make sense with spinlocks. If we use
atomic_t the fast path is just atomic_read(), no stores at all. But if we
are doing lock/unlock anyway, it is silly to optimize out '++sp->completed'.

> ++sp->completed;
> --sp->ctr[idx];
> sp->ctr[idx ^ 1] = 1;
>
> spin_unlock(&sp->lock);
> __wait_event(sp->wq, sp->ctr[idx] == 0);
> spin_lock(&sp->lock);
>
> done:
> spin_unlock(&sp->lock);

Oh, please no. The empty critical section is silly. Also, the spinlock-based
implementation doesn't need an additional "reference" in ->ctr[completed].

struct xxx_struct {
	int completed;
	int ctr[2];
	struct mutex mutex;
	spinlock_t lock;
	wait_queue_head_t wq;
};

void init_xxx_struct(struct xxx_struct *sp)
{
	sp->completed = 0;
	sp->ctr[0] = sp->ctr[1] = 0;
	spin_lock_init(&sp->lock);
	mutex_init(&sp->mutex);
	init_waitqueue_head(&sp->wq);
}

int xxx_read_lock(struct xxx_struct *sp)
{
	int idx;

	spin_lock(&sp->lock);
	idx = sp->completed & 0x1;
	sp->ctr[idx]++;
	spin_unlock(&sp->lock);

	return idx;
}

void xxx_read_unlock(struct xxx_struct *sp, int idx)
{
	spin_lock(&sp->lock);
	if (!--sp->ctr[idx])
		wake_up(&sp->wq);
	spin_unlock(&sp->lock);
}

void synchronize_xxx(struct xxx_struct *sp)
{
	int idx;

	mutex_lock(&sp->mutex);

	spin_lock(&sp->lock);
	idx = sp->completed++ & 0x1;
	spin_unlock(&sp->lock);

	wait_event(sp->wq, !sp->ctr[idx]);
	spin_unlock_wait(&sp->lock);

	mutex_unlock(&sp->mutex);
}

> It may indeed be equivalent. But _proving_ it is equivalent is certainly
> not easy. The advantage of spinlocks is that they remove the necessity
> for outrageous mental contortions to verify that all possible execution
> paths will work correctly.
> ...
> It will generally be somewhat slower.

I still like atomic_t more :) Let's wait for Paul's opinion.

What about naming? synchronize_qrcu?

Oleg

2006-11-26 22:25:49

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/20, Oleg Nesterov wrote:
>
> So, if we have global A == B == 0,
>
> CPU_0 CPU_1
>
> A = 1; B = 2;
> mb(); mb();
> b = B; a = A;
>
> It could happen that a == b == 0, yes? Isn't this contradicts with definition
> of mb?

I still can't relax; here is another attempt to "prove" that this should not be
possible on CPUs supported by Linux :)

Let's suppose it is possible; then it should also be possible if CPU_1
does spin_lock() instead of mb() (spin_lock can't be "stronger"), yes?

Now,

int COND;
wait_queue_head_t wq;

void my_wait(void)
{
	DECLARE_WAITQUEUE(wait, current);

	add_wait_queue(&wq, &wait);
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);

		if (COND)
			break;

		schedule();
	}
	__set_current_state(TASK_RUNNING);
	remove_wait_queue(&wq, &wait);
}

void my_wake(void)
{
	COND = 1;
	wake_up(&wq);
}

this should be correct, but it is not!

my_wait:

task->state = TASK_UNINTERRUPTIBLE; // STORE

mb();

if (COND) break; // LOAD


my_wake:

COND = 1; // STORE

spin_lock(WQ.lock);
spin_lock(runqueue.lock);

// try_to_wake_up()
if (!(task->state & TASK_UNINTERRUPTIBLE)) // LOAD
goto out;


So, my_wait() gets COND == 0, and goes to schedule in 'D' state.
try_to_wake_up() reads ->state == TASK_RUNNING, and does nothing.

Oleg.

2006-11-27 04:37:30

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Fri, Nov 24, 2006 at 12:49:08AM +0300, Oleg Nesterov wrote:
> On 11/23, Paul E. McKenney wrote:
> >
> > For general use, I believe that this has
> > difficulties with the sequence of events I sent out on November 20th, see:
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=116397154808901&w=2
> >
> > ...
> >
> > I don't understand why an unlucky sequence of events mightn't be able
> > to hang this __wait_event(). Suppose we did the atomic_dec_and_test(),
> > then some other CPU executed xxx_read_unlock(), finding no one to awaken,
> > then we execute the __wait_event()?
>
> Please note how ->ctr[] is initialized,
>
> atomic_set(sp->ctr + 0, 1); <---- 1, not 0
> atomic_set(sp->ctr + 1, 0);
>
> atomic_read(sp->ctr + idx) == 0 means that this counter is inactive,
> nobody use it.

I definitely should have slept before commenting on the earlier version,
it would appear. ;-) Please accept my apologies for my confusion!

I will take a look at your later patch.

Thanx, Paul

2006-11-27 05:07:28

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Fri, Nov 24, 2006 at 09:21:53PM +0300, Oleg Nesterov wrote:
> Ok, synchronize_xxx() passed 1 hour rcutorture test on dual P-III.
>
> It behaves the same as srcu but optimized for writers. The fast path
> for synchronize_xxx() is mutex_lock() + atomic_read() + mutex_unlock().
> The slow path is __wait_event(), no polling. However, the reader does
> atomic inc/dec on lock/unlock, and the counters are not per-cpu.
>
> Jens, is it ok for you? Alan, Paul, what is your opinion?

Looks pretty good, actually. A few quibbles below. I need to review
again after sleeping on it.

Thanx, Paul

> Oleg.
>
> --- 19-rc6/kernel/__rcutorture.c 2006-11-17 19:42:31.000000000 +0300
> +++ 19-rc6/kernel/rcutorture.c 2006-11-24 21:00:00.000000000 +0300
> @@ -464,6 +464,120 @@ static struct rcu_torture_ops srcu_ops =
> .name = "srcu"
> };
>
> +//-----------------------------------------------------------------------------
> +struct xxx_struct {
> + int completed;
> + atomic_t ctr[2];
> + struct mutex mutex;
> + wait_queue_head_t wq;
> +};
> +
> +void init_xxx_struct(struct xxx_struct *sp)
> +{
> + sp->completed = 0;
> + atomic_set(sp->ctr + 0, 1);
> + atomic_set(sp->ctr + 1, 0);
> + mutex_init(&sp->mutex);
> + init_waitqueue_head(&sp->wq);
> +}
> +
> +int xxx_read_lock(struct xxx_struct *sp)
> +{
> + for (;;) {
> + int idx = sp->completed & 0x1;

Might need a comment saying why no rcu_dereference() needed on the
preceding line. The reason (as I understand it) is that we are
only doing atomic operations on the element being indexed. Is there
an Alpha architect in the house? ;-)

> + if (likely(atomic_inc_not_zero(sp->ctr + idx)))
> + return idx;
> + }
> +}

The loop seems absolutely necessary if one wishes to avoid a
synchronize_sched() in synchronize_xxx() below (and was one of the things
I was missing earlier). However, isn't there a possibility that a pile
of synchronize_xxx() calls might indefinitely delay an unlucky reader?

> +
> +void xxx_read_unlock(struct xxx_struct *sp, int idx)
> +{
> + if (unlikely(atomic_dec_and_test(sp->ctr + idx)))
> + wake_up(&sp->wq);
> +}
> +
> +void synchronize_xxx(struct xxx_struct *sp)
> +{
> + int idx;
> +
> + mutex_lock(&sp->mutex);
> +
> + idx = sp->completed & 0x1;
> + if (!atomic_add_unless(sp->ctr + idx, -1, 1))
> + goto out;
> +
> + atomic_inc(sp->ctr + (idx ^ 0x1));
> + sp->completed++;
> +
> + __wait_event(sp->wq, !atomic_read(sp->ctr + idx));
> +out:
> + mutex_unlock(&sp->mutex);
> +}

Test code!!! Very good!!! (This is added to rcutorture, right?)

> +//-----------------------------------------------------------------------------
> +static struct xxx_struct xxx_ctl;
> +
> +static void xxx_torture_init(void)
> +{
> + init_xxx_struct(&xxx_ctl);
> + rcu_sync_torture_init();
> +}
> +
> +static void xxx_torture_cleanup(void)
> +{
> + synchronize_xxx(&xxx_ctl);
> +}
> +
> +static int xxx_torture_read_lock(void)
> +{
> + return xxx_read_lock(&xxx_ctl);
> +}
> +
> +static void xxx_torture_read_unlock(int idx)
> +{
> + xxx_read_unlock(&xxx_ctl, idx);
> +}
> +
> +static int xxx_torture_completed(void)
> +{
> + return xxx_ctl.completed;
> +}
> +
> +static void xxx_torture_synchronize(void)
> +{
> + synchronize_xxx(&xxx_ctl);
> +}
> +
> +static int xxx_torture_stats(char *page)
> +{
> + int cnt = 0;
> + int idx = xxx_ctl.completed & 0x1;
> +
> + cnt += sprintf(&page[cnt], "%s%s per-CPU(idx=%d):",
> + torture_type, TORTURE_FLAG, idx);
> +
> + cnt += sprintf(&page[cnt], " (%d,%d)",
> + atomic_read(xxx_ctl.ctr + 0),
> + atomic_read(xxx_ctl.ctr + 1));
> +
> + cnt += sprintf(&page[cnt], "\n");
> + return cnt;
> +}
> +
> +static struct rcu_torture_ops xxx_ops = {
> + .init = xxx_torture_init,
> + .cleanup = xxx_torture_cleanup,
> + .readlock = xxx_torture_read_lock,
> + .readdelay = srcu_read_delay,
> + .readunlock = xxx_torture_read_unlock,
> + .completed = xxx_torture_completed,
> + .deferredfree = rcu_sync_torture_deferred_free,
> + .sync = xxx_torture_synchronize,
> + .stats = xxx_torture_stats,
> + .name = "xxx"
> +};
> +//-----------------------------------------------------------------------------
> +
> /*
> * Definitions for sched torture testing.
> */
> @@ -503,8 +617,8 @@ static struct rcu_torture_ops sched_ops
> };
>
> static struct rcu_torture_ops *torture_ops[] =
> - { &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops, &srcu_ops,
> - &sched_ops, NULL };
> + { &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops,
> + &srcu_ops, &xxx_ops, &sched_ops, NULL };
>
> /*
> * RCU torture writer kthread. Repeatedly substitutes a new structure
>

2006-11-27 16:11:27

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/26, Paul E. McKenney wrote:
>
> Looks pretty good, actually. A few quibbles below. I need to review
> again after sleeping on it.

Thanks! Please also look at the spinlock-based implementation,

http://marc.theaimsgroup.com/?l=linux-kernel&m=116457701231964

I must admit, Alan was right: it really looks simpler and the performance
penalty should be very low. I personally hate this additional spinlock
per se as an "unneeded complication", but probably you will like it more.

> > +int xxx_read_lock(struct xxx_struct *sp)
> > +{
> > + for (;;) {
> > + int idx = sp->completed & 0x1;
>
> Might need a comment saying why no rcu_dereference() needed on the
> preceding line. The reason (as I understand it) is that we are
> only doing atomic operations on the element being indexed.

My understanding is the same. Actually, smp_read_barrier_depends() can't
help because 'atomic_inc' and '->completed++' in synchronize_xxx() could
be re-ordered anyway, so we should rely on correctness of atomic_t.

>
> > + if (likely(atomic_inc_not_zero(sp->ctr + idx)))
> > + return idx;
> > + }
> > +}
>
> The loop seems absolutely necessary if one wishes to avoid a
> synchronize_sched() in synchronize_xxx() below (and was one of the things
> I was missing earlier). However, isn't there a possibility that a pile
> of synchronize_xxx() calls might indefinitely delay an unlucky reader?

Note that synchronize_xxx() does nothing when there are no readers under
xxx_read_lock(), so

for (;;)
synchronize_xxx();

can't suspend xxx_read_lock(). atomic_inc_not_zero() fails when something like
the sequence of events below happens between 'idx = sp->completed' and
'atomic_inc_not_zero':

- another reader does xxx_read_lock(), increments ->ctr.

- synchronize_xxx() notices it, goes to __wait_event()

- both the reader and writer manage to do atomic_dec()

This is possible in theory, but indefinite delay... Look, we have the same
"problem" with spinlocks: in theory synchronize_xxx() calls might indefinitely
delay an unlucky reader because synchronize_xxx() always wins spin_lock(&sp->lock);

> > +
> > +void xxx_read_unlock(struct xxx_struct *sp, int idx)
> > +{
> > + if (unlikely(atomic_dec_and_test(sp->ctr + idx)))
> > + wake_up(&sp->wq);
> > +}
> > +
> > +void synchronize_xxx(struct xxx_struct *sp)
> > +{
> > + int idx;
> > +
> > + mutex_lock(&sp->mutex);
> > +
> > + idx = sp->completed & 0x1;
> > + if (!atomic_add_unless(sp->ctr + idx, -1, 1))
> > + goto out;
> > +
> > + atomic_inc(sp->ctr + (idx ^ 0x1));
> > + sp->completed++;
> > +
> > + __wait_event(sp->wq, !atomic_read(sp->ctr + idx));
> > +out:
> > + mutex_unlock(&sp->mutex);
> > +}
>
> Test code!!! Very good!!! (This is added to rcutorture, right?)

Yes, the whole patch goes into kernel/rcutorture.c; it is only for
testing/review.

Note: I suspect that Documentation/ lies about atomic_add_unless(), see

http://marc.theaimsgroup.com/?l=linux-kernel&m=116448966030359

so synchronize_xxx() should be

void synchronize_xxx(struct xxx_struct *sp)
{
        int idx;

        smp_mb();
        mutex_lock(&sp->mutex);

        idx = sp->completed & 0x1;
        if (atomic_read(sp->ctr + idx) == 1)
                goto out;

        atomic_inc(sp->ctr + (idx ^ 0x1));
        sp->completed++;

        atomic_dec(sp->ctr + idx);
        wait_event(sp->wq, !atomic_read(sp->ctr + idx));
out:
        mutex_unlock(&sp->mutex);
}

Yes, Alan was right, spinlock_t makes the code simpler.

Oleg.

2006-11-27 16:57:10

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/27, Oleg Nesterov wrote:
>
> so synchronize_xxx() should be
>
> void synchronize_xxx(struct xxx_struct *sp)
> {
> int idx;
>
> smp_mb();
> mutex_lock(&sp->mutex);
>
> idx = sp->completed & 0x1;
> if (atomic_read(sp->ctr + idx) == 1)
> goto out;
>
> atomic_inc(sp->ctr + (idx ^ 0x1));
> sp->completed++;
>
> atomic_dec(sp->ctr + idx);
> wait_event(sp->wq, !atomic_read(sp->ctr + idx));
> out:
> mutex_unlock(&sp->mutex);
> }
>
> Yes, Alan was right, spinlock_t makes the code simpler.

Damn, it needs another mb() at the end,

void synchronize_xxx(struct xxx_struct *sp)
{
        int idx;

        smp_mb();
        mutex_lock(&sp->mutex);

        idx = sp->completed & 0x1;
        if (atomic_read(sp->ctr + idx) == 1)
                goto out;

        atomic_inc(sp->ctr + (idx ^ 0x1));
        sp->completed++;

        atomic_dec(sp->ctr + idx);
        wait_event(sp->wq, !atomic_read(sp->ctr + idx));
out:
        mutex_unlock(&sp->mutex);
        smp_mb();
}
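
To make the intended usage concrete, here is a hypothetical reader/writer
pairing; struct config, cur_config and cfg_xxx are invented names for
illustration only, not part of the patch:

struct config { int data; };            /* whatever the readers look at */

static struct config *cur_config;
static struct xxx_struct cfg_xxx;

/* Reader: may sleep inside the critical section, unlike classic RCU. */
void reader(void)
{
        int idx = xxx_read_lock(&cfg_xxx);
        struct config *c = rcu_dereference(cur_config);

        /* ... use *c, possibly blocking ... */
        xxx_read_unlock(&cfg_xxx, idx);
}

/* Writer: publish the new version, wait out pre-existing readers, free. */
void writer(struct config *new)
{
        struct config *old = cur_config;

        rcu_assign_pointer(cur_config, new);
        synchronize_xxx(&cfg_xxx);      /* no readers of *old can remain */
        kfree(old);
}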

Oleg.

2006-11-27 21:10:35

by Alan Stern

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, 27 Nov 2006, Oleg Nesterov wrote:

> I still can't relax, another attempt to "prove" this should not be
> possible on CPUs supported by Linux :)
>
> Let's suppose it is possible, then it should also be possible if CPU_1
> does spin_lock() instead of mb() (spin_lock can't be "stronger"), yes?
>
> Now,
>
> int COND;
> wait_queue_head_t wq;
>
> my_wait()
> {
> add_wait_queue(&wq);
> for (;;) {
> set_current_state(TASK_UNINTERRUPTIBLE);
>
> if (COND)
> break;
>
> schedule();
> }
> remove_wait_queue(&wq);
> }
>
> my_wake()
> {
> COND = 1;
> wake_up(&wq);
> }
>
> this should be correct, but it is not!
>
> my_wait:
>
> task->state = TASK_UNINTERRUPTIBLE; // STORE
>
> mb();
>
> if (COND) break; // LOAD
>
>
> my_wake:
>
> COND = 1; // STORE
>
> spin_lock(WQ.lock);
> spin_lock(runqueue.lock);
>
> // try_to_wake_up()
> if (!(task->state & TASK_UNINTERRUPTIBLE)) // LOAD
> goto out;
>
>
> So, my_wait() gets COND == 0, and goes to schedule in 'D' state.
> try_to_wake_up() reads ->state == TASK_RUNNING, and does nothing.

This is a very good point. I don't know what the resolution is; Paul will
have to explain the situation.

Alan

2006-11-28 01:46:39

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 27, 2006 at 04:10:27PM -0500, Alan Stern wrote:
> On Mon, 27 Nov 2006, Oleg Nesterov wrote:
>
> > I still can't relax, another attempt to "prove" this should not be
> > possible on CPUs supported by Linux :)
> >
> > Let's suppose it is possible, then it should also be possible if CPU_1
> > does spin_lock() instead of mb() (spin_lock can't be "stronger"), yes?
> >
> > Now,
> >
> > int COND;
> > wait_queue_head_t wq;
> >
> > my_wait()
> > {
> > add_wait_queue(&wq);
> > for (;;) {
> > set_current_state(TASK_UNINTERRUPTIBLE);
> >
> > if (COND)
> > break;
> >
> > schedule();
> > }
> > remove_wait_queue(&wq);
> > }
> >
> > my_wake()
> > {
> > COND = 1;
> > wake_up(&wq);
> > }
> >
> > this should be correct, but it is not!
> >
> > my_wait:
> >
> > task->state = TASK_UNINTERRUPTIBLE; // STORE
> >
> > mb();
> >
> > if (COND) break; // LOAD
> >
> >
> > my_wake:
> >
> > COND = 1; // STORE
> >
> > spin_lock(WQ.lock);
> > spin_lock(runqueue.lock);
> >
> > // try_to_wake_up()
> > if (!(task->state & TASK_UNINTERRUPTIBLE)) // LOAD
> > goto out;
> >
> >
> > So, my_wait() gets COND == 0, and goes to schedule in 'D' state.
> > try_to_wake_up() reads ->state == TASK_RUNNING, and does nothing.
>
> This is a very good point. I don't know what the resolution is; Paul will
> have to explain the situation.

I am revisiting this, and will let you know what I learn.

Thanx, Paul

2006-11-29 19:28:46

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Mon, Nov 27, 2006 at 07:11:06PM +0300, Oleg Nesterov wrote:
> On 11/26, Paul E. McKenney wrote:
> >
> > Looks pretty good, actually. A few quibbles below. I need to review
> > again after sleeping on it.
>
> Thanks! Please also look at the spinlock-based implementation,
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116457701231964
>
> I must admit, Alan was right: it really looks simpler and the performance
> penalty should be very low. I personally hate this additional spinlock
> per se as an "unneeded complication", but probably you will like it more.

The two have different advantages and disadvantages, but nothing really
overwhelming either way. Here is my take:

1. The spinlock version will be easier for most people to understand.

2. The atomic version has better read-side overhead -- probably
roughly twice as fast on most machines.

3. The atomic version will have better worst-case latency under
heavy read-side load -- at least assuming that the underlying
hardware is fair.

4. The spinlock version would have better fairness in face of
synchronize_xxx() overload.

5. Neither version can be used from irq (but the same is true of
SRCU as well).

If I was to choose, I would probably go with the easy-to-understand
case, which would push me towards the spinlocks. If there is a
read-side performance problem, then the atomic version can be easily
resurrected from the LKML archives. Maybe have a URL in a comment
pointing to the atomic implementation? ;-)

All this assuming that the spinlock version passes rcutorture, of course!!!
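
For readers without the URL above handy, the read side of the spinlock
version being compared here would presumably look something like the
following -- a reconstruction assuming a spinlock_t lock and plain int
ctr[2] in the structure, not the actual posted patch:

int xxx_read_lock(struct xxx_struct *sp)
{
        int idx;

        spin_lock(&sp->lock);
        idx = sp->completed & 0x1;
        sp->ctr[idx]++;
        spin_unlock(&sp->lock);

        return idx;
}

void xxx_read_unlock(struct xxx_struct *sp, int idx)
{
        spin_lock(&sp->lock);
        if (!--sp->ctr[idx])
                wake_up(&sp->wq);
        spin_unlock(&sp->lock);
}

The lock/unlock pair on every read presumably accounts for most of the
read-side difference in item 2, and is also why irq use would need the
_irqsave variants, as discussed below.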

> > > +int xxx_read_lock(struct xxx_struct *sp)
> > > +{
> > > + for (;;) {
> > > + int idx = sp->completed & 0x1;
> >
> > Might need a comment saying why no rcu_dereference() needed on the
> > preceding line. The reason (as I understand it) is that we are
> > only doing atomic operations on the element being indexed.
>
> My understanding is the same. Actually, smp_read_barrier_depends() can't
> help because 'atomic_inc' and '->completed++' in synchronize_xxx() could
> be re-ordered anyway, so we should rely on correctness of atomic_t.

Fair enough!

> > > + if (likely(atomic_inc_not_zero(sp->ctr + idx)))
> > > + return idx;
> > > + }
> > > +}
> >
> > The loop seems absolutely necessary if one wishes to avoid a
> > synchronize_sched() in synchronize_xxx() below (and was one of the things
> > I was missing earlier). However, isn't there a possibility that a pile
> > of synchronize_xxx() calls might indefinitely delay an unlucky reader?
>
> Note that synchronize_xxx() does nothing when there are no readers under
> xxx_read_lock(), so
>
> for (;;)
> synchronize_xxx();
>
> can't suspend xxx_read_lock(). atomic_inc_not_zero() fails when something like
> the sequence of events below happens between 'idx = sp->completed' and
> 'atomic_inc_not_zero':
>
> - another reader does xxx_read_lock(), increments ->ctr.
>
> - synchronize_xxx() notices it, goes to __wait_event()
>
> - both the reader and writer manage to do atomic_dec()
>
> This is possible in theory, but indefinite delay... Look, we have the same
> "problem" with spinlocks: in theory synchronize_xxx() calls might indefinitely
> delay an unlucky reader because synchronize_xxx() always wins spin_lock(&sp->lock);

True enough! Again, the only way I can see to avoid the possibility of
indefinite delay is to make the updater do synchronize_sched(), which
is what we were trying to avoid in the first place. ;-)

> > > +
> > > +void xxx_read_unlock(struct xxx_struct *sp, int idx)
> > > +{
> > > + if (unlikely(atomic_dec_and_test(sp->ctr + idx)))
> > > + wake_up(&sp->wq);
> > > +}
> > > +
> > > +void synchronize_xxx(struct xxx_struct *sp)
> > > +{
> > > + int idx;
> > > +
> > > + mutex_lock(&sp->mutex);
> > > +
> > > + idx = sp->completed & 0x1;
> > > + if (!atomic_add_unless(sp->ctr + idx, -1, 1))
> > > + goto out;
> > > +
> > > + atomic_inc(sp->ctr + (idx ^ 0x1));
> > > + sp->completed++;
> > > +
> > > + __wait_event(sp->wq, !atomic_read(sp->ctr + idx));
> > > +out:
> > > + mutex_unlock(&sp->mutex);
> > > +}
> >
> > Test code!!! Very good!!! (This is added to rcutorture, right?)
>
> Yes, the whole patch goes into kernel/rcutorture.c; it is only for
> testing/review.
>
> Note: I suspect that Documentation/ lies about atomic_add_unless(), see
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=116448966030359

Hmmm... Some do and some don't:

Alpha: Does a memory barrier after (but not before) via cmpxchg().

arm: No clue. http://www.arm.com/pdfs/DUI0204G_rvct_assembler_guide.pdf
does not seem to say much about memory-ordering issues.
There are no obvious memory-barrier instructions, and I
can't tell what ordering effects (if any) ldrex or
strexeq have.

arm26: No SMP support, so no problem.

cris: Uses hashed spinlocks, so probably OK. (Are cris spinlocks
"leaky" in the ia64 sense? If so, then -not- OK.)

frv: No SMP support, so no problem.

h8300: No SMP support, despite having code under CONFIG_SMP.
In any case, local_irq_save() doesn't do much for SMP.
(Right? Or does h8300 do something special here?)

i386: The x86 semantics, as I understand them, are in fact equivalent
to having a memory barrier before and after the operation.
However, the documentation I have is not as clear as it might be.

ia64: "Acquire" semantics, so that earlier operations may be reordered
after the atomic_add_unless(), but later operations may -not-
be reordered before atomic_add_unless().

m32r: Don't know much about m32r, but it does have a CONFIG_SMP
implementation.

m68k: I don't know what memory-barrier semantics the "cas" instructions
provide. A quick Google search was not helpful.

mips: Seems to have a memory barrier only after, not before.
http://www.mips.com/content/PressRoom/TechLibrary/WhitePapers/multi_cpu
seems to indicate that the semantics of the "sync" instruction
depend on hardware external to the CPU.

parisc: Implements atomic_add_unless() with a spinlock, so probably
does memory barrier before and after (I believe parisc does
not have "leaky" spinlock primitives like ia64 does).

powerpc: lwsync before and isync after.

s390: The "cs" (compare-and-swap) instruction does serialization
equivalent to memory barrier before and after.

sh: No SMP support, despite having code under CONFIG_SMP.
In any case, local_irq_save() doesn't do much for SMP.
(Right? Or does "sh" do something special here?)

sh64: No SMP support, despite having code under CONFIG_SMP.
In any case, local_irq_save() doesn't do much for SMP.
(Right? Or does "sh64" do something special here?)

sparc: Uses spinlocks, so similar to parisc.

sparc64: Does have explicit memory barriers before and after. ;-)

v850: No SMP support, so no problem.

x86_64: Same as for i386.

xtensa: No SMP support, so no problem.

---

So either the docs or several of the architectures need fixing.

And it would be -really- nice if more architectures posted complete
instruction reference manuals on the web!!! (Or maybe I need to
be better at searching for them?)

> so synchronize_xxx() should be
>
> void synchronize_xxx(struct xxx_struct *sp)
> {
> int idx;
>
> smp_mb();
> mutex_lock(&sp->mutex);
>
> idx = sp->completed & 0x1;
> if (atomic_read(sp->ctr + idx) == 1)
> goto out;
>
> atomic_inc(sp->ctr + (idx ^ 0x1));
> sp->completed++;
>
> atomic_dec(sp->ctr + idx);
> wait_event(sp->wq, !atomic_read(sp->ctr + idx));
> out:
> mutex_unlock(&sp->mutex);
> }
>
> Yes, Alan was right, spinlock_t makes the code simpler.

;-)

Thanx, Paul

2006-11-29 20:16:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/29, Paul E. McKenney wrote:
>
> 1. The spinlock version will be easier for most people to understand.
>
> 2. The atomic version has better read-side overhead -- probably
> roughly twice as fast on most machines.

synchronize_xxx() should be a little bit faster too

> 3. The atomic version will have better worst-case latency under
> heavy read-side load -- at least assuming that the underlying
> hardware is fair.
>
> 4. The spinlock version would have better fairness in face of
> synchronize_xxx() overload.

Not sure I understand (both 3 and 4) ...

> 5. Neither version can be used from irq (but the same is true of
> SRCU as well).

Hmm... SRCU can't be used from irq, yes. But I think that both versions
(spinlock needs _irqsave) can ?

> If I was to choose, I would probably go with the easy-to-understand
> case, which would push me towards the spinlocks. If there is a
> read-side performance problem, then the atomic version can be easily
> resurrected from the LKML archives. Maybe have a URL in a comment
> pointing to the atomic implementation? ;-)

But it is so ugly to use a spinlock to implement memory barrier semantics!

Look,

void synchronize_xxx(struct xxx_struct *sp)
{
        int idx;

        mutex_lock(&sp->mutex);

        spin_lock();
        idx = sp->completed++ & 0x1;
        spin_unlock();

        wait_event(sp->wq, !sp->ctr[idx]);

        spin_lock();
        spin_unlock();

        mutex_unlock(&sp->mutex);
}

Yes, it looks simpler. But why do we need an empty critical section? It is
a memory barrier; we can (should?) instead do

/* for wait_event() above */
smp_rmb();
spin_unlock_wait();
smp_mb();

Then,

spin_lock();
idx = sp->completed++ & 0x1;
spin_unlock();

means
idx = sp->completed & 0x1;
spin_lock();
sp->completed++;
spin_unlock();

Again, this is a barrier, not a lock! ->completed is protected by ->mutex, so

sp->completed++;
smp_mb();
spin_unlock_wait(&sp->lock);
/* for wait_event() below */
smp_rmb();

So in fact spinlock_t is used to make inc/dec of ->ctr atomic. Don't
we have atomic_t for that?

That said, if you both think it is better - please send a patch. This is
a matter of taste, and I am far from sure my taste is the best :)

> > Note: I suspect that Documentation/ lies about atomic_add_unless(), see
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=116448966030359
>
> Hmmm... Some do and some don't:
>
> i386: The x86 semantics, as I understand them, are in fact equivalent
> to having a memory barrier before and after the operation.
> However, the documentation I have is not as clear as it might be.

Even i386 has a non-empty mb(), but atomic_read() is a plain LOAD.
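
The point is easiest to see from the usual cmpxchg-based implementation,
sketched roughly from memory here: when the counter already equals u, the
function returns after nothing but the plain load, so no barrier-implying
instruction runs on the failure path.

static inline int atomic_add_unless(atomic_t *v, int a, int u)
{
        int c, old;

        c = atomic_read(v);             /* plain load, no barrier */
        for (;;) {
                if (unlikely(c == u))
                        break;          /* failure path: no cmpxchg at all */
                old = atomic_cmpxchg(v, c, c + a);
                if (likely(old == c))
                        break;
                c = old;
        }
        return c != u;
}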

> So either the docs or several of the architectures need fixing.

I think it's better to fix the docs.

Oleg.

2006-11-29 23:07:07

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On Wed, Nov 29, 2006 at 11:16:46PM +0300, Oleg Nesterov wrote:
> On 11/29, Paul E. McKenney wrote:
> >
> > 1. The spinlock version will be easier for most people to understand.
> >
> > 2. The atomic version has better read-side overhead -- probably
> > roughly twice as fast on most machines.
>
> synchronize_xxx() should be a little bit faster too

Good point.

> > 3. The atomic version will have better worst-case latency under
> > heavy read-side load -- at least assuming that the underlying
> > hardware is fair.
> >
> > 4. The spinlock version would have better fairness in face of
> > synchronize_xxx() overload.
>
> Not sure I understand (both 3 and 4) ...

The differences will be slight, and so hardware-dependent that they
don't mean much.

> > 5. Neither version can be used from irq (but the same is true of
> > SRCU as well).
>
> Hmm... SRCU can't be used from irq, yes. But I think that both versions
> (spinlock needs _irqsave) can ?

I didn't think you could call wait_event() from irq.

For the locked version, you would also need spin_lock_irqsave() or some
such to avoid self-deadlock.

For the atomic version, the fact that synchronize_qrcu() increments
the new counter before decrementing the old one should mean that
qrcu_read_lock() and qrcu_read_unlock() can be called from irq.
But synchronize_qrcu() must be called from process context, since it
can block.

This might well be important.

> > If I was to choose, I would probably go with the easy-to-understand
> > case, which would push me towards the spinlocks. If there is a
> > read-side performance problem, then the atomic version can be easily
> > resurrected from the LKML archives. Maybe have a URL in a comment
> > pointing to the atomic implementation? ;-)
>
> But it is so ugly to use a spinlock to implement memory barrier semantics!
>
> Look,
>
> void synchronize_xxx(struct xxx_struct *sp)
> {
> int idx;
>
> mutex_lock(&sp->mutex);
>
> spin_lock();
> idx = sp->completed++ & 0x1;
> spin_unlock();
>
> wait_event(sp->wq, !sp->ctr[idx]);
>
> spin_lock();
> spin_unlock();
>
> mutex_unlock(&sp->mutex);
> }
>
> Yes, it looks simpler. But why do we need an empty critical section? It is
> a memory barrier; we can (should?) instead do
>
> /* for wait_event() above */
> smp_rmb();
> spin_unlock_wait();
> smp_mb();
>
> Then,
>
> spin_lock();
> idx = sp->completed++ & 0x1;
> spin_unlock();
>
> means
> idx = sp->completed & 0x1;
> spin_lock();
> sp->completed++;
> spin_unlock();
>
> Again, this is a barrier, not a lock! ->completed is protected by ->mutex, so
>
> sp->completed++;
> smp_mb();
> spin_unlock_wait(&sp->lock);
> /* for wait_event() below */
> smp_rmb();
>
> So in fact spinlock_t is used to make inc/dec of ->ctr atomic. Don't
> we have atomic_t for that?
>
> That said, if you both think it is better - please send a patch. This is
> a matter of taste, and I am far from sure my taste is the best :)

Heck no! I was under the mistaken impression that you had coded and
tested both patches. If you have the atomic one working but the
spinlock one has not reached that stage, I am all for going with the
atomic version. ;-)

> > > Note: I suspect that Documentation/ lies about atomic_add_unless(), see
> > >
> > > http://marc.theaimsgroup.com/?l=linux-kernel&m=116448966030359
> >
> > Hmmm... Some do and some don't:
> >
> > i386: The x86 semantics, as I understand them, are in fact equivalent
> > to having a memory barrier before and after the operation.
> > However, the documentation I have is not as clear as it might be.
>
> Even i386 has a non-empty mb(), but atomic_read() is a plain LOAD.

Ah -- I was forgetting the failure path. You are quite correct.

> > So either the docs or several of the architectures need fixing.
>
> I think it's better to fix the docs.

I must defer to the two Davids...

Thanx, Paul

2006-11-30 00:01:24

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync

On 11/29, Paul E. McKenney wrote:
>
> On Wed, Nov 29, 2006 at 11:16:46PM +0300, Oleg Nesterov wrote:
> >
> > Hmm... SRCU can't be used from irq, yes. But I think that both versions
> > (spinlock needs _irqsave) can ?
>
> I didn't think you could call wait_event() from irq.

Ah, sorry for the confusion, I was talking only about read lock/unlock, of course.

Just in case, it is not safe to do srcu_read_{,un}lock() from irq,

per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++
^^^^^^^^
we need local_t for that.
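
The plain "++" is a load/modify/store, so an irq arriving in the middle
that itself calls srcu_read_lock() can lose an increment. A hypothetical
irq-safe variant, assuming the per-CPU counters were switched to local_t
(which is not what the current code does), might read:

        /* hypothetical: assumes ->c[] were changed from int to local_t */
        local_inc(&per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]);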

> For the locked version, you would also need spin_lock_irqsave() or some
> such to avoid self-deadlock.
>
> For the atomic version, the fact that synchronize_qrcu() increments
> the new counter before decrementing the old one should mean that
> qrcu_read_lock() and qrcu_read_unlock() can be called from irq.

Yes, exactly! There is another reason: suppose we did

qp->completed++;
atomic_inc(qp->ctr + (idx ^ 0x1));

In that case the reader could be stalled if synchronize_qrcu() takes a
preemption in between.
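
In other words (an informal restatement, not text from the patch), the order
actually used keeps the index published in ->completed pointing at a counter
that was made non-zero before it was published:

        atomic_inc(qp->ctr + (idx ^ 0x1));      /* new side is >= 1 first */
        qp->completed++;                        /* readers may now pick the new idx */
        atomic_dec(qp->ctr + idx);              /* only then drop the old side */

        /*
         * A reader might lose a round if it sampled the old index right
         * before the switch, but its retry sees the new ->completed whose
         * counter is already non-zero, so the atomic_inc_not_zero() loop
         * never has to wait for a preempted synchronize_qrcu() to resume.
         */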

> But synchronize_qrcu() must be called from process context, since it
> can block.

Surely.

Oleg.