2013-09-09 15:12:06

by Guennadi Liakhovetski

[permalink] [raw]
Subject: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

Sorry guys, I'm trying my best to stop this patch from propagating to
stable and to get it fixed asap, so, the CC list might be a bit excessive.
Also trying to fix the originally spare cc list, which makes it impossible
for me to reply to the original thread, instead have to start a new one.

Commit

commit dceff5ce18801dddc220d6238628619c93bc3cb6
Author: Viresh Kumar <[email protected]>
Date: Sun Sep 1 22:19:37 2013 +0530

cpufreq: fix serialization issues with freq change notifiers

breaks .transition_ongoing counting. This leads to cpufreq-cpu0 not
working any more. In particular switching the governor from performance to
powersave directly after boot doesn't result in a frequency switch any
more. Reverting this patch fixes the problem again. Tested with today's
-next.

Please, refrain from including into "stable" until clarified!

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/


2013-09-09 20:57:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

Hi,

On Monday, September 09, 2013 05:11:10 PM Guennadi Liakhovetski wrote:
> Sorry guys, I'm trying my best to stop this patch from propagating to
> stable and to get it fixed asap, so, the CC list might be a bit excessive.
> Also trying to fix the originally spare cc list, which makes it impossible
> for me to reply to the original thread, instead have to start a new one.

I'm not sure what you're talking about. What exactly was wrong with the
original CC list in particular?

> Commit
>
> commit dceff5ce18801dddc220d6238628619c93bc3cb6
> Author: Viresh Kumar <[email protected]>
> Date: Sun Sep 1 22:19:37 2013 +0530
>
> cpufreq: fix serialization issues with freq change notifiers
>
> breaks .transition_ongoing counting.

Do you know how exactly it breaks that? If so, care to share that knowledge?

> This leads to cpufreq-cpu0 not working any more. In particular switching the
> governor from performance to powersave directly after boot doesn't result in
> a frequency switch any more. Reverting this patch fixes the problem again.

However, this is a regression fix, so I'd prefer to fix the problem on top of
it instead of reverting this commit entirely.

> Tested with today's
> -next.
>
> Please, refrain from including into "stable" until clarified!

Well, dropping the commit altogether and dropping the "CC stable" tag are
equally disruptive at this point, so I think I'll just defer all of the
cpufreq fixes I wanted to push for 3.12 before the ending of the merge
window.

Thanks,
Rafael

2013-09-09 21:43:17

by Guennadi Liakhovetski

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

Hi Rafael

On Mon, 9 Sep 2013, Rafael J. Wysocki wrote:

> Hi,
>
> On Monday, September 09, 2013 05:11:10 PM Guennadi Liakhovetski wrote:
> > Sorry guys, I'm trying my best to stop this patch from propagating to
> > stable and to get it fixed asap, so, the CC list might be a bit excessive.
> > Also trying to fix the originally spare cc list, which makes it impossible
> > for me to reply to the original thread, instead have to start a new one.
>
> I'm not sure what you're talking about. What exactly was wrong with the
> original CC list in particular?

I think you advised once to cc cpufreq related mails to linux-pm too at
least. I haven't found this patch in my pm archive, have I missed it
there?

> > Commit
> >
> > commit dceff5ce18801dddc220d6238628619c93bc3cb6
> > Author: Viresh Kumar <[email protected]>
> > Date: Sun Sep 1 22:19:37 2013 +0530
> >
> > cpufreq: fix serialization issues with freq change notifiers
> >
> > breaks .transition_ongoing counting.
>
> Do you know how exactly it breaks that? If so, care to share that knowledge?

No, I don't. I only know that in __cpufreq_driver_target() the check for

if (policy->transition_ongoing) {
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
return -EBUSY;
}

is failing with this patch and cpufreq-cpu0.

> > This leads to cpufreq-cpu0 not working any more. In particular switching the
> > governor from performance to powersave directly after boot doesn't result in
> > a frequency switch any more. Reverting this patch fixes the problem again.
>
> However, this is a regression fix, so I'd prefer to fix the problem on top of
> it instead of reverting this commit entirely.

If I understood correctly, this patch fixed some warnings, that, however,
didn't disrupt functionality, is this right? Whereas the patch really
seems to break working set ups.

Anyway, we know about the problem now and I believe it'll get fixed one
way or another.

Thanks
Guennadi

> > Tested with today's
> > -next.
> >
> > Please, refrain from including into "stable" until clarified!
>
> Well, dropping the commit altogether and dropping the "CC stable" tag are
> equally disruptive at this point, so I think I'll just defer all of the
> cpufreq fixes I wanted to push for 3.12 before the ending of the merge
> window.
>
> Thanks,
> Rafael
>

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

2013-09-09 23:01:54

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On Monday, September 09, 2013 11:42:41 PM Guennadi Liakhovetski wrote:
> Hi Rafael
>
> On Mon, 9 Sep 2013, Rafael J. Wysocki wrote:
>
> > Hi,
> >
> > On Monday, September 09, 2013 05:11:10 PM Guennadi Liakhovetski wrote:
> > > Sorry guys, I'm trying my best to stop this patch from propagating to
> > > stable and to get it fixed asap, so, the CC list might be a bit excessive.
> > > Also trying to fix the originally spare cc list, which makes it impossible
> > > for me to reply to the original thread, instead have to start a new one.
> >
> > I'm not sure what you're talking about. What exactly was wrong with the
> > original CC list in particular?
>
> I think you advised once to cc cpufreq related mails to linux-pm too at
> least.

Yes, I did.

> I haven't found this patch in my pm archive, have I missed it there?

Quite frankly, I don't remember if it was there. ISTR having it it patchwork,
which would mean that it was there, but well.

> > > Commit
> > >
> > > commit dceff5ce18801dddc220d6238628619c93bc3cb6
> > > Author: Viresh Kumar <[email protected]>
> > > Date: Sun Sep 1 22:19:37 2013 +0530
> > >
> > > cpufreq: fix serialization issues with freq change notifiers
> > >
> > > breaks .transition_ongoing counting.
> >
> > Do you know how exactly it breaks that? If so, care to share that knowledge?
>
> No, I don't. I only know that in __cpufreq_driver_target() the check for
>
> if (policy->transition_ongoing) {
> write_unlock_irqrestore(&cpufreq_driver_lock, flags);
> return -EBUSY;
> }
>
> is failing with this patch and cpufreq-cpu0.

OK, we need to figure out that, then.

> > > This leads to cpufreq-cpu0 not working any more. In particular switching the
> > > governor from performance to powersave directly after boot doesn't result in
> > > a frequency switch any more. Reverting this patch fixes the problem again.
> >
> > However, this is a regression fix, so I'd prefer to fix the problem on top of
> > it instead of reverting this commit entirely.
>
> If I understood correctly, this patch fixed some warnings, that, however,
> didn't disrupt functionality, is this right? Whereas the patch really
> seems to break working set ups.

It fixed warnings that indicated problems and those problems should rather be
avoided.

Thanks,
Rafael

2013-09-10 01:35:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On Tuesday, September 10, 2013 01:12:49 AM Rafael J. Wysocki wrote:
> On Monday, September 09, 2013 11:42:41 PM Guennadi Liakhovetski wrote:
> > Hi Rafael
> >
> > On Mon, 9 Sep 2013, Rafael J. Wysocki wrote:
> >
> > > Hi,
> > >
> > > On Monday, September 09, 2013 05:11:10 PM Guennadi Liakhovetski wrote:
> > > > Sorry guys, I'm trying my best to stop this patch from propagating to
> > > > stable and to get it fixed asap, so, the CC list might be a bit excessive.
> > > > Also trying to fix the originally spare cc list, which makes it impossible
> > > > for me to reply to the original thread, instead have to start a new one.
> > >
> > > I'm not sure what you're talking about. What exactly was wrong with the
> > > original CC list in particular?
> >
> > I think you advised once to cc cpufreq related mails to linux-pm too at
> > least.
>
> Yes, I did.
>
> > I haven't found this patch in my pm archive, have I missed it there?
>
> Quite frankly, I don't remember if it was there. ISTR having it it patchwork,
> which would mean that it was there, but well.
>
> > > > Commit
> > > >
> > > > commit dceff5ce18801dddc220d6238628619c93bc3cb6
> > > > Author: Viresh Kumar <[email protected]>
> > > > Date: Sun Sep 1 22:19:37 2013 +0530
> > > >
> > > > cpufreq: fix serialization issues with freq change notifiers
> > > >
> > > > breaks .transition_ongoing counting.
> > >
> > > Do you know how exactly it breaks that? If so, care to share that knowledge?
> >
> > No, I don't. I only know that in __cpufreq_driver_target() the check for
> >
> > if (policy->transition_ongoing) {
> > write_unlock_irqrestore(&cpufreq_driver_lock, flags);
> > return -EBUSY;
> > }
> >
> > is failing with this patch and cpufreq-cpu0.
>
> OK, we need to figure out that, then.

But given the timing I think I'll just start to revert things and we can add
them back later after we've sorted out all problems.

So I'm going to drop commit dceff5c from the linux-next branch and I'm going to
revert commit 7c30ed5 along with commit 266c13d that tried to fix it and we'll
revisit the transition serialization issue when we really know how to fix it.

Thanks,
Rafael

2013-09-10 11:29:34

by Viresh Kumar

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 9 September 2013 20:41, Guennadi Liakhovetski <[email protected]> wrote:
> Sorry guys, I'm trying my best to stop this patch from propagating to
> stable and to get it fixed asap, so, the CC list might be a bit excessive.
> Also trying to fix the originally spare cc list, which makes it impossible
> for me to reply to the original thread, instead have to start a new one.
>
> Commit
>
> commit dceff5ce18801dddc220d6238628619c93bc3cb6
> Author: Viresh Kumar <[email protected]>
> Date: Sun Sep 1 22:19:37 2013 +0530
>
> cpufreq: fix serialization issues with freq change notifiers
>
> breaks .transition_ongoing counting. This leads to cpufreq-cpu0 not
> working any more. In particular switching the governor from performance to
> powersave directly after boot doesn't result in a frequency switch any
> more. Reverting this patch fixes the problem again. Tested with today's
> -next.

I have tested it again on my exynos and intel machines and couldn't see
a single problem with this patch..

I am afraid you need to give us some more information on how it broke
your stuff.. :)

And I am also not sure cpufreq-cpu0 is different then any other driver..

--
viresh

2013-09-10 11:38:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On Tuesday, September 10, 2013 04:59:29 PM Viresh Kumar wrote:
> On 9 September 2013 20:41, Guennadi Liakhovetski <[email protected]> wrote:
> > Sorry guys, I'm trying my best to stop this patch from propagating to
> > stable and to get it fixed asap, so, the CC list might be a bit excessive.
> > Also trying to fix the originally spare cc list, which makes it impossible
> > for me to reply to the original thread, instead have to start a new one.
> >
> > Commit
> >
> > commit dceff5ce18801dddc220d6238628619c93bc3cb6
> > Author: Viresh Kumar <[email protected]>
> > Date: Sun Sep 1 22:19:37 2013 +0530
> >
> > cpufreq: fix serialization issues with freq change notifiers
> >
> > breaks .transition_ongoing counting. This leads to cpufreq-cpu0 not
> > working any more. In particular switching the governor from performance to
> > powersave directly after boot doesn't result in a frequency switch any
> > more. Reverting this patch fixes the problem again. Tested with today's
> > -next.
>
> I have tested it again on my exynos and intel machines and couldn't see
> a single problem with this patch..
>
> I am afraid you need to give us some more information on how it broke
> your stuff.. :)
>
> And I am also not sure cpufreq-cpu0 is different then any other driver..

That said I'm actually unsure what problems *exactly* are fixed by commit
7c30ed5. The commit log only says that PRECHANGE or POSTCHAGE shouldn't be
called twice in a row, but it doesn't say why. As the fallout indicates,
that actually happened before commit 7c30ed5 and nothing visibly broke, so
the benefit from having that commit is questionable to me. On the other hand,
the commit itself is evidently broken, so what exactly is the reason for
keeping it?

Rafael

2013-09-10 15:12:24

by Guennadi Liakhovetski

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On Tue, 10 Sep 2013, Viresh Kumar wrote:

> On 9 September 2013 20:41, Guennadi Liakhovetski <[email protected]> wrote:
> > Sorry guys, I'm trying my best to stop this patch from propagating to
> > stable and to get it fixed asap, so, the CC list might be a bit excessive.
> > Also trying to fix the originally spare cc list, which makes it impossible
> > for me to reply to the original thread, instead have to start a new one.
> >
> > Commit
> >
> > commit dceff5ce18801dddc220d6238628619c93bc3cb6
> > Author: Viresh Kumar <[email protected]>
> > Date: Sun Sep 1 22:19:37 2013 +0530
> >
> > cpufreq: fix serialization issues with freq change notifiers
> >
> > breaks .transition_ongoing counting. This leads to cpufreq-cpu0 not
> > working any more. In particular switching the governor from performance to
> > powersave directly after boot doesn't result in a frequency switch any
> > more. Reverting this patch fixes the problem again. Tested with today's
> > -next.
>
> I have tested it again on my exynos and intel machines and couldn't see
> a single problem with this patch..

Ok, here's what I've just done.

1. checkout -next tag next-20130909

commit 98926a8004b453089368fda456b8c869240e9953
Author: Stephen Rothwell <[email protected]>
Date: Mon Sep 9 16:05:53 2013 +1000

Add linux-next specific files for 20130909

2. built and booted a kernel for kzm9g (.config available on request)

3. cpufreq failed to initialise:

cpufreq_cpu0: failed to find cpu0 node
cpufreq-cpu0: probe of cpufreq-cpu0 failed with error -2

due to commit f837a9b5ab05c52a07108c6f09ca66f2e0aee757 - as reported in
another thread.

4. reverted that commit, resolving a trivial conflict. Added a debug
output in __cpufreq_driver_target() of


if (cpufreq_disabled())
return -ENODEV;
+ pr_info("%s() %d\n", __func__, policy->transition_ongoing);
if (policy->transition_ongoing)
return -EBUSY;


Built and booted, got

cpufreq: __cpufreq_driver_target(): 1

printed out 4 times from the beginning.

5. tried

echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

the above output appeared 2 more times - no frequency change resulted.

6. reverted commit dceff5ce18801dddc220d6238628619c93bc3cb6, built booted
- cpufreq works again.

> I am afraid you need to give us some more information on how it broke
> your stuff.. :)

Hope the above is enough.

Thanks
Guennadi

> And I am also not sure cpufreq-cpu0 is different then any other driver..
>
> --
> viresh

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

2013-09-10 15:14:21

by Viresh Kumar

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 10 September 2013 17:19, Rafael J. Wysocki <[email protected]> wrote:
> That said I'm actually unsure what problems *exactly* are fixed by commit
> 7c30ed5. The commit log only says that PRECHANGE or POSTCHAGE shouldn't be
> called twice in a row, but it doesn't say why. As the fallout indicates,
> that actually happened before commit 7c30ed5 and nothing visibly broke, so
> the benefit from having that commit is questionable to me. On the other hand,
> the commit itself is evidently broken, so what exactly is the reason for
> keeping it?

Okay, so the first question is can we have multiple PRECHANGE notfication
done without any POSTCHANGE inbetween?
- Scenario: One reading value of cpuinfo_cur_freq, which will call
__cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition()..

And ondemand governor trying to change freq of cpu and so doing notification..

There can be more similar scenarios possible..


Now Second question, is this fine to have multiple PRECHANGE notfications
before any POSTCHANGE notification?

Logically it looks obvious to me that these must be serialized..
Otherwise this is what we are broadcasting:
- We are changing for freq X, please prepare and let us know if you are
okay with it..
- Oh.. So sorry, we are changing to freq Y instead, please adjust accordingly.

- Yes we have changed to freq Y...
- Yes we have changed to freq X...

This looks stupid, isn't it? I don't know how the drivers would behave when
they see such notifications and so got to this patch initially..

2013-09-10 15:26:34

by Viresh Kumar

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 10 September 2013 20:42, Guennadi Liakhovetski <[email protected]> wrote:
> 4. reverted that commit, resolving a trivial conflict. Added a debug
> output in __cpufreq_driver_target() of
>
>
> if (cpufreq_disabled())
> return -ENODEV;
> + pr_info("%s() %d\n", __func__, policy->transition_ongoing);
> if (policy->transition_ongoing)
> return -EBUSY;

Are you sure this diff is on linux-next and not after the revert that
you mentioned later in the mail? There is some locking introduced
by my patch which I can't see in you diff..

> Built and booted, got
>
> cpufreq: __cpufreq_driver_target(): 1
>
> printed out 4 times from the beginning.
>
> 5. tried
>
> echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>
> the above output appeared 2 more times - no frequency change resulted.
>
> 6. reverted commit dceff5ce18801dddc220d6238628619c93bc3cb6, built booted
> - cpufreq works again.
>
>> I am afraid you need to give us some more information on how it broke
>> your stuff.. :)
>
> Hope the above is enough.

A bit more would be helpful.. Can you add these debug prints to all the places
where transition_ongoing is getting modified? with %s, __func__ to distinguish
them better? That will make it a bit easy for me...

Also, what's the configuration of your system? How many CPUs? And are all
of them sharing clock? (I believe yes, as you are using cpufreq-cpu0)..

2013-09-10 16:23:33

by Guennadi Liakhovetski

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On Tue, 10 Sep 2013, Viresh Kumar wrote:

> On 10 September 2013 20:42, Guennadi Liakhovetski <[email protected]> wrote:
> > 4. reverted that commit, resolving a trivial conflict. Added a debug
> > output in __cpufreq_driver_target() of
> >
> >
> > if (cpufreq_disabled())
> > return -ENODEV;
> > + pr_info("%s() %d\n", __func__, policy->transition_ongoing);
> > if (policy->transition_ongoing)
> > return -EBUSY;
>
> Are you sure this diff is on linux-next and not after the revert that
> you mentioned later in the mail? There is some locking introduced
> by my patch which I can't see in you diff..

Of course, isn't that what I've written above? reverted a commit and added
debug - in that order.

> > Built and booted, got
> >
> > cpufreq: __cpufreq_driver_target(): 1
> >
> > printed out 4 times from the beginning.
> >
> > 5. tried
> >
> > echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> >
> > the above output appeared 2 more times - no frequency change resulted.
> >
> > 6. reverted commit dceff5ce18801dddc220d6238628619c93bc3cb6, built booted
> > - cpufreq works again.
> >
> >> I am afraid you need to give us some more information on how it broke
> >> your stuff.. :)
> >
> > Hope the above is enough.
>
> A bit more would be helpful.. Can you add these debug prints to all the places
> where transition_ongoing is getting modified? with %s, __func__ to distinguish
> them better? That will make it a bit easy for me...

Sure, I can... So, with the performance governor I get

[ 1.290000] cpufreq-cpu0 cpufreq-cpu0: Looking up cpu0-supply from device tree
[ 1.290000] cpufreq: trying to register driver generic_cpu0
[ 1.290000] cpufreq: adding CPU 0
[ 1.290000] cpufreq: Adding link for CPU: 1
[ 1.290000] cpufreq: setting new policy for CPU 0: 398667 - 1196000 kHz
[ 1.290000] cpufreq: new min and max freqs are 398667 - 1196000 kHz
[ 1.290000] cpufreq: governor switch
[ 1.290000] cpufreq: __cpufreq_governor for CPU 0, event 4
[ 1.290000] cpufreq: __cpufreq_governor for CPU 0, event 1
[ 1.290000] cpufreq_performance: setting to 1196000 kHz because of event 1
[ 1.290000] cpufreq: __cpufreq_driver_target().1665 1

This is my debug - .transition_ongoing is incremented ^^^^^^^^

[ 1.300000] cpufreq: target for CPU 0: 1196000 kHz, relation 1, requested 1196000 kHz
[ 1.300000] cpufreq: governor: change or update limits
[ 1.300000] cpufreq: __cpufreq_governor for CPU 0, event 3
[ 1.300000] cpufreq_performance: setting to 1196000 kHz because of event 3
[ 1.300000] cpufreq: initialization complete
[ 1.300000] cpufreq: adding CPU 1
[ 1.300000] cpufreq: driver generic_cpu0 up and running

That's it. It never gets decremented again.

> Also, what's the configuration of your system? How many CPUs?

2 CPU cores.

> And are all of them sharing clock? (I believe yes, as you are using cpufreq-cpu0)..

Correct. Debug diff is below.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index ecc55d1..374e030 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -15,6 +15,8 @@
* published by the Free Software Foundation.
*/

+#define DEBUG
+
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/cpu.h>
@@ -292,6 +294,7 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,

policy->transition_ongoing++;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+ pr_info("%s().%d %d\n", __func__, __LINE__, policy->transition_ongoing);

/* detect if the driver reported a value as "old frequency"
* which is not equal to what the cpufreq core thinks is
@@ -321,6 +324,7 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,

policy->transition_ongoing--;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+ pr_info("%s().%d %d\n", __func__, __LINE__, policy->transition_ongoing);

adjust_jiffies(CPUFREQ_POSTCHANGE, freqs);
pr_debug("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
@@ -359,6 +363,7 @@ void cpufreq_notify_transition(struct cpufreq_policy *policy,
write_lock_irqsave(&cpufreq_driver_lock, flags);
policy->transition_ongoing--;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+ pr_info("%s().%d %d\n", __func__, __LINE__, policy->transition_ongoing);
}
}
EXPORT_SYMBOL_GPL(cpufreq_notify_transition);
@@ -1356,6 +1361,7 @@ static void cpufreq_out_of_sync(unsigned int cpu, unsigned int old_freq,
}
policy->transition_ongoing++;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+ pr_info("%s().%d %d\n", __func__, __LINE__, policy->transition_ongoing);

cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);
cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
@@ -1656,6 +1662,7 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
}
policy->transition_ongoing++;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+ pr_info("%s().%d %d\n", __func__, __LINE__, policy->transition_ongoing);

/* Make sure that target_freq is within supported range */
if (target_freq > policy->max)
diff --git a/drivers/cpufreq/cpufreq_performance.c b/drivers/cpufreq/cpufreq_performance.c
index cf117de..5575b08 100644
--- a/drivers/cpufreq/cpufreq_performance.c
+++ b/drivers/cpufreq/cpufreq_performance.c
@@ -10,6 +10,8 @@
*
*/

+#define DEBUG
+
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/cpufreq.h>

2013-09-10 16:34:46

by Viresh Kumar

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 10 September 2013 21:52, Guennadi Liakhovetski <[email protected]> wrote:
> Of course, isn't that what I've written above? reverted a commit and added
> debug - in that order.

Ok, I misread it then :(

> Sure, I can... So, with the performance governor I get
>
> [ 1.290000] cpufreq-cpu0 cpufreq-cpu0: Looking up cpu0-supply from device tree
> [ 1.290000] cpufreq: trying to register driver generic_cpu0
> [ 1.290000] cpufreq: adding CPU 0
> [ 1.290000] cpufreq: Adding link for CPU: 1
> [ 1.290000] cpufreq: setting new policy for CPU 0: 398667 - 1196000 kHz
> [ 1.290000] cpufreq: new min and max freqs are 398667 - 1196000 kHz
> [ 1.290000] cpufreq: governor switch
> [ 1.290000] cpufreq: __cpufreq_governor for CPU 0, event 4
> [ 1.290000] cpufreq: __cpufreq_governor for CPU 0, event 1
> [ 1.290000] cpufreq_performance: setting to 1196000 kHz because of event 1
> [ 1.290000] cpufreq: __cpufreq_driver_target().1665 1
>
> This is my debug - .transition_ongoing is incremented ^^^^^^^^
>
> [ 1.300000] cpufreq: target for CPU 0: 1196000 kHz, relation 1, requested 1196000 kHz

Quite straight forward actually.. Please try attached patch and see if it fixes
your problem.. Which it should if I am not wrong.. I will send it
separately then..

Thanks for your time..


Attachments:
0001-fix-target.patch (1.67 kB)

2013-09-10 17:08:15

by Guennadi Liakhovetski

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On Tue, 10 Sep 2013, Viresh Kumar wrote:

> On 10 September 2013 21:52, Guennadi Liakhovetski <[email protected]> wrote:
> > Of course, isn't that what I've written above? reverted a commit and added
> > debug - in that order.
>
> Ok, I misread it then :(
>
> > Sure, I can... So, with the performance governor I get
> >
> > [ 1.290000] cpufreq-cpu0 cpufreq-cpu0: Looking up cpu0-supply from device tree
> > [ 1.290000] cpufreq: trying to register driver generic_cpu0
> > [ 1.290000] cpufreq: adding CPU 0
> > [ 1.290000] cpufreq: Adding link for CPU: 1
> > [ 1.290000] cpufreq: setting new policy for CPU 0: 398667 - 1196000 kHz
> > [ 1.290000] cpufreq: new min and max freqs are 398667 - 1196000 kHz
> > [ 1.290000] cpufreq: governor switch
> > [ 1.290000] cpufreq: __cpufreq_governor for CPU 0, event 4
> > [ 1.290000] cpufreq: __cpufreq_governor for CPU 0, event 1
> > [ 1.290000] cpufreq_performance: setting to 1196000 kHz because of event 1
> > [ 1.290000] cpufreq: __cpufreq_driver_target().1665 1
> >
> > This is my debug - .transition_ongoing is incremented ^^^^^^^^
> >
> > [ 1.300000] cpufreq: target for CPU 0: 1196000 kHz, relation 1, requested 1196000 kHz
>
> Quite straight forward actually..

Apparently, not quite.

> Please try attached patch and see if it fixes
> your problem.. Which it should if I am not wrong.. I will send it
> separately then..

It helps only once. The first switching works, the second one doesn't.
Below debug

[ 12.010000] cpufreq: setting new policy for CPU 0: 398667 - 1196000 kHz
[ 12.010000] cpufreq: new min and max freqs are 398667 - 1196000 kHz
[ 12.010000] cpufreq: governor switch
[ 12.010000] cpufreq: __cpufreq_governor for CPU 0, event 2
[ 12.010000] cpufreq: __cpufreq_governor for CPU 0, event 5
[ 12.010000] ondemand governor failed, too long transition latency of HW, fallback to performance governor
[ 12.020000] cpufreq: __cpufreq_governor for CPU 0, event 4
[ 12.020000] cpufreq: __cpufreq_governor for CPU 0, event 1
[ 12.020000] cpufreq_performance: setting to 1196000 kHz because of event 1
[ 12.020000] cpufreq: target for CPU 0: 1196000 kHz, relation 1, requested 1196000 kHz
[ 12.020000] cpufreq: governor: change or update limits
[ 12.020000] cpufreq: __cpufreq_governor for CPU 0, event 3
[ 12.020000] cpufreq_performance: setting to 1196000 kHz because of event 3
[ 12.020000] cpufreq: target for CPU 0: 1196000 kHz, relation 1, requested 1196000 kHz
[ 12.030000] cpufreq: setting new policy for CPU 0: 398667 - 1196000 kHz
[ 12.030000] cpufreq: new min and max freqs are 398667 - 1196000 kHz
[ 12.030000] cpufreq: governor switch
[ 12.030000] cpufreq: __cpufreq_governor for CPU 0, event 2
[ 12.030000] cpufreq: __cpufreq_governor for CPU 0, event 5
[ 12.030000] ondemand governor failed, too long transition latency of HW, fallback to performance governor
[ 12.040000] cpufreq: __cpufreq_governor for CPU 0, event 4
[ 12.040000] cpufreq: __cpufreq_governor for CPU 0, event 1
[ 12.040000] cpufreq_performance: setting to 1196000 kHz because of event 1
[ 12.040000] cpufreq: target for CPU 0: 1196000 kHz, relation 1, requested 1196000 kHz
[ 12.040000] cpufreq: governor: change or update limits
[ 12.040000] cpufreq: __cpufreq_governor for CPU 0, event 3
[ 12.040000] cpufreq_performance: setting to 1196000 kHz because of event 3
[ 12.040000] cpufreq: target for CPU 0: 1196000 kHz, relation 1, requested 1196000 kHz

echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

[ 66.490000] cpufreq: setting new policy for CPU 0: 398667 - 1196000 kHz
[ 66.490000] cpufreq: new min and max freqs are 398667 - 1196000 kHz
[ 66.490000] cpufreq: governor switch
[ 66.490000] cpufreq: __cpufreq_governor for CPU 0, event 2
[ 66.490000] cpufreq: __cpufreq_governor for CPU 0, event 5
[ 66.490000] cpufreq: __cpufreq_governor for CPU 0, event 4
[ 66.490000] cpufreq: __cpufreq_governor for CPU 0, event 1
[ 66.490000] cpufreq: target for CPU 0: 398667 kHz, relation 0, requested 398667 kHz
[ 66.490000] cpufreq: __cpufreq_driver_target().1677 1
[ 66.500000] cpufreq: notification 0 of frequency transition to 398666 kHz
[ 66.500000] cpufreq: __cpufreq_notify_transition().297 2
[ 66.500000] cpufreq: notification 0 of frequency transition to 398666 kHz
[ 66.500000] cpufreq: __cpufreq_notify_transition().297 3
[ 66.510000] cpufreq: notification 1 of frequency transition to 398666 kHz
[ 66.510000] cpufreq: __cpufreq_notify_transition().327 2
[ 66.520000] cpufreq: FREQ: 398666 - CPU: 0
[ 66.520000] cpufreq: notification 1 of frequency transition to 398666 kHz
[ 66.520000] cpufreq: __cpufreq_notify_transition().327 1
[ 66.520000] cpufreq: FREQ: 398666 - CPU: 1
[ 66.520000] cpufreq: cpufreq_notify_transition().366 0
[ 66.530000] cpufreq: governor: change or update limits
[ 66.530000] cpufreq: __cpufreq_governor for CPU 0, event 3
[ 66.530000] cpufreq: target for CPU 0: 398667 kHz, relation 0, requested 398667 kHz
[ 66.530000] cpufreq: __cpufreq_driver_target().1677 1

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

[ 72.470000] cpufreq: setting new policy for CPU 0: 398667 - 1196000 kHz
[ 72.470000] cpufreq: new min and max freqs are 398667 - 1196000 kHz
[ 72.470000] cpufreq: governor switch
[ 72.470000] cpufreq: __cpufreq_governor for CPU 0, event 2
[ 72.470000] cpufreq: __cpufreq_governor for CPU 0, event 5
[ 72.470000] cpufreq: __cpufreq_governor for CPU 0, event 4
[ 72.470000] cpufreq: __cpufreq_governor for CPU 0, event 1
[ 72.470000] cpufreq_performance: setting to 1196000 kHz because of event 1
[ 72.470000] cpufreq: target for CPU 0: 1196000 kHz, relation 1, requested 1196000 kHz
[ 72.470000] cpufreq: governor: change or update limits
[ 72.470000] cpufreq: __cpufreq_governor for CPU 0, event 3
[ 72.470000] cpufreq_performance: setting to 1196000 kHz because of event 3
[ 72.470000] cpufreq: target for CPU 0: 1196000 kHz, relation 1, requested 1196000 kHz

> Thanks for your time..

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

2013-09-10 19:35:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On Tuesday, September 10, 2013 08:44:18 PM Viresh Kumar wrote:
> On 10 September 2013 17:19, Rafael J. Wysocki <[email protected]> wrote:
> > That said I'm actually unsure what problems *exactly* are fixed by commit
> > 7c30ed5. The commit log only says that PRECHANGE or POSTCHAGE shouldn't be
> > called twice in a row, but it doesn't say why. As the fallout indicates,
> > that actually happened before commit 7c30ed5 and nothing visibly broke, so
> > the benefit from having that commit is questionable to me. On the other hand,
> > the commit itself is evidently broken, so what exactly is the reason for
> > keeping it?
>
> Okay, so the first question is can we have multiple PRECHANGE notfication
> done without any POSTCHANGE inbetween?
> - Scenario: One reading value of cpuinfo_cur_freq, which will call
> __cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition()..
>
> And ondemand governor trying to change freq of cpu and so doing notification..
>
> There can be more similar scenarios possible..

OK, and what exactly does break here? What's the exact race you're concerned
about?

What field is corrupted in which object, for example, or what incorrect
behavior of the hardware results from this?

> Now Second question, is this fine to have multiple PRECHANGE notfications
> before any POSTCHANGE notification?
>
> Logically it looks obvious to me that these must be serialized..
> Otherwise this is what we are broadcasting:
> - We are changing for freq X, please prepare and let us know if you are
> okay with it..
> - Oh.. So sorry, we are changing to freq Y instead, please adjust accordingly.
>
> - Yes we have changed to freq Y...
> - Yes we have changed to freq X...
>
> This looks stupid, isn't it? I don't know how the drivers would behave when
> they see such notifications and so got to this patch initially..

Well, precisely, you don't know.

Can you please look for specific problems and address those instead of trying
to address potential issues that may or may not happen and may or may not really
be issues even if they actually happen?

And broken patches that try to fix such things definitely don't help.

For now I'm reverting the commit in question and everything on top of it.

Thanks,
Rafael

2013-09-11 08:07:11

by Viresh Kumar

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 10 September 2013 22:37, Guennadi Liakhovetski <[email protected]> wrote:
> On Tue, 10 Sep 2013, Viresh Kumar wrote:
>> Quite straight forward actually..
>
> Apparently, not quite.

I overlooked the situation where we return early from ->target() routines.. :(

Please try attached patches, I will repost them later (once I am able to
convince Rafael that these are really important :) )

--
viresh


Attachments:
0001-cpufreq-distinguish-drivers-that-do-asynchronous-not.patch (2.26 kB)
0002-cpufreq-fix-notification-serialization-issues.patch (4.29 kB)
Download all attachments

2013-09-11 08:16:26

by Guennadi Liakhovetski

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On Wed, 11 Sep 2013, Viresh Kumar wrote:

> On 10 September 2013 22:37, Guennadi Liakhovetski <[email protected]> wrote:
> > On Tue, 10 Sep 2013, Viresh Kumar wrote:
> >> Quite straight forward actually..
> >
> > Apparently, not quite.
>
> I overlooked the situation where we return early from ->target() routines.. :(
>
> Please try attached patches, I will repost them later (once I am able to
> convince Rafael that these are really important :) )

I'd rather wait until Rafael is convinced, then we'll see.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

2013-09-11 08:38:49

by Viresh Kumar

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 11 September 2013 01:16, Rafael J. Wysocki <[email protected]> wrote:
> On Tuesday, September 10, 2013 08:44:18 PM Viresh Kumar wrote:

>> Now Second question, is this fine to have multiple PRECHANGE notfications
>> before any POSTCHANGE notification?
>>
>> Logically it looks obvious to me that these must be serialized..
>> Otherwise this is what we are broadcasting:
>> - We are changing for freq X, please prepare and let us know if you are
>> okay with it..
>> - Oh.. So sorry, we are changing to freq Y instead, please adjust accordingly.
>>
>> - Yes we have changed to freq Y...
>> - Yes we have changed to freq X...
>>
>> This looks stupid, isn't it? I don't know how the drivers would behave when
>> they see such notifications and so got to this patch initially..
>
> Well, precisely, you don't know.
>
> Can you please look for specific problems and address those instead of trying
> to address potential issues that may or may not happen and may or may not really
> be issues even if they actually happen?

That looked like a straight forward issue/bug to me and so I haven't
gotten deep into it..

Scenario 1:
--------------
Notifiers are not serialized and suppose this is what happened
- PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
- PRECHANGE Notification for freq B (from target())
- Freq changed by target() to B
- POSTCHANGE Notification for freq B
- POSTCHANGE Notification for freq A

Now the last POSTCHANGE Notification happened for freq A and
hardware is currently running at freq B :)

Where would we break then?: adjust_jiffies() in cpufreq.c,
cpufreq_callback() in arch/arm/kernel/smp.c (which is also adjusting
jiffies)..

Now, all loops_per_jiffy based stuff is broken.. Have I missed
something?

Similarly there are numerous places where we may break.. As we
have broken expectations of receivers of these notifications..

Scenario 2:
--------------
Governor is changing freq and has called __cpufreq_driver_target().
At the same time we are changing scaling_{min|max}_freq from
sysfs, which would eventually end up calling governors:
CPUFREQ_GOV_LIMITS notification, that will also call:
__cpufreq_driver_target()..

So, we eventually have two concurrent calls to ->target() and we
don't really know how hardware will behave in this case.. Most of
the implementations of ->target() routines just go and change
freq/voltage without checking if we are already in progress of doing
that (i.e. based on expectation that this call is not re entrant)..

Now anything can happen at hardware level, which I don't have
all insight of :(

> And broken patches that try to fix such things definitely don't help.

Yeah, that's my fault but I honestly try to produce bug-less patches..
But being a normal human being, there are always some chances
that my patches are eventually broken :)

> For now I'm reverting the commit in question and everything on top of it.

Its okay..

2013-09-11 08:39:40

by Viresh Kumar

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 11 September 2013 13:45, Guennadi Liakhovetski <[email protected]> wrote:
> I'd rather wait until Rafael is convinced, then we'll see.

Okay.. I have just sent a mail to Rafael about that, see if you
are convinced with what I wrote :)

--
viresh

2013-09-11 13:07:17

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On Wednesday, September 11, 2013 02:08:44 PM Viresh Kumar wrote:
> On 11 September 2013 01:16, Rafael J. Wysocki <[email protected]> wrote:
> > On Tuesday, September 10, 2013 08:44:18 PM Viresh Kumar wrote:
>
> >> Now Second question, is this fine to have multiple PRECHANGE notfications
> >> before any POSTCHANGE notification?
> >>
> >> Logically it looks obvious to me that these must be serialized..
> >> Otherwise this is what we are broadcasting:
> >> - We are changing for freq X, please prepare and let us know if you are
> >> okay with it..
> >> - Oh.. So sorry, we are changing to freq Y instead, please adjust accordingly.
> >>
> >> - Yes we have changed to freq Y...
> >> - Yes we have changed to freq X...
> >>
> >> This looks stupid, isn't it? I don't know how the drivers would behave when
> >> they see such notifications and so got to this patch initially..
> >
> > Well, precisely, you don't know.
> >
> > Can you please look for specific problems and address those instead of trying
> > to address potential issues that may or may not happen and may or may not really
> > be issues even if they actually happen?
>
> That looked like a straight forward issue/bug to me and so I haven't
> gotten deep into it..

Which you should always do when you're going to deal with concurrency issues.
Even if they appear to be obvious, they often are far from that, like in this
case.

> Scenario 1:
> --------------
> Notifiers are not serialized and suppose this is what happened
> - PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
> - PRECHANGE Notification for freq B (from target())
> - Freq changed by target() to B
> - POSTCHANGE Notification for freq B
> - POSTCHANGE Notification for freq A
>
> Now the last POSTCHANGE Notification happened for freq A and
> hardware is currently running at freq B :)
>
> Where would we break then?: adjust_jiffies() in cpufreq.c,
> cpufreq_callback() in arch/arm/kernel/smp.c (which is also adjusting
> jiffies)..
>
> Now, all loops_per_jiffy based stuff is broken.. Have I missed
> something?
>
> Similarly there are numerous places where we may break.. As we
> have broken expectations of receivers of these notifications..

Now it is clear what the problem is and that should go in the changelog of the
fix patch.

> Scenario 2:
> --------------
> Governor is changing freq and has called __cpufreq_driver_target().
> At the same time we are changing scaling_{min|max}_freq from
> sysfs, which would eventually end up calling governors:
> CPUFREQ_GOV_LIMITS notification, that will also call:
> __cpufreq_driver_target()..
>
> So, we eventually have two concurrent calls to ->target() and we
> don't really know how hardware will behave in this case.. Most of
> the implementations of ->target() routines just go and change
> freq/voltage without checking if we are already in progress of doing
> that (i.e. based on expectation that this call is not re entrant)..
>
> Now anything can happen at hardware level, which I don't have
> all insight of :(

That is more theoretical, however.

I think that the "Scenario 1" is a good enough reason for making this change,
but it needs to be done with care.

> > And broken patches that try to fix such things definitely don't help.
>
> Yeah, that's my fault but I honestly try to produce bug-less patches..
> But being a normal human being, there are always some chances
> that my patches are eventually broken :)

Well, that happens for everyone, but in this particular case we had three
different issues because of a single patch ...

And actually I also should have spend more time on this before applying the
patch, especially given the laconic changelog, so it's my fault too.

> > For now I'm reverting the commit in question and everything on top of it.
>
> Its okay..

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-09-11 13:17:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On Wednesday, September 11, 2013 10:15:53 AM Guennadi Liakhovetski wrote:
> On Wed, 11 Sep 2013, Viresh Kumar wrote:
>
> > On 10 September 2013 22:37, Guennadi Liakhovetski <[email protected]> wrote:
> > > On Tue, 10 Sep 2013, Viresh Kumar wrote:
> > >> Quite straight forward actually..
> > >
> > > Apparently, not quite.
> >
> > I overlooked the situation where we return early from ->target() routines.. :(
> >
> > Please try attached patches, I will repost them later (once I am able to
> > convince Rafael that these are really important :) )
>
> I'd rather wait until Rafael is convinced, then we'll see.

I'm going to revert the commit that was the source of these issues, but that
doesn't mean that everything is now rosy. We need to fix the concurrency
problems that Viresh was trying to fix in that commit, so it would be nice if
you could verify whether or not he is on the right track.

Thanks,
Rafael

2013-09-12 00:39:42

by Viresh Kumar

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 11 September 2013 18:48, Rafael J. Wysocki <[email protected]> wrote:
> On Wednesday, September 11, 2013 02:08:44 PM Viresh Kumar wrote:

>> That looked like a straight forward issue/bug to me and so I haven't
>> gotten deep into it..
>
> Which you should always do when you're going to deal with concurrency issues.
> Even if they appear to be obvious, they often are far from that, like in this
> case.

/me Nods

>> Scenario 2:
>> --------------
>> Governor is changing freq and has called __cpufreq_driver_target().
>> At the same time we are changing scaling_{min|max}_freq from
>> sysfs, which would eventually end up calling governors:
>> CPUFREQ_GOV_LIMITS notification, that will also call:
>> __cpufreq_driver_target()..
>>
>> So, we eventually have two concurrent calls to ->target() and we
>> don't really know how hardware will behave in this case.. Most of
>> the implementations of ->target() routines just go and change
>> freq/voltage without checking if we are already in progress of doing
>> that (i.e. based on expectation that this call is not re entrant)..
>>
>> Now anything can happen at hardware level, which I don't have
>> all insight of :(
>
> That is more theoretical, however.

Maybe we can get more deeper into it then :)
Platform have something like this in their target()

A. If new freq is more than old: Increase voltage
B. Change freq
C. If new freq is less than old: decrease voltage

Now, two concurrent calls to target are X and Y, where X is trying to increase
freq and Y is trying to decrease it..

And this is the sequence that followed due to races..

X.A: voltage increased for larger freq
Y.A: nothing happened here
Y.B: freq decreased
Y.C: voltage decreased
X.B: freq increased
X.C: nothing happened..

We ended up setting a freq which is not supported by the voltage we have
set.. That will probably make clock to CPU unstable and system wouldn't
be workable anymore...

And so I think even this case must also get some space in the changelog :)

2013-09-12 00:43:54

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 9/12/2013 2:39 AM, Viresh Kumar wrote:
> On 11 September 2013 18:48, Rafael J. Wysocki <[email protected]> wrote:
>> On Wednesday, September 11, 2013 02:08:44 PM Viresh Kumar wrote:
>>> That looked like a straight forward issue/bug to me and so I haven't
>>> gotten deep into it..
>> Which you should always do when you're going to deal with concurrency issues.
>> Even if they appear to be obvious, they often are far from that, like in this
>> case.
> /me Nods
>
>>> Scenario 2:
>>> --------------
>>> Governor is changing freq and has called __cpufreq_driver_target().
>>> At the same time we are changing scaling_{min|max}_freq from
>>> sysfs, which would eventually end up calling governors:
>>> CPUFREQ_GOV_LIMITS notification, that will also call:
>>> __cpufreq_driver_target()..
>>>
>>> So, we eventually have two concurrent calls to ->target() and we
>>> don't really know how hardware will behave in this case.. Most of
>>> the implementations of ->target() routines just go and change
>>> freq/voltage without checking if we are already in progress of doing
>>> that (i.e. based on expectation that this call is not re entrant)..
>>>
>>> Now anything can happen at hardware level, which I don't have
>>> all insight of :(
>> That is more theoretical, however.
> Maybe we can get more deeper into it then :)
> Platform have something like this in their target()

Which platform?

> A. If new freq is more than old: Increase voltage
> B. Change freq
> C. If new freq is less than old: decrease voltage
>
> Now, two concurrent calls to target are X and Y, where X is trying to increase
> freq and Y is trying to decrease it..
>
> And this is the sequence that followed due to races..
>
> X.A: voltage increased for larger freq
> Y.A: nothing happened here
> Y.B: freq decreased
> Y.C: voltage decreased
> X.B: freq increased
> X.C: nothing happened..
>
> We ended up setting a freq which is not supported by the voltage we have
> set.. That will probably make clock to CPU unstable and system wouldn't
> be workable anymore...
>
> And so I think even this case must also get some space in the changelog :)

Yes, if you can point to a specific driver having this problem.

Thanks,
Rafael

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.

2013-09-12 05:36:17

by Viresh Kumar

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 12 September 2013 06:13, Rafael J. Wysocki
<[email protected]> wrote:
> Yes, if you can point to a specific driver having this problem.

There are so many of those (I know it because I went through almost all drivers
recently with my cleanup series): cpufreq-cpu0, omap-cpufreq,
exynos-cpufreq, etc..

They all do this:

A. If new freq is more than old: Increase voltage
B. Change freq
C. If new freq is less than old: decrease voltage

2013-09-12 07:48:24

by Guennadi Liakhovetski

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

Hi Viresh

On Wed, 11 Sep 2013, Viresh Kumar wrote:

> On 10 September 2013 22:37, Guennadi Liakhovetski <[email protected]> wrote:
> > On Tue, 10 Sep 2013, Viresh Kumar wrote:
> >> Quite straight forward actually..
> >
> > Apparently, not quite.
>
> I overlooked the situation where we return early from ->target() routines.. :(
>
> Please try attached patches, I will repost them later (once I am able to
> convince Rafael that these are really important :) )

Yes, they seem to fix my issues. You probably aren't going to submit them
in this form, instead, merge them with the original serialisation fix
patch, right? So, you don't need my tested-by here. But feel free to cc me
when you submit a fixed version.

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

2013-09-12 07:51:44

by Viresh Kumar

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 12 September 2013 13:17, Guennadi Liakhovetski <[email protected]> wrote:
> Yes, they seem to fix my issues.

Great!!

> You probably aren't going to submit them
> in this form, instead, merge them with the original serialisation fix
> patch, right? So, you don't need my tested-by here. But feel free to cc me
> when you submit a fixed version.

I will add your tested-by as you have eventually test all three patch that I
would merge :)

Thanks for testing this..

2013-09-12 10:50:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On Thursday, September 12, 2013 11:06:14 AM Viresh Kumar wrote:
> On 12 September 2013 06:13, Rafael J. Wysocki
> <[email protected]> wrote:
> > Yes, if you can point to a specific driver having this problem.
>
> There are so many of those (I know it because I went through almost all drivers
> recently with my cleanup series): cpufreq-cpu0, omap-cpufreq,
> exynos-cpufreq, etc..
>
> They all do this:
>
> A. If new freq is more than old: Increase voltage
> B. Change freq
> C. If new freq is less than old: decrease voltage

Well, if there's more than one it's even simpler. You can just pick one. :-)

Like, "for example, <driver name> does the following ... which may cause the
hardware to misbehave, because ...".

Thanks,
Rafael

2013-09-12 10:52:34

by Viresh Kumar

[permalink] [raw]
Subject: Re: "cpufreq: fix serialization issues with freq change notifiers" breaks cpufreq too

On 12 September 2013 16:31, Rafael J. Wysocki <[email protected]> wrote:
> Well, if there's more than one it's even simpler. You can just pick one. :-)
>
> Like, "for example, <driver name> does the following ... which may cause the
> hardware to misbehave, because ...".

:)

I have written one of my longest changelog ever on the resent of this patch..
Hope that doesn't have any unanswered stuff left :)