2014-04-28 19:25:00

by Srivatsa S. Bhat

[permalink] [raw]
Subject: [PATCH v2 0/5] Cpufreq frequency serialization fixes


Hi,

Meelis Roos reported hangs during boot in the longhaul cpufreq driver, after
commit 12478cf0c55 (cpufreq: Make sure frequency transitions are serialized).
The root-cause of this issue is the extra invocation of the
cpufreq_freq_transition_begin() and cpufreq_freq_transition_end() APIs in the
longhaul driver. I found similar issues in the powernow-k6 and powernow-k7
drivers as well. This patchset fixes the issue in all the 3 drivers and also
adds a debug infrastructure to catch such issues easily.

Patches 1-4 fix the regression in longhaul, powernow-k6 and powernow-k7
drivers. (Patch 2 fixes a different bug in powernow-k6, and it is kept as a
separate patch instead of merging it with patch 3, because I felt that it was
a bit subtle and needed attention in a separate patch).

Patch 5 adds a debug infrastructure to the cpufreq core to catch such problems
more easily in the future.


Changes in v2:
--------------

* Modified patch 1 to take error returns into account, as pointed out by
Viresh.
* Added patch 2 to fix the existing issue in the powernow-k6 driver, pointed
out by Viresh.
* Added patch 5 to introduce a debug infrastructure to catch such issues
easily.


Srivatsa S. Bhat (5):
cpufreq, longhaul: Fix double invocation of cpufreq_freq_transition_begin/end
cpufreq, powernow-k6: Fix incorrect comparison with max_multipler
cpufreq, powernow-k6: Fix double invocation of cpufreq_freq_transition_begin/end
cpufreq, powernow-k7: Fix double invocation of cpufreq_freq_transition_begin/end
cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end


drivers/cpufreq/cpufreq.c | 12 ++++++++++++
drivers/cpufreq/longhaul.c | 36 ++++++++++++++++++++++++------------
drivers/cpufreq/powernow-k6.c | 23 +++++++++++++----------
drivers/cpufreq/powernow-k7.c | 4 ----
include/linux/cpufreq.h | 1 +
5 files changed, 50 insertions(+), 26 deletions(-)


Thanks,
Srivatsa S. Bhat
IBM Linux Technology Center


2014-04-28 19:25:37

by Srivatsa S. Bhat

[permalink] [raw]
Subject: [PATCH v2 2/5] cpufreq, powernow-k6: Fix incorrect comparison with max_multipler

The value of 'max_multiplier' is meant to be used for comparison with
clock_ratio[index].driver_data, not the index itself! Fix the code in
powernow_k6_cpu_exit() that has this bug.

Also, while at it, make the for-loop condition look for CPUFREQ_TABLE_END,
instead of hard-coding the loop count to 8.

Reported-by: Viresh Kumar <[email protected]>
Signed-off-by: Srivatsa S. Bhat <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
---

drivers/cpufreq/powernow-k6.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/powernow-k6.c b/drivers/cpufreq/powernow-k6.c
index 49f120e..695a68c 100644
--- a/drivers/cpufreq/powernow-k6.c
+++ b/drivers/cpufreq/powernow-k6.c
@@ -227,8 +227,9 @@ have_busfreq:
static int powernow_k6_cpu_exit(struct cpufreq_policy *policy)
{
unsigned int i;
- for (i = 0; i < 8; i++) {
- if (i == max_multiplier)
+
+ for (i = 0; (clock_ratio[i].frequency != CPUFREQ_TABLE_END); i++) {
+ if (clock_ratio[i].driver_data == max_multiplier)
powernow_k6_target(policy, i);
}
return 0;

2014-04-28 19:26:05

by Srivatsa S. Bhat

[permalink] [raw]
Subject: [PATCH v2 3/5] cpufreq, powernow-k6: Fix double invocation of cpufreq_freq_transition_begin/end

During frequency transitions, the cpufreq core takes the responsibility of
invoking cpufreq_freq_transition_begin() and cpufreq_freq_transition_end()
for those cpufreq drivers that define the ->target_index callback but don't
set the ASYNC_NOTIFICATION flag.

The powernow-k6 cpufreq driver falls under this category, but this driver was
invoking the _begin() and _end() APIs itself around frequency transitions,
which led to double invocation of the _begin() API. The _begin API makes
contending callers wait until the previous invocation is complete. Hence,
the powernow-k6 driver ended up waiting on itself, leading to system hangs
during boot.

Fix this by removing the calls to the _begin() and _end() APIs from the
powernow-k6 driver, since they rightly belong to the cpufreq core.

(Note that during ->exit(), the powernow-k6 driver sets the frequency
without any help from the cpufreq core. So add explicit calls to the
_begin() and _end() APIs around that frequency transition alone, to take
care of that special case. Also, add a missing 'break' statement there.)

Signed-off-by: Srivatsa S. Bhat <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
---

drivers/cpufreq/powernow-k6.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/powernow-k6.c b/drivers/cpufreq/powernow-k6.c
index 695a68c..78904e6 100644
--- a/drivers/cpufreq/powernow-k6.c
+++ b/drivers/cpufreq/powernow-k6.c
@@ -138,22 +138,14 @@ static void powernow_k6_set_cpu_multiplier(unsigned int best_i)
static int powernow_k6_target(struct cpufreq_policy *policy,
unsigned int best_i)
{
- struct cpufreq_freqs freqs;

if (clock_ratio[best_i].driver_data > max_multiplier) {
printk(KERN_ERR PFX "invalid target frequency\n");
return -EINVAL;
}

- freqs.old = busfreq * powernow_k6_get_cpu_multiplier();
- freqs.new = busfreq * clock_ratio[best_i].driver_data;
-
- cpufreq_freq_transition_begin(policy, &freqs);
-
powernow_k6_set_cpu_multiplier(best_i);

- cpufreq_freq_transition_end(policy, &freqs, 0);
-
return 0;
}

@@ -229,8 +221,18 @@ static int powernow_k6_cpu_exit(struct cpufreq_policy *policy)
unsigned int i;

for (i = 0; (clock_ratio[i].frequency != CPUFREQ_TABLE_END); i++) {
- if (clock_ratio[i].driver_data == max_multiplier)
+ if (clock_ratio[i].driver_data == max_multiplier) {
+ struct cpufreq_freqs freqs;
+
+ freqs.old = policy->cur;
+ freqs.new = clock_ratio[i].frequency;
+ freqs.flags = 0;
+
+ cpufreq_freq_transition_begin(policy, &freqs);
powernow_k6_target(policy, i);
+ cpufreq_freq_transition_end(policy, &freqs, 0);
+ break;
+ }
}
return 0;
}

2014-04-28 19:26:21

by Srivatsa S. Bhat

[permalink] [raw]
Subject: [PATCH v2 5/5] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

Some cpufreq drivers were redundantly invoking the _begin() and _end()
APIs around frequency transitions, and this double invocation (one from
the cpufreq core and the other from the cpufreq driver) used to result
in a self-deadlock, leading to system hangs during boot. (The _begin()
API makes contending callers wait until the previous invocation is
complete. Hence, the cpufreq driver would end up waiting on itself!).

Now all such drivers have been fixed, but debugging this issue was not
very straight-forward (even lockdep didn't catch this). So let us add a
debug infrastructure to the cpufreq core to catch such issues more easily
in the future.

We add a new field called 'transition_task' to the policy structure, to keep
track of the task which is performing the frequency transition. Using this
field, we make note of this task during _begin() and print a warning if we
find a case where the same task is calling _begin() again, before completing
the previous frequency transition using the corresponding _end().

One important aspect to consider is that it is valid for ASYNC_NOTIFICATION
drivers to invoke _begin() from one task and then invoke the corresponding
_end() in another task. We take care of this scenario by setting the value
of 'transition_task' once again explicitly in the _end() function. This
helps us handle the particular tricky scenario (shown below) that can occur
in ASYNC_NOTIFICATION drivers:

Scenario 1: (Deadlock-free)
----------

Task A Task B

/* 1st freq transition */
Invoke _begin() {
...
...
}

Change the frequency

Got interrupt for successful
change of frequency.

/* 1st freq transition */
Invoke _end() {
...
...
/* 2nd freq transition */ ...
Invoke _begin() { ...
... //waiting for B ...
... //to finish _end() }
...
...
}


This scenario is actually deadlock-free because Task A can wait inside the
second call to _begin() without self-deadlocking, because it is the
responsibility of Task B to finish the first sequence by invoking the
corresponding _end().

By setting the value of 'transition_task' again explicitly in _end(), we
ensure that the code won't print a false-positive warning in this case.

However the same code successfully catches the following deadlock-prone
scenario even in ASYNC_NOTIFICATION drivers:

Scenario 2: (Deadlock-prone)
----------

Task A Task B

/* 1st freq transition */
Invoke _begin() {
...
...
}

/* 2nd freq transition */
Invoke _begin() {
...
...
}

Change the frequency


Here the bug is that Task A called the second _begin() *before* actually
performing the 1st frequency transition. In other words, it failed to set
Task B in motion for the 1st frequency transition, and hence it will
self-deadlock. This is very similar to the case of drivers which do
synchronous notification, and hence the debug infrastructure developed
in this patch can catch this scenario easily.

Signed-off-by: Srivatsa S. Bhat <[email protected]>
---

drivers/cpufreq/cpufreq.c | 12 ++++++++++++
include/linux/cpufreq.h | 1 +
2 files changed, 13 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index abda660..2c99a6c 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -354,6 +354,10 @@ static void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
struct cpufreq_freqs *freqs)
{
+
+ /* Catch double invocations of _begin() which lead to self-deadlock */
+ WARN_ON(current == policy->transition_task);
+
wait:
wait_event(policy->transition_wait, !policy->transition_ongoing);

@@ -365,6 +369,7 @@ wait:
}

policy->transition_ongoing = true;
+ policy->transition_task = current;

spin_unlock(&policy->transition_lock);

@@ -378,9 +383,16 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
if (unlikely(WARN_ON(!policy->transition_ongoing)))
return;

+ /*
+ * The task invoking _end() could be different from the one that
+ * invoked the _begin(). So set ->transition_task again here
+ * explicity.
+ */
+ policy->transition_task = current;
cpufreq_notify_post_transition(policy, freqs, transition_failed);

policy->transition_ongoing = false;
+ policy->transition_task = NULL;

wake_up(&policy->transition_wait);
}
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 5ae5100..8f44d79 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -110,6 +110,7 @@ struct cpufreq_policy {
bool transition_ongoing; /* Tracks transition status */
spinlock_t transition_lock;
wait_queue_head_t transition_wait;
+ struct task_struct *transition_task; /* Task which is doing the transition */
};

/* Only for ACPI */

2014-04-28 19:26:03

by Srivatsa S. Bhat

[permalink] [raw]
Subject: [PATCH v2 4/5] cpufreq, powernow-k7: Fix double invocation of cpufreq_freq_transition_begin/end

During frequency transitions, the cpufreq core takes the responsibility of
invoking cpufreq_freq_transition_begin() and cpufreq_freq_transition_end()
for those cpufreq drivers that define the ->target_index callback but don't
set the ASYNC_NOTIFICATION flag.

The powernow-k7 cpufreq driver falls under this category, but this driver was
invoking the _begin() and _end() APIs itself around frequency transitions,
which led to double invocation of the _begin() API. The _begin API makes
contending callers wait until the previous invocation is complete. Hence,
the powernow-k7 driver ended up waiting on itself, leading to system hangs
during boot.

Fix this by removing the calls to the _begin() and _end() APIs from the
powernow-k7 driver, since they rightly belong to the cpufreq core.

Signed-off-by: Srivatsa S. Bhat <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
---

drivers/cpufreq/powernow-k7.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/drivers/cpufreq/powernow-k7.c b/drivers/cpufreq/powernow-k7.c
index f911645..e61e224 100644
--- a/drivers/cpufreq/powernow-k7.c
+++ b/drivers/cpufreq/powernow-k7.c
@@ -269,8 +269,6 @@ static int powernow_target(struct cpufreq_policy *policy, unsigned int index)

freqs.new = powernow_table[index].frequency;

- cpufreq_freq_transition_begin(policy, &freqs);
-
/* Now do the magic poking into the MSRs. */

if (have_a0 == 1) /* A0 errata 5 */
@@ -290,8 +288,6 @@ static int powernow_target(struct cpufreq_policy *policy, unsigned int index)
if (have_a0 == 1)
local_irq_enable();

- cpufreq_freq_transition_end(policy, &freqs, 0);
-
return 0;
}

2014-04-28 19:25:24

by Srivatsa S. Bhat

[permalink] [raw]
Subject: [PATCH v2 1/5] cpufreq, longhaul: Fix double invocation of cpufreq_freq_transition_begin/end

During frequency transitions, the cpufreq core takes the responsibility of
invoking cpufreq_freq_transition_begin() and cpufreq_freq_transition_end()
for those cpufreq drivers that define the ->target_index callback but don't
set the ASYNC_NOTIFICATION flag.

The longhaul cpufreq driver falls under this category, but this driver was
invoking the _begin() and _end() APIs itself around frequency transitions,
which led to double invocation of the _begin() API. The _begin API makes
contending callers wait until the previous invocation is complete. Hence,
the longhaul driver ended up waiting on itself, leading to system hangs
during boot.

Fix this by removing the calls to the _begin() and _end() APIs from the
longhaul driver, since they rightly belong to the cpufreq core.

(Note that during module_exit(), the longhaul driver sets the frequency
without any help from the cpufreq core. So add explicit calls to the
_begin() and _end() APIs around that frequency transition alone, to take
care of that special case.)

Reported-and-Tested-by: Meelis Roos <[email protected]>
Signed-off-by: Srivatsa S. Bhat <[email protected]>
---

drivers/cpufreq/longhaul.c | 36 ++++++++++++++++++++++++------------
1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
index d00e5d1..5c4369b 100644
--- a/drivers/cpufreq/longhaul.c
+++ b/drivers/cpufreq/longhaul.c
@@ -242,7 +242,7 @@ static void do_powersaver(int cx_address, unsigned int mults_index,
* Sets a new clock ratio.
*/

-static void longhaul_setstate(struct cpufreq_policy *policy,
+static int longhaul_setstate(struct cpufreq_policy *policy,
unsigned int table_index)
{
unsigned int mults_index;
@@ -258,10 +258,12 @@ static void longhaul_setstate(struct cpufreq_policy *policy,
/* Safety precautions */
mult = mults[mults_index & 0x1f];
if (mult == -1)
- return;
+ return -EINVAL;
+
speed = calc_speed(mult);
if ((speed > highest_speed) || (speed < lowest_speed))
- return;
+ return -EINVAL;
+
/* Voltage transition before frequency transition? */
if (can_scale_voltage && longhaul_index < table_index)
dir = 1;
@@ -269,8 +271,6 @@ static void longhaul_setstate(struct cpufreq_policy *policy,
freqs.old = calc_speed(longhaul_get_cpu_mult());
freqs.new = speed;

- cpufreq_freq_transition_begin(policy, &freqs);
-
pr_debug("Setting to FSB:%dMHz Mult:%d.%dx (%s)\n",
fsb, mult/10, mult%10, print_speed(speed/1000));
retry_loop:
@@ -385,12 +385,14 @@ retry_loop:
goto retry_loop;
}
}
- /* Report true CPU frequency */
- cpufreq_freq_transition_end(policy, &freqs, 0);

- if (!bm_timeout)
+ if (!bm_timeout) {
printk(KERN_INFO PFX "Warning: Timeout while waiting for "
"idle PCI bus.\n");
+ return -EBUSY;
+ }
+
+ return 0;
}

/*
@@ -631,9 +633,10 @@ static int longhaul_target(struct cpufreq_policy *policy,
unsigned int i;
unsigned int dir = 0;
u8 vid, current_vid;
+ int retval = 0;

if (!can_scale_voltage)
- longhaul_setstate(policy, table_index);
+ retval = longhaul_setstate(policy, table_index);
else {
/* On test system voltage transitions exceeding single
* step up or down were turning motherboard off. Both
@@ -648,7 +651,7 @@ static int longhaul_target(struct cpufreq_policy *policy,
while (i != table_index) {
vid = (longhaul_table[i].driver_data >> 8) & 0x1f;
if (vid != current_vid) {
- longhaul_setstate(policy, i);
+ retval = longhaul_setstate(policy, i);
current_vid = vid;
msleep(200);
}
@@ -657,10 +660,11 @@ static int longhaul_target(struct cpufreq_policy *policy,
else
i--;
}
- longhaul_setstate(policy, table_index);
+ retval = longhaul_setstate(policy, table_index);
}
+
longhaul_index = table_index;
- return 0;
+ return retval;
}


@@ -968,7 +972,15 @@ static void __exit longhaul_exit(void)

for (i = 0; i < numscales; i++) {
if (mults[i] == maxmult) {
+ struct cpufreq_freqs freqs;
+
+ freqs.old = policy->cur;
+ freqs.new = longhaul_table[i].frequency;
+ freqs.flags = 0;
+
+ cpufreq_freq_transition_begin(policy, &freqs);
longhaul_setstate(policy, i);
+ cpufreq_freq_transition_end(policy, &freqs, 0);
break;
}
}

2014-04-28 20:57:40

by Meelis Roos

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] Cpufreq frequency serialization fixes

> Meelis Roos reported hangs during boot in the longhaul cpufreq driver, after
> commit 12478cf0c55 (cpufreq: Make sure frequency transitions are serialized).
> The root-cause of this issue is the extra invocation of the
> cpufreq_freq_transition_begin() and cpufreq_freq_transition_end() APIs in the
> longhaul driver. I found similar issues in the powernow-k6 and powernow-k7
> drivers as well. This patchset fixes the issue in all the 3 drivers and also
> adds a debug infrastructure to catch such issues easily.
>
> Patches 1-4 fix the regression in longhaul, powernow-k6 and powernow-k7
> drivers. (Patch 2 fixes a different bug in powernow-k6, and it is kept as a
> separate patch instead of merging it with patch 3, because I felt that it was
> a bit subtle and needed attention in a separate patch).
>
> Patch 5 adds a debug infrastructure to the cpufreq core to catch such problems
> more easily in the future.
>
>
> Changes in v2:
> --------------
>
> * Modified patch 1 to take error returns into account, as pointed out by
> Viresh.
> * Added patch 2 to fix the existing issue in the powernow-k6 driver, pointed
> out by Viresh.
> * Added patch 5 to introduce a debug infrastructure to catch such issues
> easily.

This also works on my VIA EPIA.

--
Meelis Roos ([email protected])

2014-04-28 23:31:10

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] Cpufreq frequency serialization fixes

On Tuesday, April 29, 2014 12:23:54 AM Srivatsa S. Bhat wrote:
>
> Hi,
>
> Meelis Roos reported hangs during boot in the longhaul cpufreq driver, after
> commit 12478cf0c55 (cpufreq: Make sure frequency transitions are serialized).
> The root-cause of this issue is the extra invocation of the
> cpufreq_freq_transition_begin() and cpufreq_freq_transition_end() APIs in the
> longhaul driver. I found similar issues in the powernow-k6 and powernow-k7
> drivers as well. This patchset fixes the issue in all the 3 drivers and also
> adds a debug infrastructure to catch such issues easily.
>
> Patches 1-4 fix the regression in longhaul, powernow-k6 and powernow-k7
> drivers. (Patch 2 fixes a different bug in powernow-k6, and it is kept as a
> separate patch instead of merging it with patch 3, because I felt that it was
> a bit subtle and needed attention in a separate patch).
>
> Patch 5 adds a debug infrastructure to the cpufreq core to catch such problems
> more easily in the future.

I've queued up patches [1-4/5] for 3.15, the last one I need to have another
look at tomorrow.

BTW, when you fix regressions, please always add a Fixes: tag to the changelog.

Thanks!


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-04-29 04:51:13

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

Nice effort.

On 29 April 2014 00:25, Srivatsa S. Bhat
<[email protected]> wrote:
> Now all such drivers have been fixed, but debugging this issue was not
> very straight-forward (even lockdep didn't catch this). So let us add a
> debug infrastructure to the cpufreq core to catch such issues more easily
> in the future.

BUT, I am not sure if we really need it :(

I think we just got into the 'barrier' stuff as we had some doubts about it
earlier and were quite sure that nothing else could go wrong. Otherwise
the only problem could have been present was the second queuing
from the same thread. And we will surely get stuck if that happens and
we can't just miss that error..

> Scenario 1: (Deadlock-free)
> ----------
>
> Task A Task B
>
> /* 1st freq transition */
> Invoke _begin() {
> ...
> ...
> }
>
> Change the frequency
>
> Got interrupt for successful
> change of frequency.
>
> /* 1st freq transition */
> Invoke _end() {
> ...
> ...
> /* 2nd freq transition */ ...
> Invoke _begin() { ...
> ... //waiting for B ...
> ... //to finish _end() }
> ...
> ...
> }
>
>
> This scenario is actually deadlock-free because Task A can wait inside the
> second call to _begin() without self-deadlocking, because it is the
> responsibility of Task B to finish the first sequence by invoking the
> corresponding _end().
>
> By setting the value of 'transition_task' again explicitly in _end(), we
> ensure that the code won't print a false-positive warning in this case.
>
> However the same code successfully catches the following deadlock-prone
> scenario even in ASYNC_NOTIFICATION drivers:
>
> Scenario 2: (Deadlock-prone)
> ----------
>
> Task A Task B
>
> /* 1st freq transition */
> Invoke _begin() {
> ...
> ...
> }
>
> /* 2nd freq transition */
> Invoke _begin() {
> ...
> ...
> }
>
> Change the frequency
>
>
> Here the bug is that Task A called the second _begin() *before* actually
> performing the 1st frequency transition. In other words, it failed to set
> Task B in motion for the 1st frequency transition, and hence it will
> self-deadlock. This is very similar to the case of drivers which do
> synchronous notification, and hence the debug infrastructure developed
> in this patch can catch this scenario easily.
>
> Signed-off-by: Srivatsa S. Bhat <[email protected]>
> ---
>
> drivers/cpufreq/cpufreq.c | 12 ++++++++++++
> include/linux/cpufreq.h | 1 +
> 2 files changed, 13 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index abda660..2c99a6c 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -354,6 +354,10 @@ static void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
> void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
> struct cpufreq_freqs *freqs)
> {
> +
> + /* Catch double invocations of _begin() which lead to self-deadlock */
> + WARN_ON(current == policy->transition_task);
> +
> wait:
> wait_event(policy->transition_wait, !policy->transition_ongoing);
>
> @@ -365,6 +369,7 @@ wait:
> }
>
> policy->transition_ongoing = true;
> + policy->transition_task = current;
>
> spin_unlock(&policy->transition_lock);
>
> @@ -378,9 +383,16 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
> if (unlikely(WARN_ON(!policy->transition_ongoing)))
> return;
>
> + /*
> + * The task invoking _end() could be different from the one that
> + * invoked the _begin(). So set ->transition_task again here
> + * explicity.
> + */
> + policy->transition_task = current;
> cpufreq_notify_post_transition(policy, freqs, transition_failed);
>
> policy->transition_ongoing = false;
> + policy->transition_task = NULL;
>
> wake_up(&policy->transition_wait);
> }
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 5ae5100..8f44d79 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -110,6 +110,7 @@ struct cpufreq_policy {
> bool transition_ongoing; /* Tracks transition status */
> spinlock_t transition_lock;
> wait_queue_head_t transition_wait;
> + struct task_struct *transition_task; /* Task which is doing the transition */
> };
>
> /* Only for ACPI */
>

2014-04-29 04:55:46

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

On 29 April 2014 10:21, Viresh Kumar <[email protected]> wrote:
> Nice effort.
>
> On 29 April 2014 00:25, Srivatsa S. Bhat
> <[email protected]> wrote:
>> Now all such drivers have been fixed, but debugging this issue was not
>> very straight-forward (even lockdep didn't catch this). So let us add a
>> debug infrastructure to the cpufreq core to catch such issues more easily
>> in the future.
>
> BUT, I am not sure if we really need it :(
>
> I think we just got into the 'barrier' stuff as we had some doubts about it
> earlier and were quite sure that nothing else could go wrong. Otherwise
> the only problem could have been present was the second queuing
> from the same thread. And we will surely get stuck if that happens and
> we can't just miss that error..
>
>> Scenario 1: (Deadlock-free)
>> ----------
>>
>> Task A Task B
>>
>> /* 1st freq transition */
>> Invoke _begin() {
>> ...
>> ...
>> }
>>
>> Change the frequency
>>
>> Got interrupt for successful
>> change of frequency.
>>
>> /* 1st freq transition */
>> Invoke _end() {
>> ...
>> ...
>> /* 2nd freq transition */ ...
>> Invoke _begin() { ...
>> ... //waiting for B ...
>> ... //to finish _end() }
>> ...
>> ...
>> }
>>
>>
>> This scenario is actually deadlock-free because Task A can wait inside the
>> second call to _begin() without self-deadlocking, because it is the
>> responsibility of Task B to finish the first sequence by invoking the
>> corresponding _end().

WTF, I was writing my mail and it just got send due to some stupid combination
of keys :( .. Sorry.

Also, this might not work as expected. Consider this scenario:

/* 1st freq transition */
Invoke _begin() {
...
...
}

Start Change of frequency and return
back as there is no end from same thread.

/* 2nd freq transition as there is nobody stopping us */
Invoke _begin() {
... //waiting for B
... //to finish _end()
...
...
}

Got
interrupt for successful
change
of frequency.

/* 1st
freq transition */
Invoke _end() {
...
...
}

And your patch will probably break this ?

2014-04-29 06:00:29

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] Cpufreq frequency serialization fixes

On 04/29/2014 05:17 AM, Rafael J. Wysocki wrote:
> On Tuesday, April 29, 2014 12:23:54 AM Srivatsa S. Bhat wrote:
>>
>> Hi,
>>
>> Meelis Roos reported hangs during boot in the longhaul cpufreq driver, after
>> commit 12478cf0c55 (cpufreq: Make sure frequency transitions are serialized).
>> The root-cause of this issue is the extra invocation of the
>> cpufreq_freq_transition_begin() and cpufreq_freq_transition_end() APIs in the
>> longhaul driver. I found similar issues in the powernow-k6 and powernow-k7
>> drivers as well. This patchset fixes the issue in all the 3 drivers and also
>> adds a debug infrastructure to catch such issues easily.
>>
>> Patches 1-4 fix the regression in longhaul, powernow-k6 and powernow-k7
>> drivers. (Patch 2 fixes a different bug in powernow-k6, and it is kept as a
>> separate patch instead of merging it with patch 3, because I felt that it was
>> a bit subtle and needed attention in a separate patch).
>>
>> Patch 5 adds a debug infrastructure to the cpufreq core to catch such problems
>> more easily in the future.
>
> I've queued up patches [1-4/5] for 3.15, the last one I need to have another
> look at tomorrow.

Great! Thanks a lot!

>
> BTW, when you fix regressions, please always add a Fixes: tag to the changelog.
>

Oh, ok, will keep that in mind from next time. Thank you!

Regards,
Srivatsa S. Bhat

2014-04-29 06:09:22

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

On 04/29/2014 10:21 AM, Viresh Kumar wrote:
> Nice effort.
>

Thanks! :-)

> On 29 April 2014 00:25, Srivatsa S. Bhat
> <[email protected]> wrote:
>> Now all such drivers have been fixed, but debugging this issue was not
>> very straight-forward (even lockdep didn't catch this). So let us add a
>> debug infrastructure to the cpufreq core to catch such issues more easily
>> in the future.
>
> BUT, I am not sure if we really need it :(
>
> I think we just got into the 'barrier' stuff as we had some doubts about it
> earlier and were quite sure that nothing else could go wrong. Otherwise
> the only problem could have been present was the second queuing
> from the same thread. And we will surely get stuck if that happens and
> we can't just miss that error..
>

Yeah, and we _did_ hit that hang, but it was not at all intuitive at first
as to what was going wrong. Worse, even lockdep is not in a position to catch
such scenarios. So it definitely doesn't hurt to add a small infrastructure
to catch such issues in the future, IMHO.

Besides, if we can add features for users, surely we can also add some
non-intrusive debug code for ourselves too, to make our lives easier, right? :-)
I'm sure we deserve that privilege ;-)

Regards,
Srivatsa S. Bhat

>> Scenario 1: (Deadlock-free)
>> ----------
>>
>> Task A Task B
>>
>> /* 1st freq transition */
>> Invoke _begin() {
>> ...
>> ...
>> }
>>
>> Change the frequency
>>
>> Got interrupt for successful
>> change of frequency.
>>
>> /* 1st freq transition */
>> Invoke _end() {
>> ...
>> ...
>> /* 2nd freq transition */ ...
>> Invoke _begin() { ...
>> ... //waiting for B ...
>> ... //to finish _end() }
>> ...
>> ...
>> }
>>
>>
>> This scenario is actually deadlock-free because Task A can wait inside the
>> second call to _begin() without self-deadlocking, because it is the
>> responsibility of Task B to finish the first sequence by invoking the
>> corresponding _end().
>>
>> By setting the value of 'transition_task' again explicitly in _end(), we
>> ensure that the code won't print a false-positive warning in this case.
>>
>> However the same code successfully catches the following deadlock-prone
>> scenario even in ASYNC_NOTIFICATION drivers:
>>
>> Scenario 2: (Deadlock-prone)
>> ----------
>>
>> Task A Task B
>>
>> /* 1st freq transition */
>> Invoke _begin() {
>> ...
>> ...
>> }
>>
>> /* 2nd freq transition */
>> Invoke _begin() {
>> ...
>> ...
>> }
>>
>> Change the frequency
>>
>>
>> Here the bug is that Task A called the second _begin() *before* actually
>> performing the 1st frequency transition. In other words, it failed to set
>> Task B in motion for the 1st frequency transition, and hence it will
>> self-deadlock. This is very similar to the case of drivers which do
>> synchronous notification, and hence the debug infrastructure developed
>> in this patch can catch this scenario easily.
>>
>> Signed-off-by: Srivatsa S. Bhat <[email protected]>
>> ---
>>
>> drivers/cpufreq/cpufreq.c | 12 ++++++++++++
>> include/linux/cpufreq.h | 1 +
>> 2 files changed, 13 insertions(+)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index abda660..2c99a6c 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -354,6 +354,10 @@ static void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
>> void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
>> struct cpufreq_freqs *freqs)
>> {
>> +
>> + /* Catch double invocations of _begin() which lead to self-deadlock */
>> + WARN_ON(current == policy->transition_task);
>> +
>> wait:
>> wait_event(policy->transition_wait, !policy->transition_ongoing);
>>
>> @@ -365,6 +369,7 @@ wait:
>> }
>>
>> policy->transition_ongoing = true;
>> + policy->transition_task = current;
>>
>> spin_unlock(&policy->transition_lock);
>>
>> @@ -378,9 +383,16 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
>> if (unlikely(WARN_ON(!policy->transition_ongoing)))
>> return;
>>
>> + /*
>> + * The task invoking _end() could be different from the one that
>> + * invoked the _begin(). So set ->transition_task again here
>> + * explicity.
>> + */
>> + policy->transition_task = current;
>> cpufreq_notify_post_transition(policy, freqs, transition_failed);
>>
>> policy->transition_ongoing = false;
>> + policy->transition_task = NULL;
>>
>> wake_up(&policy->transition_wait);
>> }
>> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
>> index 5ae5100..8f44d79 100644
>> --- a/include/linux/cpufreq.h
>> +++ b/include/linux/cpufreq.h
>> @@ -110,6 +110,7 @@ struct cpufreq_policy {
>> bool transition_ongoing; /* Tracks transition status */
>> spinlock_t transition_lock;
>> wait_queue_head_t transition_wait;
>> + struct task_struct *transition_task; /* Task which is doing the transition */
>> };
>>
>> /* Only for ACPI */
>>

2014-04-29 06:17:14

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

On 04/29/2014 10:25 AM, Viresh Kumar wrote:
> On 29 April 2014 10:21, Viresh Kumar <[email protected]> wrote:
>> Nice effort.
>>
>> On 29 April 2014 00:25, Srivatsa S. Bhat
>> <[email protected]> wrote:
>>> Now all such drivers have been fixed, but debugging this issue was not
>>> very straight-forward (even lockdep didn't catch this). So let us add a
>>> debug infrastructure to the cpufreq core to catch such issues more easily
>>> in the future.
>>
>> BUT, I am not sure if we really need it :(
>>
>> I think we just got into the 'barrier' stuff as we had some doubts about it
>> earlier and were quite sure that nothing else could go wrong. Otherwise
>> the only problem could have been present was the second queuing
>> from the same thread. And we will surely get stuck if that happens and
>> we can't just miss that error..
>>
>>> Scenario 1: (Deadlock-free)
>>> ----------
>>>
>>> Task A Task B
>>>
>>> /* 1st freq transition */
>>> Invoke _begin() {
>>> ...
>>> ...
>>> }
>>>
>>> Change the frequency
>>>
>>> Got interrupt for successful
>>> change of frequency.
>>>
>>> /* 1st freq transition */
>>> Invoke _end() {
>>> ...
>>> ...
>>> /* 2nd freq transition */ ...
>>> Invoke _begin() { ...
>>> ... //waiting for B ...
>>> ... //to finish _end() }
>>> ...
>>> ...
>>> }
>>>
>>>
>>> This scenario is actually deadlock-free because Task A can wait inside the
>>> second call to _begin() without self-deadlocking, because it is the
>>> responsibility of Task B to finish the first sequence by invoking the
>>> corresponding _end().
>
> WTF, I was writing my mail and it just got send due to some stupid combination
> of keys :( .. Sorry.
>

No problem!

> Also, this might not work as expected. Consider this scenario:
>
> /* 1st freq transition */
> Invoke _begin() {
> ...
> ...
> }
>
> Start Change of frequency and return
> back as there is no end from same thread.
>
> /* 2nd freq transition as there is nobody stopping us */
> Invoke _begin() {
> ... //waiting for B
> ... //to finish _end()
> ...
> ...
> }
>
> Got
> interrupt for successful
> change
> of frequency.
>
> /* 1st
> freq transition */
> Invoke _end() {
> ...
> ...
> }
>
> And your patch will probably break this ?

Yes, I'm aware that this corner case doesn't work well with my debug
patch. I tried to avoid this but couldn't think of any solution.
(One big-hammer way to avoid this is to exclude this infrastructure
for all ASYNC_NOTIFICATION drivers, but I didn't want to go with that
approach, since it makes it look ugly). Do you have any better ideas
to deal with this scenario?

Also, do we really have cases in mind where a single thread does
multiple frequency transitions in one go? That too in such quick
successions? Echo's to sysfs, changing of governors from userspace etc
all do one frequency transition at a time per-task...


Regards,
Srivatsa S. Bhat

2014-04-29 06:49:42

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

On 29 April 2014 11:46, Srivatsa S. Bhat
<[email protected]> wrote:
> Yes, I'm aware that this corner case doesn't work well with my debug

Don't know if its a corner case, it may be the most obvious case for
some :)

> patch. I tried to avoid this but couldn't think of any solution.

The problem is not that it wouldn't work for these systems, but we will
get WARN_ON() when they shouldn't have come :)

> (One big-hammer way to avoid this is to exclude this infrastructure
> for all ASYNC_NOTIFICATION drivers, but I didn't want to go with that
> approach, since it makes it look ugly). Do you have any better ideas
> to deal with this scenario?

Can't think of anything better than this:

+ WARN_ON(!(cpufreq_driver->flags & CPUFREQ_ASYNC_NOTIFICATION)
&& (current == policy->transition_task));

which you already mentioned.

> Also, do we really have cases in mind where a single thread does
> multiple frequency transitions in one go? That too in such quick
> successions? Echo's to sysfs, changing of governors from userspace etc
> all do one frequency transition at a time per-task...

Its not really about if we can think of a real use case or not. The point
is, governor is free to call transition calls one after the other (will always
happen from a single thread) and it isn't supposed to wait for drivers
to finish earlier transitions as ->target() has already returned.

--
viresh

2014-04-29 07:36:52

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

On 04/29/2014 12:19 PM, Viresh Kumar wrote:
> On 29 April 2014 11:46, Srivatsa S. Bhat
> <[email protected]> wrote:
>> Yes, I'm aware that this corner case doesn't work well with my debug
>
> Don't know if its a corner case, it may be the most obvious case for
> some :)
>

Yeah, it could be.

>> patch. I tried to avoid this but couldn't think of any solution.
>
> The problem is not that it wouldn't work for these systems, but we will
> get WARN_ON() when they shouldn't have come :)
>

Yes, I thought about this, and I agree that this is not acceptable.

>> (One big-hammer way to avoid this is to exclude this infrastructure
>> for all ASYNC_NOTIFICATION drivers, but I didn't want to go with that
>> approach, since it makes it look ugly). Do you have any better ideas
>> to deal with this scenario?
>
> Can't think of anything better than this:
>
> + WARN_ON(!(cpufreq_driver->flags & CPUFREQ_ASYNC_NOTIFICATION)
> && (current == policy->transition_task));
>
> which you already mentioned.

Yeah, I think we should just go with this. I thought we needed lots of
if-conditions to do exclude these drivers (which would have made it ugly),
but as you pointed above, just this one would suffice.

Besides, the cpufreq core doesn't automatically invoke _begin() and
_end() for ASYNC_NOTIFICATION drivers. So that means the probability
that such drivers will hit this problem is extremely low, since the
driver alone is responsible for invoking _begin/_end and hence there
shouldn't be much of a conflict. So I think we should really just
skip ASYNC_NOTIFICATION drivers in this debug infrastructure.

>
>> Also, do we really have cases in mind where a single thread does
>> multiple frequency transitions in one go? That too in such quick
>> successions? Echo's to sysfs, changing of governors from userspace etc
>> all do one frequency transition at a time per-task...
>
> Its not really about if we can think of a real use case or not. The point
> is, governor is free to call transition calls one after the other (will always
> happen from a single thread) and it isn't supposed to wait for drivers
> to finish earlier transitions as ->target() has already returned.
>

Yes, I agree now. Making bold assumptions in the cpufreq core about
how many frequency transitions a single task will do etc is potentially
*very* dangerous. Let's not do it that way.

I'll send a v2 excluding the ASYNC_NOTIFICATION drivers.
Thanks a lot for your inputs!

Regards,
Srivatsa S. Bhat

2014-04-29 08:04:37

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

On 29 April 2014 13:05, Srivatsa S. Bhat
<[email protected]> wrote:
> On 04/29/2014 12:19 PM, Viresh Kumar wrote:
>> + WARN_ON(!(cpufreq_driver->flags & CPUFREQ_ASYNC_NOTIFICATION)
>> && (current == policy->transition_task));
>>
>> which you already mentioned.
>
> Yeah, I think we should just go with this. I thought we needed lots of
> if-conditions to do exclude these drivers (which would have made it ugly),
> but as you pointed above, just this one would suffice.

Okay, I think we can do one more modification here:

>> + WARN_ON(unlikely(!(cpufreq_driver->flags & CPUFREQ_ASYNC_NOTIFICATION)
>> && (current == policy->transition_task)));


> Besides, the cpufreq core doesn't automatically invoke _begin() and
> _end() for ASYNC_NOTIFICATION drivers. So that means the probability
> that such drivers will hit this problem is extremely low, since the
> driver alone is responsible for invoking _begin/_end and hence there
> shouldn't be much of a conflict. So I think we should really just
> skip ASYNC_NOTIFICATION drivers in this debug infrastructure.

The only way it can happen (I don't hope somebody would be so
stupid to call begin twice from target() :)), is via transition notifiers,
which in some case starting a new transition..

2014-04-29 08:11:39

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

On 04/29/2014 01:34 PM, Viresh Kumar wrote:
> On 29 April 2014 13:05, Srivatsa S. Bhat
> <[email protected]> wrote:
>> On 04/29/2014 12:19 PM, Viresh Kumar wrote:
>>> + WARN_ON(!(cpufreq_driver->flags & CPUFREQ_ASYNC_NOTIFICATION)
>>> && (current == policy->transition_task));
>>>
>>> which you already mentioned.
>>
>> Yeah, I think we should just go with this. I thought we needed lots of
>> if-conditions to do exclude these drivers (which would have made it ugly),
>> but as you pointed above, just this one would suffice.
>
> Okay, I think we can do one more modification here:
>
>>> + WARN_ON(unlikely(!(cpufreq_driver->flags & CPUFREQ_ASYNC_NOTIFICATION)
>>> && (current == policy->transition_task)));
>

WARN_ON and friends already wrap their arguments within unlikely().
So we don't need to add it explicitly.

>
>> Besides, the cpufreq core doesn't automatically invoke _begin() and
>> _end() for ASYNC_NOTIFICATION drivers. So that means the probability
>> that such drivers will hit this problem is extremely low, since the
>> driver alone is responsible for invoking _begin/_end and hence there
>> shouldn't be much of a conflict. So I think we should really just
>> skip ASYNC_NOTIFICATION drivers in this debug infrastructure.
>
> The only way it can happen (I don't hope somebody would be so
> stupid to call begin twice from target() :)), is via transition notifiers,
> which in some case starting a new transition..

Hmm..

Regards,
Srivatsa S. Bhat