2013-08-20 05:23:15

by Neil Zhang

Subject: [PATCH] cpuidle: coupled: fix dead loop corner case

There is a corner case when no peripheral irqs are routed to the
secondary cores.
Let's take a dual-core system as an example. The sequence is as follows:

   Core 0                                  Core 1
1.                                         set waiting bit and enter waiting loop
2. set waiting bit and poke core1
3.                                         clear poke in irq and enter safe state
4. set ready bit and enter ready loop

Since no peripheral irq is routed to core 1, it will stay in the safe
state forever, and core 0 will loop forever in the following code:

	while (!cpuidle_coupled_cpus_ready(coupled)) {
		/* Check if any other cpus bailed out of idle. */
		if (!cpuidle_coupled_cpus_waiting(coupled))
			...
	}

The solution is to not let a secondary core enter the safe state when
it has already handled the poke interrupt.

Signed-off-by: Neil Zhang <[email protected]>
Reviewed-by: Fangsuo Wu <[email protected]>
---
drivers/cpuidle/coupled.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
index 2a297f8..a37c718 100644
--- a/drivers/cpuidle/coupled.c
+++ b/drivers/cpuidle/coupled.c
@@ -119,6 +119,7 @@ struct cpuidle_coupled {
 #define CPUIDLE_COUPLED_NOT_IDLE (-1)

 static DEFINE_MUTEX(cpuidle_coupled_lock);
+static DEFINE_PER_CPU(bool, poke_sync);
 static DEFINE_PER_CPU(struct call_single_data, cpuidle_coupled_poke_cb);

 /*
@@ -295,6 +296,7 @@ static void cpuidle_coupled_poked(void *info)
 {
 	int cpu = (unsigned long)info;
 	cpumask_clear_cpu(cpu, &cpuidle_coupled_poked_mask);
+	__this_cpu_write(poke_sync, true);
 }

 /**
@@ -473,6 +475,7 @@ retry:
 	 * allowed for a single cpu.
 	 */
 	while (!cpuidle_coupled_cpus_waiting(coupled)) {
+		__this_cpu_write(poke_sync, false);
 		if (cpuidle_coupled_clear_pokes(dev->cpu)) {
 			cpuidle_coupled_set_not_waiting(dev->cpu, coupled);
 			goto out;
@@ -483,6 +486,10 @@ retry:
 			goto out;
 		}

+		if (cpuidle_coupled_cpus_waiting(coupled)
+				&& __this_cpu_read(poke_sync))
+			break;
+
 		entered_state = cpuidle_enter_state(dev, drv,
 			dev->safe_state_index);
 	}
--
1.7.4.1
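
The interleaving described in the commit message can be replayed in a small user-space model. This is only a sketch under stated assumptions: `struct core_model`, `model_handle_poke()` and `model_waiting_loop_pass()` are hypothetical names that do not exist in the kernel, and the model collapses the real waiting loop in drivers/cpuidle/coupled.c into one deterministic pass on core 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one secondary core; not kernel code. */
struct core_model {
	bool poke_pending;  /* a poke IPI is queued for this core */
	bool poke_sync;     /* mirrors the per-cpu flag added by the patch */
	bool in_safe_state; /* the core is parked in the safe state */
};

/* Models cpuidle_coupled_poked(): the IPI handler consumes the poke. */
static void model_handle_poke(struct core_model *c)
{
	assert(c != NULL);
	if (c->poke_pending) {
		c->poke_pending = false;
		c->poke_sync = true;
	}
}

/*
 * Replays steps 1-3 of the commit message on core 1.
 * Returns true if core 1 ends up in the safe state with no poke left
 * to wake it (the hang), false if it leaves the waiting loop instead.
 */
static bool model_waiting_loop_pass(struct core_model *c, bool use_poke_sync)
{
	bool all_waiting = false; /* step 1: only core 1 is waiting */

	assert(c != NULL);
	c->poke_sync = false;     /* reset at the top of the loop body */

	/* step 2: core 0 sets its waiting bit and pokes core 1 */
	all_waiting = true;
	c->poke_pending = true;

	/* step 3: core 1 takes the IPI and clears the poke */
	model_handle_poke(c);

	/* patched check: everyone is waiting and the poke was consumed */
	if (use_poke_sync && all_waiting && c->poke_sync)
		return false;

	/* unpatched path: enter the safe state with nothing left to wake us */
	c->in_safe_state = true;
	return !c->poke_pending;
}
```

Calling `model_waiting_loop_pass()` on a zero-initialized `struct core_model` with `use_poke_sync` set to false reports the hang. With it set to true, the modelled core leaves the loop without entering the safe state.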


2013-08-20 12:26:12

by Rafael J. Wysocki

Subject: Re: [PATCH] cpuidle: coupled: fix dead loop corner case

Daniel, can you please have a look at this?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-08-22 10:12:05

by Neil Zhang

Subject: RE: [PATCH] cpuidle: coupled: fix dead loop corner case

Daniel & Colin,

> -----Original Message-----
> From: Rafael J. Wysocki [mailto:[email protected]]
> Sent: August 20, 2013 20:37
> To: Neil Zhang; Daniel Lezcano
> Cc: [email protected]; [email protected]
> Subject: Re: [PATCH] cpuidle: coupled: fix dead loop corner case
>
> Daniel, can you please have a look at this?
>
> Rafael
>

What's your opinion?
Thanks.

Best Regards,
Neil Zhang

2013-08-22 21:08:17

by Colin Cross

Subject: Re: [PATCH] cpuidle: coupled: fix dead loop corner case

I have a similar patch that avoids adding another check for
cpuidle_coupled_cpus_waiting, and uses the return value from
cpuidle_coupled_clear_pokes instead of adding a percpu bool. I will
post it shortly.

Do you have a test case that can reproduce this easily?

2013-08-23 03:20:56

by Neil Zhang

Subject: RE: [PATCH] cpuidle: coupled: fix dead loop corner case


> -----Original Message-----
> From: Colin Cross [mailto:[email protected]]
> Sent: August 23, 2013 5:08
> To: Neil Zhang
> Cc: Rafael J. Wysocki; Daniel Lezcano; Linux PM list; lkml
> Subject: Re: [PATCH] cpuidle: coupled: fix dead loop corner case
>
> I have a similar patch that avoids adding another check for
> cpuidle_coupled_cpus_waiting, and uses the return value from
> cpuidle_coupled_clear_pokes instead of adding a percpu bool. I will post it
> shortly.
>
> Do you have a test case that can reproduce this easily?

It's not easy to reproduce.
We have only caught it once so far.

Best Regards,
Neil Zhang