2012-06-22 07:35:27

by Wu Fengguang

[permalink] [raw]
Subject: boot hang on commit "PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup."

Hi Deepthi,

It seems that commit de08622 triggers a boot hang in my kvm boot test.

kvm -cpu kvm64 -enable-kvm \
-kernel ./gpxelinux.lkrn \
-initrd ./kvm.gpxe \
-m 384M \
-net nic,vlan=0,macaddr=00:00:00:00:00:00,model=e1000 \
-net user,vlan=0,hostfwd=tcp::$port-:22 \
-boot order=nc \
-no-reboot \
-watchdog i6300esb \
$(qemu_drives) \
-pidfile $pidfile \
-serial file:$dmesg

tree: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
head: de08622731417ba429fc470ab8f2d7c260cb5c72
commit: de08622731417ba429fc470ab8f2d7c260cb5c72 [9/9] PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup.
config: x86_64-allyesdebian (attached as .config)

x86_64-allyesdebian bbbbbbbbbB
--
8: b 0eb8c06 2012-06-19 22:23:37 +0200 PM / Sleep: Separate printing suspend times from initcall_debug
9: B de08622 boot hang after [ 81.370898] usbcore: registered new interface driver dw2102

legend: [b] boot fine [B] boot error

commit de08622731417ba429fc470ab8f2d7c260cb5c72
Author: Deepthi Dharwar <[email protected]>
Date: Wed Jun 13 16:28:17 2012 +0530

PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup.

Commit e978aa7d7d57d04eb5f88a7507c4fb98577def77 ( cpuidle: Move
dev->last_residency update to driver enter routine; remove dev->last_state)
was breaking suspend on laptops, as reported in the below link
- https://lkml.org/lkml/2011/11/11/164

This was fixed in commit 3439a8da16bcad6b0982ece938c9f8299bb53584
(ACPI / cpuidle: Remove acpi_idle_suspend (to fix suspend regression)
by removing acpi_idle_suspend flag.
- https://lkml.org/lkml/2011/11/14/74

But this fix did not work on all systems
as Suspend/resume regression was reported on Lenovo S10-3
recently by Dave.
- https://lkml.org/lkml/2012/5/27/115
It looked like with commit e978aa7d broke suspend and
with commit 3439a8da resume was not working with acpi_idle driver.

This patch fixes the regression that caused this issue
in the first place. acpi_idle_suspend flag is essential on
some x86 systems to prevent the cpus from going to deeper C-states
when suspend is triggered ( commit b04e7bdb984 )
So reverting the commit 3439a8da is essential.

By default, irqs are disabled in cpu_idle arch specific call
and re-enabled in idle state return path . As the acpi_idle_suspend
flag was being set during suspend, which prevented the cpus
going to deeper idle states, it is essential to
enabling the irqs in its return path too.

To address the suspend issue,
we were not re-enabling the interrupts while returning from
acpi_idle_enter_bm() routine if acpi_idle_suspend flag is set.
and this was causing suspend failure.

In addition to the above fix, a sanity check has also been added
in x86 arch specific cpu_idle call to ensure that the idle call
always returns with IRQs enabled.

Signed-off-by: Deepthi Dharwar <[email protected]>
Reported-and-Tested-by: Dav Hansen <[email protected]>
Tested-by: Preeti Murthy <[email protected]>
Reviewed-by: Srivatsa S Bhat <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 735279e..8ab76ad 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -460,6 +460,12 @@ void cpu_idle(void)
pm_idle();

rcu_idle_exit();
+
+ /*
+ * Sanity check to ensure that idle call returns
+ * with IRQs enabled
+ */
+ WARN_ON(irqs_disabled());
start_critical_timings();

/* In many cases the interrupt that ended idle
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index f3decb3..c2ffd84 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -224,6 +224,7 @@ static void lapic_timer_state_broadcast(struct acpi_processor *pr,
/*
* Suspend / resume control
*/
+static int acpi_idle_suspend;
static u32 saved_bm_rld;

static void acpi_idle_bm_rld_save(void)
@@ -242,13 +243,21 @@ static void acpi_idle_bm_rld_restore(void)

int acpi_processor_suspend(struct acpi_device * device, pm_message_t state)
{
+ if (acpi_idle_suspend == 1)
+ return 0;
+
acpi_idle_bm_rld_save();
+ acpi_idle_suspend = 1;
return 0;
}

int acpi_processor_resume(struct acpi_device * device)
{
+ if (acpi_idle_suspend == 0)
+ return 0;
+
acpi_idle_bm_rld_restore();
+ acpi_idle_suspend = 0;
return 0;
}

@@ -754,6 +763,12 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,

local_irq_disable();

+ if (acpi_idle_suspend) {
+ local_irq_enable();
+ cpu_relax();
+ return -EINVAL;
+ }
+
lapic_timer_state_broadcast(pr, cx, 1);
kt1 = ktime_get_real();
acpi_idle_do_entry(cx);
@@ -823,6 +838,12 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,

local_irq_disable();

+ if (acpi_idle_suspend) {
+ local_irq_enable();
+ cpu_relax();
+ return -EINVAL;
+ }
+
if (cx->entry_method != ACPI_CSTATE_FFH) {
current_thread_info()->status &= ~TS_POLLING;
/*
@@ -901,6 +922,13 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
if (unlikely(!pr))
return -EINVAL;

+ if (acpi_idle_suspend) {
+ if (irqs_disabled())
+ local_irq_enable();
+ cpu_relax();
+ return -EINVAL;
+ }
+
if (!cx->bm_sts_skip && acpi_idle_bm_check()) {
if (drv->safe_state_index >= 0) {
return drv->states[drv->safe_state_index].enter(dev,


Attachments:
(No filename) (5.60 kB)
dmesg-kvm (87.44 kB)
config-3.5.0-rc2+ (129.47 kB)
Download all attachments

2012-06-22 09:57:21

by Deepthi Dharwar

[permalink] [raw]
Subject: Re: boot hang on commit "PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup."

Hi,

This is interesting .. suspend fix breaking boot on kvm.

We complied and booted fine with mainline kernels including 3.5-rc2 with
this fix on servers and laptops.
So wondering how different is booting on machines Vs
from kvm ?

Also, are you seeing smooth boot on kvm by reverting only this patch ?

Cheers
Deepthi

On 06/22/2012 01:05 PM, [email protected] wrote:

> Hi Deepthi,
>
> It seems that commit de08622 triggers a boot hang in my kvm boot test.
>
> kvm -cpu kvm64 -enable-kvm \
> -kernel ./gpxelinux.lkrn \
> -initrd ./kvm.gpxe \
> -m 384M \
> -net nic,vlan=0,macaddr=00:00:00:00:00:00,model=e1000 \
> -net user,vlan=0,hostfwd=tcp::$port-:22 \
> -boot order=nc \
> -no-reboot \
> -watchdog i6300esb \
> $(qemu_drives) \
> -pidfile $pidfile \
> -serial file:$dmesg
>
> tree: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
> head: de08622731417ba429fc470ab8f2d7c260cb5c72
> commit: de08622731417ba429fc470ab8f2d7c260cb5c72 [9/9] PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup.
> config: x86_64-allyesdebian (attached as .config)
>
> x86_64-allyesdebian bbbbbbbbbB
> --
> 8: b 0eb8c06 2012-06-19 22:23:37 +0200 PM / Sleep: Separate printing suspend times from initcall_debug
> 9: B de08622 boot hang after [ 81.370898] usbcore: registered new interface driver dw2102
>
> legend: [b] boot fine [B] boot error
>
> commit de08622731417ba429fc470ab8f2d7c260cb5c72
> Author: Deepthi Dharwar <[email protected]>
> Date: Wed Jun 13 16:28:17 2012 +0530
>
> PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup.
>
> Commit e978aa7d7d57d04eb5f88a7507c4fb98577def77 ( cpuidle: Move
> dev->last_residency update to driver enter routine; remove dev->last_state)
> was breaking suspend on laptops, as reported in the below link
> - https://lkml.org/lkml/2011/11/11/164
>
> This was fixed in commit 3439a8da16bcad6b0982ece938c9f8299bb53584
> (ACPI / cpuidle: Remove acpi_idle_suspend (to fix suspend regression)
> by removing acpi_idle_suspend flag.
> - https://lkml.org/lkml/2011/11/14/74
>
> But this fix did not work on all systems
> as Suspend/resume regression was reported on Lenovo S10-3
> recently by Dave.
> - https://lkml.org/lkml/2012/5/27/115
> It looked like with commit e978aa7d broke suspend and
> with commit 3439a8da resume was not working with acpi_idle driver.
>
> This patch fixes the regression that caused this issue
> in the first place. acpi_idle_suspend flag is essential on
> some x86 systems to prevent the cpus from going to deeper C-states
> when suspend is triggered ( commit b04e7bdb984 )
> So reverting the commit 3439a8da is essential.
>
> By default, irqs are disabled in cpu_idle arch specific call
> and re-enabled in idle state return path . As the acpi_idle_suspend
> flag was being set during suspend, which prevented the cpus
> going to deeper idle states, it is essential to
> enabling the irqs in its return path too.
>
> To address the suspend issue,
> we were not re-enabling the interrupts while returning from
> acpi_idle_enter_bm() routine if acpi_idle_suspend flag is set.
> and this was causing suspend failure.
>
> In addition to the above fix, a sanity check has also been added
> in x86 arch specific cpu_idle call to ensure that the idle call
> always returns with IRQs enabled.
>
> Signed-off-by: Deepthi Dharwar <[email protected]>
> Reported-and-Tested-by: Dav Hansen <[email protected]>
> Tested-by: Preeti Murthy <[email protected]>
> Reviewed-by: Srivatsa S Bhat <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 735279e..8ab76ad 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -460,6 +460,12 @@ void cpu_idle(void)
> pm_idle();
>
> rcu_idle_exit();
> +
> + /*
> + * Sanity check to ensure that idle call returns
> + * with IRQs enabled
> + */
> + WARN_ON(irqs_disabled());
> start_critical_timings();
>
> /* In many cases the interrupt that ended idle
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index f3decb3..c2ffd84 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -224,6 +224,7 @@ static void lapic_timer_state_broadcast(struct acpi_processor *pr,
> /*
> * Suspend / resume control
> */
> +static int acpi_idle_suspend;
> static u32 saved_bm_rld;
>
> static void acpi_idle_bm_rld_save(void)
> @@ -242,13 +243,21 @@ static void acpi_idle_bm_rld_restore(void)
>
> int acpi_processor_suspend(struct acpi_device * device, pm_message_t state)
> {
> + if (acpi_idle_suspend == 1)
> + return 0;
> +
> acpi_idle_bm_rld_save();
> + acpi_idle_suspend = 1;
> return 0;
> }
>
> int acpi_processor_resume(struct acpi_device * device)
> {
> + if (acpi_idle_suspend == 0)
> + return 0;
> +
> acpi_idle_bm_rld_restore();
> + acpi_idle_suspend = 0;
> return 0;
> }
>
> @@ -754,6 +763,12 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
>
> local_irq_disable();
>
> + if (acpi_idle_suspend) {
> + local_irq_enable();
> + cpu_relax();
> + return -EINVAL;
> + }
> +
> lapic_timer_state_broadcast(pr, cx, 1);
> kt1 = ktime_get_real();
> acpi_idle_do_entry(cx);
> @@ -823,6 +838,12 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
>
> local_irq_disable();
>
> + if (acpi_idle_suspend) {
> + local_irq_enable();
> + cpu_relax();
> + return -EINVAL;
> + }
> +
> if (cx->entry_method != ACPI_CSTATE_FFH) {
> current_thread_info()->status &= ~TS_POLLING;
> /*
> @@ -901,6 +922,13 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
> if (unlikely(!pr))
> return -EINVAL;
>
> + if (acpi_idle_suspend) {
> + if (irqs_disabled())
> + local_irq_enable();
> + cpu_relax();
> + return -EINVAL;
> + }
> +
> if (!cx->bm_sts_skip && acpi_idle_bm_check()) {
> if (drv->safe_state_index >= 0) {
> return drv->states[drv->safe_state_index].enter(dev,

2012-06-22 12:24:58

by Wu Fengguang

[permalink] [raw]
Subject: Re: boot hang on commit "PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup."

Hi Deepthi,

I can no longer reproduce this issue... Sorry for the noise!

Thanks,
Fengguang

On Fri, Jun 22, 2012 at 03:26:51PM +0530, Deepthi Dharwar wrote:
> Hi,
>
> This is interesting .. suspend fix breaking boot on kvm.
>
> We complied and booted fine with mainline kernels including 3.5-rc2 with
> this fix on servers and laptops.
> So wondering how different is booting on machines Vs
> from kvm ?
>
> Also, are you seeing smooth boot on kvm by reverting only this patch ?
>
> Cheers
> Deepthi
>
> On 06/22/2012 01:05 PM, [email protected] wrote:
>
> > Hi Deepthi,
> >
> > It seems that commit de08622 triggers a boot hang in my kvm boot test.
> >
> > kvm -cpu kvm64 -enable-kvm \
> > -kernel ./gpxelinux.lkrn \
> > -initrd ./kvm.gpxe \
> > -m 384M \
> > -net nic,vlan=0,macaddr=00:00:00:00:00:00,model=e1000 \
> > -net user,vlan=0,hostfwd=tcp::$port-:22 \
> > -boot order=nc \
> > -no-reboot \
> > -watchdog i6300esb \
> > $(qemu_drives) \
> > -pidfile $pidfile \
> > -serial file:$dmesg
> >
> > tree: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
> > head: de08622731417ba429fc470ab8f2d7c260cb5c72
> > commit: de08622731417ba429fc470ab8f2d7c260cb5c72 [9/9] PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup.
> > config: x86_64-allyesdebian (attached as .config)
> >
> > x86_64-allyesdebian bbbbbbbbbB
> > --
> > 8: b 0eb8c06 2012-06-19 22:23:37 +0200 PM / Sleep: Separate printing suspend times from initcall_debug
> > 9: B de08622 boot hang after [ 81.370898] usbcore: registered new interface driver dw2102
> >
> > legend: [b] boot fine [B] boot error
> >
> > commit de08622731417ba429fc470ab8f2d7c260cb5c72
> > Author: Deepthi Dharwar <[email protected]>
> > Date: Wed Jun 13 16:28:17 2012 +0530
> >
> > PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup.
> >
> > Commit e978aa7d7d57d04eb5f88a7507c4fb98577def77 ( cpuidle: Move
> > dev->last_residency update to driver enter routine; remove dev->last_state)
> > was breaking suspend on laptops, as reported in the below link
> > - https://lkml.org/lkml/2011/11/11/164
> >
> > This was fixed in commit 3439a8da16bcad6b0982ece938c9f8299bb53584
> > (ACPI / cpuidle: Remove acpi_idle_suspend (to fix suspend regression)
> > by removing acpi_idle_suspend flag.
> > - https://lkml.org/lkml/2011/11/14/74
> >
> > But this fix did not work on all systems
> > as Suspend/resume regression was reported on Lenovo S10-3
> > recently by Dave.
> > - https://lkml.org/lkml/2012/5/27/115
> > It looked like with commit e978aa7d broke suspend and
> > with commit 3439a8da resume was not working with acpi_idle driver.
> >
> > This patch fixes the regression that caused this issue
> > in the first place. acpi_idle_suspend flag is essential on
> > some x86 systems to prevent the cpus from going to deeper C-states
> > when suspend is triggered ( commit b04e7bdb984 )
> > So reverting the commit 3439a8da is essential.
> >
> > By default, irqs are disabled in cpu_idle arch specific call
> > and re-enabled in idle state return path . As the acpi_idle_suspend
> > flag was being set during suspend, which prevented the cpus
> > going to deeper idle states, it is essential to
> > enabling the irqs in its return path too.
> >
> > To address the suspend issue,
> > we were not re-enabling the interrupts while returning from
> > acpi_idle_enter_bm() routine if acpi_idle_suspend flag is set.
> > and this was causing suspend failure.
> >
> > In addition to the above fix, a sanity check has also been added
> > in x86 arch specific cpu_idle call to ensure that the idle call
> > always returns with IRQs enabled.
> >
> > Signed-off-by: Deepthi Dharwar <[email protected]>
> > Reported-and-Tested-by: Dav Hansen <[email protected]>
> > Tested-by: Preeti Murthy <[email protected]>
> > Reviewed-by: Srivatsa S Bhat <[email protected]>
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index 735279e..8ab76ad 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -460,6 +460,12 @@ void cpu_idle(void)
> > pm_idle();
> >
> > rcu_idle_exit();
> > +
> > + /*
> > + * Sanity check to ensure that idle call returns
> > + * with IRQs enabled
> > + */
> > + WARN_ON(irqs_disabled());
> > start_critical_timings();
> >
> > /* In many cases the interrupt that ended idle
> > diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> > index f3decb3..c2ffd84 100644
> > --- a/drivers/acpi/processor_idle.c
> > +++ b/drivers/acpi/processor_idle.c
> > @@ -224,6 +224,7 @@ static void lapic_timer_state_broadcast(struct acpi_processor *pr,
> > /*
> > * Suspend / resume control
> > */
> > +static int acpi_idle_suspend;
> > static u32 saved_bm_rld;
> >
> > static void acpi_idle_bm_rld_save(void)
> > @@ -242,13 +243,21 @@ static void acpi_idle_bm_rld_restore(void)
> >
> > int acpi_processor_suspend(struct acpi_device * device, pm_message_t state)
> > {
> > + if (acpi_idle_suspend == 1)
> > + return 0;
> > +
> > acpi_idle_bm_rld_save();
> > + acpi_idle_suspend = 1;
> > return 0;
> > }
> >
> > int acpi_processor_resume(struct acpi_device * device)
> > {
> > + if (acpi_idle_suspend == 0)
> > + return 0;
> > +
> > acpi_idle_bm_rld_restore();
> > + acpi_idle_suspend = 0;
> > return 0;
> > }
> >
> > @@ -754,6 +763,12 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
> >
> > local_irq_disable();
> >
> > + if (acpi_idle_suspend) {
> > + local_irq_enable();
> > + cpu_relax();
> > + return -EINVAL;
> > + }
> > +
> > lapic_timer_state_broadcast(pr, cx, 1);
> > kt1 = ktime_get_real();
> > acpi_idle_do_entry(cx);
> > @@ -823,6 +838,12 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
> >
> > local_irq_disable();
> >
> > + if (acpi_idle_suspend) {
> > + local_irq_enable();
> > + cpu_relax();
> > + return -EINVAL;
> > + }
> > +
> > if (cx->entry_method != ACPI_CSTATE_FFH) {
> > current_thread_info()->status &= ~TS_POLLING;
> > /*
> > @@ -901,6 +922,13 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
> > if (unlikely(!pr))
> > return -EINVAL;
> >
> > + if (acpi_idle_suspend) {
> > + if (irqs_disabled())
> > + local_irq_enable();
> > + cpu_relax();
> > + return -EINVAL;
> > + }
> > +
> > if (!cx->bm_sts_skip && acpi_idle_bm_check()) {
> > if (drv->safe_state_index >= 0) {
> > return drv->states[drv->safe_state_index].enter(dev,
>

2012-06-24 22:12:52

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: boot hang on commit "PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup."

On Friday, June 22, 2012, Fengguang Wu wrote:
> Hi Deepthi,
>
> I can no longer reproduce this issue... Sorry for the noise!

But I do, 100% of the time on real hardware which is Acer Ferrari One.

The symptom is that the new WARN_ON() triggers pretty much on every
cpuidle transition for me which makes the machine unuseable.

That may be a result of a different bug, but the box evidently works
without the hunk in arch/x86/kernel/process.c, so I'd prefer to drop it
for now.

Moreover, I have a couple of questions regarding the implementation.

First, why are we returning -EINVAL if acpi_idle_suspend is set? Surely,
something like -EBUSY or -EAGAIN would be better here, wouldn't it?

Second, is the cpu_relax() before returning really necessary?

Finally, the hunk in acpi_idle_enter_bm() still looks ugly. I'd prefer
something like this instead:

@@ -907,7 +928,10 @@ static int acpi_idle_enter_bm(struct cpu
drv, drv->safe_state_index);
} else {
local_irq_disable();
- acpi_safe_halt();
+
+ if (!acpi_idle_suspend)
+ acpi_safe_halt();
+
local_irq_enable();
return -EINVAL;
}
@@ -915,6 +939,12 @@ static int acpi_idle_enter_bm(struct cpu

local_irq_disable();

+ if (acpi_idle_suspend) {
+ local_irq_enable();
+ cpu_relax();
+ return -EBUSY;
+ }
+
if (cx->entry_method != ACPI_CSTATE_FFH) {
current_thread_info()->status &= ~TS_POLLING;
/*

Hmm?

The patch has been dropped from the linux-pm tree (hopefully for now).

Thanks,
Rafael


> On Fri, Jun 22, 2012 at 03:26:51PM +0530, Deepthi Dharwar wrote:
> > Hi,
> >
> > This is interesting .. suspend fix breaking boot on kvm.
> >
> > We complied and booted fine with mainline kernels including 3.5-rc2 with
> > this fix on servers and laptops.
> > So wondering how different is booting on machines Vs
> > from kvm ?
> >
> > Also, are you seeing smooth boot on kvm by reverting only this patch ?
> >
> > Cheers
> > Deepthi
> >
> > On 06/22/2012 01:05 PM, [email protected] wrote:
> >
> > > Hi Deepthi,
> > >
> > > It seems that commit de08622 triggers a boot hang in my kvm boot test.
> > >
> > > kvm -cpu kvm64 -enable-kvm \
> > > -kernel ./gpxelinux.lkrn \
> > > -initrd ./kvm.gpxe \
> > > -m 384M \
> > > -net nic,vlan=0,macaddr=00:00:00:00:00:00,model=e1000 \
> > > -net user,vlan=0,hostfwd=tcp::$port-:22 \
> > > -boot order=nc \
> > > -no-reboot \
> > > -watchdog i6300esb \
> > > $(qemu_drives) \
> > > -pidfile $pidfile \
> > > -serial file:$dmesg
> > >
> > > tree: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
> > > head: de08622731417ba429fc470ab8f2d7c260cb5c72
> > > commit: de08622731417ba429fc470ab8f2d7c260cb5c72 [9/9] PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup.
> > > config: x86_64-allyesdebian (attached as .config)
> > >
> > > x86_64-allyesdebian bbbbbbbbbB
> > > --
> > > 8: b 0eb8c06 2012-06-19 22:23:37 +0200 PM / Sleep: Separate printing suspend times from initcall_debug
> > > 9: B de08622 boot hang after [ 81.370898] usbcore: registered new interface driver dw2102
> > >
> > > legend: [b] boot fine [B] boot error
> > >
> > > commit de08622731417ba429fc470ab8f2d7c260cb5c72
> > > Author: Deepthi Dharwar <[email protected]>
> > > Date: Wed Jun 13 16:28:17 2012 +0530
> > >
> > > PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup.
> > >
> > > Commit e978aa7d7d57d04eb5f88a7507c4fb98577def77 ( cpuidle: Move
> > > dev->last_residency update to driver enter routine; remove dev->last_state)
> > > was breaking suspend on laptops, as reported in the below link
> > > - https://lkml.org/lkml/2011/11/11/164
> > >
> > > This was fixed in commit 3439a8da16bcad6b0982ece938c9f8299bb53584
> > > (ACPI / cpuidle: Remove acpi_idle_suspend (to fix suspend regression)
> > > by removing acpi_idle_suspend flag.
> > > - https://lkml.org/lkml/2011/11/14/74
> > >
> > > But this fix did not work on all systems
> > > as Suspend/resume regression was reported on Lenovo S10-3
> > > recently by Dave.
> > > - https://lkml.org/lkml/2012/5/27/115
> > > It looked like with commit e978aa7d broke suspend and
> > > with commit 3439a8da resume was not working with acpi_idle driver.
> > >
> > > This patch fixes the regression that caused this issue
> > > in the first place. acpi_idle_suspend flag is essential on
> > > some x86 systems to prevent the cpus from going to deeper C-states
> > > when suspend is triggered ( commit b04e7bdb984 )
> > > So reverting the commit 3439a8da is essential.
> > >
> > > By default, irqs are disabled in cpu_idle arch specific call
> > > and re-enabled in idle state return path . As the acpi_idle_suspend
> > > flag was being set during suspend, which prevented the cpus
> > > going to deeper idle states, it is essential to
> > > enabling the irqs in its return path too.
> > >
> > > To address the suspend issue,
> > > we were not re-enabling the interrupts while returning from
> > > acpi_idle_enter_bm() routine if acpi_idle_suspend flag is set.
> > > and this was causing suspend failure.
> > >
> > > In addition to the above fix, a sanity check has also been added
> > > in x86 arch specific cpu_idle call to ensure that the idle call
> > > always returns with IRQs enabled.
> > >
> > > Signed-off-by: Deepthi Dharwar <[email protected]>
> > > Reported-and-Tested-by: Dav Hansen <[email protected]>
> > > Tested-by: Preeti Murthy <[email protected]>
> > > Reviewed-by: Srivatsa S Bhat <[email protected]>
> > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > >
> > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > > index 735279e..8ab76ad 100644
> > > --- a/arch/x86/kernel/process.c
> > > +++ b/arch/x86/kernel/process.c
> > > @@ -460,6 +460,12 @@ void cpu_idle(void)
> > > pm_idle();
> > >
> > > rcu_idle_exit();
> > > +
> > > + /*
> > > + * Sanity check to ensure that idle call returns
> > > + * with IRQs enabled
> > > + */
> > > + WARN_ON(irqs_disabled());
> > > start_critical_timings();
> > >
> > > /* In many cases the interrupt that ended idle
> > > diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> > > index f3decb3..c2ffd84 100644
> > > --- a/drivers/acpi/processor_idle.c
> > > +++ b/drivers/acpi/processor_idle.c
> > > @@ -224,6 +224,7 @@ static void lapic_timer_state_broadcast(struct acpi_processor *pr,
> > > /*
> > > * Suspend / resume control
> > > */
> > > +static int acpi_idle_suspend;
> > > static u32 saved_bm_rld;
> > >
> > > static void acpi_idle_bm_rld_save(void)
> > > @@ -242,13 +243,21 @@ static void acpi_idle_bm_rld_restore(void)
> > >
> > > int acpi_processor_suspend(struct acpi_device * device, pm_message_t state)
> > > {
> > > + if (acpi_idle_suspend == 1)
> > > + return 0;
> > > +
> > > acpi_idle_bm_rld_save();
> > > + acpi_idle_suspend = 1;
> > > return 0;
> > > }
> > >
> > > int acpi_processor_resume(struct acpi_device * device)
> > > {
> > > + if (acpi_idle_suspend == 0)
> > > + return 0;
> > > +
> > > acpi_idle_bm_rld_restore();
> > > + acpi_idle_suspend = 0;
> > > return 0;
> > > }
> > >
> > > @@ -754,6 +763,12 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
> > >
> > > local_irq_disable();
> > >
> > > + if (acpi_idle_suspend) {
> > > + local_irq_enable();
> > > + cpu_relax();
> > > + return -EINVAL;
> > > + }
> > > +
> > > lapic_timer_state_broadcast(pr, cx, 1);
> > > kt1 = ktime_get_real();
> > > acpi_idle_do_entry(cx);
> > > @@ -823,6 +838,12 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
> > >
> > > local_irq_disable();
> > >
> > > + if (acpi_idle_suspend) {
> > > + local_irq_enable();
> > > + cpu_relax();
> > > + return -EINVAL;
> > > + }
> > > +
> > > if (cx->entry_method != ACPI_CSTATE_FFH) {
> > > current_thread_info()->status &= ~TS_POLLING;
> > > /*
> > > @@ -901,6 +922,13 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
> > > if (unlikely(!pr))
> > > return -EINVAL;
> > >
> > > + if (acpi_idle_suspend) {
> > > + if (irqs_disabled())
> > > + local_irq_enable();
> > > + cpu_relax();
> > > + return -EINVAL;
> > > + }
> > > +
> > > if (!cx->bm_sts_skip && acpi_idle_bm_check()) {
> > > if (drv->safe_state_index >= 0) {
> > > return drv->states[drv->safe_state_index].enter(dev,
> >
>
>

2012-06-25 12:31:24

by Deepthi Dharwar

[permalink] [raw]
Subject: Re: boot hang on commit "PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup."

On 06/25/2012 03:48 AM, Rafael J. Wysocki wrote:

> On Friday, June 22, 2012, Fengguang Wu wrote:
>> Hi Deepthi,
>>
>> I can no longer reproduce this issue... Sorry for the noise!
>
> But I do, 100% of the time on real hardware which is Acer Ferrari One.
>
> The symptom is that the new WARN_ON() triggers pretty much on every
> cpuidle transition for me which makes the machine unuseable.
>
> That may be a result of a different bug, but the box evidently works
> without the hunk in arch/x86/kernel/process.c, so I'd prefer to drop it
> for now.

If cpu is coming out of cpuidle with WARN_ON every-time, is suspend
on this device working ? Wanted to get some clarity about this hardware .
We can remove this check for now.

> Moreover, I have a couple of questions regarding the implementation.
>
> First, why are we returning -EINVAL if acpi_idle_suspend is set? Surely,
> something like -EBUSY or -EAGAIN would be better here, wouldn't it?
>


Any negative return value will serve the purpose. We put it as -EINVAL
as cpu_idle operation during suspend was termed invalid. I will change it to
-EBUSY for better readability.

> Second, is the cpu_relax() before returning really necessary?


cpu_relax operation is mainly to lower CPU power consumption.
If the system is idle, we would want to at-least reduce
power consumption. Better than busy looping in and out of idle when
suspend is triggered. There is no harm having it.

>
> Finally, the hunk in acpi_idle_enter_bm() still looks ugly. I'd prefer
> something like this instead:
>
> @@ -907,7 +928,10 @@ static int acpi_idle_enter_bm(struct cpu
> drv, drv->safe_state_index);
> } else {
> local_irq_disable();
> - acpi_safe_halt();
> +
> + if (!acpi_idle_suspend)
> + acpi_safe_halt();
> +
> local_irq_enable();
> return -EINVAL;
> }
> @@ -915,6 +939,12 @@ static int acpi_idle_enter_bm(struct cpu
>
> local_irq_disable();
>
> + if (acpi_idle_suspend) {
> + local_irq_enable();
> + cpu_relax();
> + return -EBUSY;
> + }
> +
> if (cx->entry_method != ACPI_CSTATE_FFH) {
> current_thread_info()->status &= ~TS_POLLING;
> /*
>
> Hmm?


Yes, I'll will re-factor the code.


> The patch has been dropped from the linux-pm tree (hopefully for now).
>
> Thanks,
> Rafael
>
>
>> On Fri, Jun 22, 2012 at 03:26:51PM +0530, Deepthi Dharwar wrote:
>>> Hi,
>>>
>>> This is interesting .. suspend fix breaking boot on kvm.
>>>
>>> We complied and booted fine with mainline kernels including 3.5-rc2 with
>>> this fix on servers and laptops.
>>> So wondering how different is booting on machines Vs
>>> from kvm ?
>>>
>>> Also, are you seeing smooth boot on kvm by reverting only this patch ?
>>>
>>> Cheers
>>> Deepthi
>>>
>>> On 06/22/2012 01:05 PM, [email protected] wrote:
>>>
>>>> Hi Deepthi,
>>>>
>>>> It seems that commit de08622 triggers a boot hang in my kvm boot test.
>>>>
>>>> kvm -cpu kvm64 -enable-kvm \
>>>> -kernel ./gpxelinux.lkrn \
>>>> -initrd ./kvm.gpxe \
>>>> -m 384M \
>>>> -net nic,vlan=0,macaddr=00:00:00:00:00:00,model=e1000 \
>>>> -net user,vlan=0,hostfwd=tcp::$port-:22 \
>>>> -boot order=nc \
>>>> -no-reboot \
>>>> -watchdog i6300esb \
>>>> $(qemu_drives) \
>>>> -pidfile $pidfile \
>>>> -serial file:$dmesg
>>>>
>>>> tree: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
>>>> head: de08622731417ba429fc470ab8f2d7c260cb5c72
>>>> commit: de08622731417ba429fc470ab8f2d7c260cb5c72 [9/9] PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup.
>>>> config: x86_64-allyesdebian (attached as .config)
>>>>
>>>> x86_64-allyesdebian bbbbbbbbbB
>>>> --
>>>> 8: b 0eb8c06 2012-06-19 22:23:37 +0200 PM / Sleep: Separate printing suspend times from initcall_debug
>>>> 9: B de08622 boot hang after [ 81.370898] usbcore: registered new interface driver dw2102
>>>>
>>>> legend: [b] boot fine [B] boot error
>>>>
>>>> commit de08622731417ba429fc470ab8f2d7c260cb5c72
>>>> Author: Deepthi Dharwar <[email protected]>
>>>> Date: Wed Jun 13 16:28:17 2012 +0530
>>>>
>>>> PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup.
>>>>
>>>> Commit e978aa7d7d57d04eb5f88a7507c4fb98577def77 ( cpuidle: Move
>>>> dev->last_residency update to driver enter routine; remove dev->last_state)
>>>> was breaking suspend on laptops, as reported in the below link
>>>> - https://lkml.org/lkml/2011/11/11/164
>>>>
>>>> This was fixed in commit 3439a8da16bcad6b0982ece938c9f8299bb53584
>>>> (ACPI / cpuidle: Remove acpi_idle_suspend (to fix suspend regression)
>>>> by removing acpi_idle_suspend flag.
>>>> - https://lkml.org/lkml/2011/11/14/74
>>>>
>>>> But this fix did not work on all systems
>>>> as Suspend/resume regression was reported on Lenovo S10-3
>>>> recently by Dave.
>>>> - https://lkml.org/lkml/2012/5/27/115
>>>> It looked like with commit e978aa7d broke suspend and
>>>> with commit 3439a8da resume was not working with acpi_idle driver.
>>>>
>>>> This patch fixes the regression that caused this issue
>>>> in the first place. acpi_idle_suspend flag is essential on
>>>> some x86 systems to prevent the cpus from going to deeper C-states
>>>> when suspend is triggered ( commit b04e7bdb984 )
>>>> So reverting the commit 3439a8da is essential.
>>>>
>>>> By default, irqs are disabled in cpu_idle arch specific call
>>>> and re-enabled in idle state return path . As the acpi_idle_suspend
>>>> flag was being set during suspend, which prevented the cpus
>>>> going to deeper idle states, it is essential to
>>>> enabling the irqs in its return path too.
>>>>
>>>> To address the suspend issue,
>>>> we were not re-enabling the interrupts while returning from
>>>> acpi_idle_enter_bm() routine if acpi_idle_suspend flag is set.
>>>> and this was causing suspend failure.
>>>>
>>>> In addition to the above fix, a sanity check has also been added
>>>> in x86 arch specific cpu_idle call to ensure that the idle call
>>>> always returns with IRQs enabled.
>>>>
>>>> Signed-off-by: Deepthi Dharwar <[email protected]>
>>>> Reported-and-Tested-by: Dav Hansen <[email protected]>
>>>> Tested-by: Preeti Murthy <[email protected]>
>>>> Reviewed-by: Srivatsa S Bhat <[email protected]>
>>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>>>>
>>>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>>> index 735279e..8ab76ad 100644
>>>> --- a/arch/x86/kernel/process.c
>>>> +++ b/arch/x86/kernel/process.c
>>>> @@ -460,6 +460,12 @@ void cpu_idle(void)
>>>> pm_idle();
>>>>
>>>> rcu_idle_exit();
>>>> +
>>>> + /*
>>>> + * Sanity check to ensure that idle call returns
>>>> + * with IRQs enabled
>>>> + */
>>>> + WARN_ON(irqs_disabled());
>>>> start_critical_timings();
>>>>
>>>> /* In many cases the interrupt that ended idle
>>>> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
>>>> index f3decb3..c2ffd84 100644
>>>> --- a/drivers/acpi/processor_idle.c
>>>> +++ b/drivers/acpi/processor_idle.c
>>>> @@ -224,6 +224,7 @@ static void lapic_timer_state_broadcast(struct acpi_processor *pr,
>>>> /*
>>>> * Suspend / resume control
>>>> */
>>>> +static int acpi_idle_suspend;
>>>> static u32 saved_bm_rld;
>>>>
>>>> static void acpi_idle_bm_rld_save(void)
>>>> @@ -242,13 +243,21 @@ static void acpi_idle_bm_rld_restore(void)
>>>>
>>>> int acpi_processor_suspend(struct acpi_device * device, pm_message_t state)
>>>> {
>>>> + if (acpi_idle_suspend == 1)
>>>> + return 0;
>>>> +
>>>> acpi_idle_bm_rld_save();
>>>> + acpi_idle_suspend = 1;
>>>> return 0;
>>>> }
>>>>
>>>> int acpi_processor_resume(struct acpi_device * device)
>>>> {
>>>> + if (acpi_idle_suspend == 0)
>>>> + return 0;
>>>> +
>>>> acpi_idle_bm_rld_restore();
>>>> + acpi_idle_suspend = 0;
>>>> return 0;
>>>> }
>>>>
>>>> @@ -754,6 +763,12 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
>>>>
>>>> local_irq_disable();
>>>>
>>>> + if (acpi_idle_suspend) {
>>>> + local_irq_enable();
>>>> + cpu_relax();
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> lapic_timer_state_broadcast(pr, cx, 1);
>>>> kt1 = ktime_get_real();
>>>> acpi_idle_do_entry(cx);
>>>> @@ -823,6 +838,12 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
>>>>
>>>> local_irq_disable();
>>>>
>>>> + if (acpi_idle_suspend) {
>>>> + local_irq_enable();
>>>> + cpu_relax();
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> if (cx->entry_method != ACPI_CSTATE_FFH) {
>>>> current_thread_info()->status &= ~TS_POLLING;
>>>> /*
>>>> @@ -901,6 +922,13 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
>>>> if (unlikely(!pr))
>>>> return -EINVAL;
>>>>
>>>> + if (acpi_idle_suspend) {
>>>> + if (irqs_disabled())
>>>> + local_irq_enable();
>>>> + cpu_relax();
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> if (!cx->bm_sts_skip && acpi_idle_bm_check()) {
>>>> if (drv->safe_state_index >= 0) {
>>>> return drv->states[drv->safe_state_index].enter(dev,
>>>
>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


Cheers,
Deepthi

2012-06-25 12:38:18

by Deepthi Dharwar

[permalink] [raw]
Subject: Re: boot hang on commit "PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup."

From: Deepthi Dharwar <[email protected]>

Fix suspend/resume regression caused by cpuidle cleanup.

Commit e978aa7d7d57d04eb5f88a7507c4fb98577def77 ( cpuidle: Move
dev->last_residency update to driver enter routine; remove dev->last_state)
was breaking suspend on laptops, as reported in the below link
- https://lkml.org/lkml/2011/11/11/164

This was fixed in commit 3439a8da16bcad6b0982ece938c9f8299bb53584
(ACPI / cpuidle: Remove acpi_idle_suspend (to fix suspend regression)
by removing acpi_idle_suspend flag.
- https://lkml.org/lkml/2011/11/14/74

But this did fix did not work on all systems
as Suspend/resume regression was reported on Lenovo S10-3
recently by Dave.
- https://lkml.org/lkml/2012/5/27/115
It looked like with commit e978aa7d broke suspend and
with commit 3439a8da resume was not working with acpi_idle driver.

This patch fixes the regression that caused this issue
in the first place. acpi_idle_suspend flag is essential on
some x86 systems to prevent the cpus from going to deeper C-states
when suspend is triggered ( commit b04e7bdb984 )
So reverting the commit 3439a8da is essential.

By default, irqs are disabled in cpu_idle arch specific call
and re-enabled in idle state return path . During suspend,
the acpi_idle_suspend flag is set, which
prevents the cpus from going to deeper idle states,
it is essential to enabling the irqs in this return path too.

To address the suspend issue,
we were not re-enabling the interrupts while returning from
acpi_idle_enter_bm() routine if acpi_idle_suspend flag is set.
and this caused suspend failure.

In addition to the above, to improve the readability of the code,
return of -ENIVAL is replaced with -EBUSY in acpi_idle_suspend
return path. Implying that the system is currently busy when suspend
is in progress, which prevents the cpus from entering deeper C-states.

---
This patch applies on 3.5-rc2

Reported-and-Tested-by: Dav Hansen <[email protected]>
Tested-by: Preeti Murthy <[email protected]>
Signed-off-by: Deepthi Dharwar <[email protected]>
Reviewed-by: Srivatsa S Bhat <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/acpi/processor_idle.c | 32 ++++++++++++++++++++++++++++++--
1 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index f3decb3..47a8caa 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -224,6 +224,7 @@ static void lapic_timer_state_broadcast(struct acpi_processor *pr,
/*
* Suspend / resume control
*/
+static int acpi_idle_suspend;
static u32 saved_bm_rld;

static void acpi_idle_bm_rld_save(void)
@@ -242,13 +243,21 @@ static void acpi_idle_bm_rld_restore(void)

int acpi_processor_suspend(struct acpi_device * device, pm_message_t state)
{
+ if (acpi_idle_suspend == 1)
+ return 0;
+
acpi_idle_bm_rld_save();
+ acpi_idle_suspend = 1;
return 0;
}

int acpi_processor_resume(struct acpi_device * device)
{
+ if (acpi_idle_suspend == 0)
+ return 0;
+
acpi_idle_bm_rld_restore();
+ acpi_idle_suspend = 0;
return 0;
}

@@ -754,6 +763,12 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,

local_irq_disable();

+ if (acpi_idle_suspend) {
+ local_irq_enable();
+ cpu_relax();
+ return -EBUSY;
+ }
+
lapic_timer_state_broadcast(pr, cx, 1);
kt1 = ktime_get_real();
acpi_idle_do_entry(cx);
@@ -823,6 +838,12 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,

local_irq_disable();

+ if (acpi_idle_suspend) {
+ local_irq_enable();
+ cpu_relax();
+ return -EBUSY;
+ }
+
if (cx->entry_method != ACPI_CSTATE_FFH) {
current_thread_info()->status &= ~TS_POLLING;
/*
@@ -907,14 +928,21 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
drv, drv->safe_state_index);
} else {
local_irq_disable();
- acpi_safe_halt();
+ if (!acpi_idle_suspend)
+ acpi_safe_halt();
local_irq_enable();
- return -EINVAL;
+ return -EBUSY;
}
}

local_irq_disable();

+ if (acpi_idle_suspend) {
+ local_irq_enable();
+ cpu_relax();
+ return -EBUSY;
+ }
+
if (cx->entry_method != ACPI_CSTATE_FFH) {
current_thread_info()->status &= ~TS_POLLING;
/*

Cheers,
Deepthi

2012-06-25 20:31:46

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: boot hang on commit "PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup."

On Monday, June 25, 2012, Deepthi Dharwar wrote:
> On 06/25/2012 03:48 AM, Rafael J. Wysocki wrote:
>
> > On Friday, June 22, 2012, Fengguang Wu wrote:
> >> Hi Deepthi,
> >>
> >> I can no longer reproduce this issue... Sorry for the noise!
> >
> > But I do, 100% of the time on real hardware which is Acer Ferrari One.
> >
> > The symptom is that the new WARN_ON() triggers pretty much on every
> > cpuidle transition for me which makes the machine unuseable.
> >
> > That may be a result of a different bug, but the box evidently works
> > without the hunk in arch/x86/kernel/process.c, so I'd prefer to drop it
> > for now.
>
> If cpu is coming out of cpuidle with WARN_ON every-time, is suspend
> on this device working ?

Yes, it is.

> Wanted to get some clarity about this hardware .
> We can remove this check for now.

OK

> > Moreover, I have a couple of questions regarding the implementation.
> >
> > First, why are we returning -EINVAL if acpi_idle_suspend is set? Surely,
> > something like -EBUSY or -EAGAIN would be better here, wouldn't it?
> >
>
>
> Any negative return value will serve the purpose. We put it as -EINVAL
> as cpu_idle operation during suspend was termed invalid. I will change it to
> -EBUSY for better readability.

OK

> > Second, is the cpu_relax() before returning really necessary?
>
>
> cpu_relax operation is mainly to lower CPU power consumption.
> If the system is idle, we would want to at-least reduce
> power consumption. Better than busy looping in and out of idle when
> suspend is triggered. There is no harm having it.

It seems kind of redundant, though. The system is suspending at this point,
so the increased power consumption, if any, shouldn't be that much anyway,
right?

> > Finally, the hunk in acpi_idle_enter_bm() still looks ugly. I'd prefer
> > something like this instead:
> >
> > @@ -907,7 +928,10 @@ static int acpi_idle_enter_bm(struct cpu
> > drv, drv->safe_state_index);
> > } else {
> > local_irq_disable();
> > - acpi_safe_halt();
> > +
> > + if (!acpi_idle_suspend)
> > + acpi_safe_halt();
> > +
> > local_irq_enable();
> > return -EINVAL;
> > }
> > @@ -915,6 +939,12 @@ static int acpi_idle_enter_bm(struct cpu
> >
> > local_irq_disable();
> >
> > + if (acpi_idle_suspend) {
> > + local_irq_enable();
> > + cpu_relax();
> > + return -EBUSY;
> > + }
> > +
> > if (cx->entry_method != ACPI_CSTATE_FFH) {
> > current_thread_info()->status &= ~TS_POLLING;
> > /*
> >
> > Hmm?
>
>
> Yes, I'll will re-factor the code.

Good.

Thanks,
Rafael

2012-06-25 20:39:43

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: boot hang on commit "PM / ACPI: Fix suspend/resume regression caused by cpuidle cleanup."

On Monday, June 25, 2012, Deepthi Dharwar wrote:
> From: Deepthi Dharwar <[email protected]>
>
> Fix suspend/resume regression caused by cpuidle cleanup.
>
> Commit e978aa7d7d57d04eb5f88a7507c4fb98577def77 ( cpuidle: Move
> dev->last_residency update to driver enter routine; remove dev->last_state)
> was breaking suspend on laptops, as reported in the below link
> - https://lkml.org/lkml/2011/11/11/164
>
> This was fixed in commit 3439a8da16bcad6b0982ece938c9f8299bb53584
> (ACPI / cpuidle: Remove acpi_idle_suspend (to fix suspend regression)
> by removing acpi_idle_suspend flag.
> - https://lkml.org/lkml/2011/11/14/74
>
> But this did fix did not work on all systems
> as Suspend/resume regression was reported on Lenovo S10-3
> recently by Dave.
> - https://lkml.org/lkml/2012/5/27/115
> It looked like with commit e978aa7d broke suspend and
> with commit 3439a8da resume was not working with acpi_idle driver.
>
> This patch fixes the regression that caused this issue
> in the first place. acpi_idle_suspend flag is essential on
> some x86 systems to prevent the cpus from going to deeper C-states
> when suspend is triggered ( commit b04e7bdb984 )
> So reverting the commit 3439a8da is essential.
>
> By default, irqs are disabled in cpu_idle arch specific call
> and re-enabled in idle state return path . During suspend,
> the acpi_idle_suspend flag is set, which
> prevents the cpus from going to deeper idle states,
> it is essential to enabling the irqs in this return path too.
>
> To address the suspend issue,
> we were not re-enabling the interrupts while returning from
> acpi_idle_enter_bm() routine if acpi_idle_suspend flag is set.
> and this caused suspend failure.
>
> In addition to the above, to improve the readability of the code,
> return of -ENIVAL is replaced with -EBUSY in acpi_idle_suspend
> return path. Implying that the system is currently busy when suspend
> is in progress, which prevents the cpus from entering deeper C-states.
>
> ---
> This patch applies on 3.5-rc2
>
> Reported-and-Tested-by: Dav Hansen <[email protected]>
> Tested-by: Preeti Murthy <[email protected]>
> Signed-off-by: Deepthi Dharwar <[email protected]>
> Reviewed-by: Srivatsa S Bhat <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

Looking good. I'll give it a bit of testing and put it into linux-next for
a day or two to make sure nobody sees any more problems with it.

Thanks,
Rafael


> ---
> drivers/acpi/processor_idle.c | 32 ++++++++++++++++++++++++++++++--
> 1 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index f3decb3..47a8caa 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -224,6 +224,7 @@ static void lapic_timer_state_broadcast(struct acpi_processor *pr,
> /*
> * Suspend / resume control
> */
> +static int acpi_idle_suspend;
> static u32 saved_bm_rld;
>
> static void acpi_idle_bm_rld_save(void)
> @@ -242,13 +243,21 @@ static void acpi_idle_bm_rld_restore(void)
>
> int acpi_processor_suspend(struct acpi_device * device, pm_message_t state)
> {
> + if (acpi_idle_suspend == 1)
> + return 0;
> +
> acpi_idle_bm_rld_save();
> + acpi_idle_suspend = 1;
> return 0;
> }
>
> int acpi_processor_resume(struct acpi_device * device)
> {
> + if (acpi_idle_suspend == 0)
> + return 0;
> +
> acpi_idle_bm_rld_restore();
> + acpi_idle_suspend = 0;
> return 0;
> }
>
> @@ -754,6 +763,12 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
>
> local_irq_disable();
>
> + if (acpi_idle_suspend) {
> + local_irq_enable();
> + cpu_relax();
> + return -EBUSY;
> + }
> +
> lapic_timer_state_broadcast(pr, cx, 1);
> kt1 = ktime_get_real();
> acpi_idle_do_entry(cx);
> @@ -823,6 +838,12 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
>
> local_irq_disable();
>
> + if (acpi_idle_suspend) {
> + local_irq_enable();
> + cpu_relax();
> + return -EBUSY;
> + }
> +
> if (cx->entry_method != ACPI_CSTATE_FFH) {
> current_thread_info()->status &= ~TS_POLLING;
> /*
> @@ -907,14 +928,21 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
> drv, drv->safe_state_index);
> } else {
> local_irq_disable();
> - acpi_safe_halt();
> + if (!acpi_idle_suspend)
> + acpi_safe_halt();
> local_irq_enable();
> - return -EINVAL;
> + return -EBUSY;
> }
> }
>
> local_irq_disable();
>
> + if (acpi_idle_suspend) {
> + local_irq_enable();
> + cpu_relax();
> + return -EBUSY;
> + }
> +
> if (cx->entry_method != ACPI_CSTATE_FFH) {
> current_thread_info()->status &= ~TS_POLLING;
> /*
>
> Cheers,
> Deepthi
>
>
>