2017-07-24 13:46:17

by Martin Peres

[permalink] [raw]
Subject: Suspend-resume failure on Intel Eagle Lake Core2Duo

Hello,

I am contacting you because your patch "bf22ff45b genirq: Avoid
unnecessary low level irq function calls" broke suspend/resume on an old
Intel platform.

This was not caught in linux-next and only got introduced in linux 4.13-rc1.

Here is the current place where the bug has been reported[0] but I can
move it to bugzilla.kernel.org if you would want that.

Here are the kernel logs[1]. What information would you need?

Thanks in advance,
Martin

[0] https://bugs.freedesktop.org/show_bug.cgi?id=101838
[1]
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2845/fi-elk-e7500/dmesg-before.log


2017-07-24 15:28:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Mon, 24 Jul 2017, Martin Peres wrote:
>
> I am contacting you because your patch "bf22ff45b genirq: Avoid unnecessary
> low level irq function calls" broke suspend/resume on an old Intel platform.
>
> This was not caught in linux-next and only got introduced in linux 4.13-rc1.
>
> Here is the current place where the bug has been reported[0] but I can move it
> to bugzilla.kernel.org if you would want that.

Can we please handle this per mail?

> Here are the kernel logs[1]. What information would you need?

Output of 'cat /proc/interrupts' and a description what kind of 'old' Intel
platform that is.

Thanks,

tglx






2017-07-24 15:42:10

by Martin Peres

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 24/07/17 18:28, Thomas Gleixner wrote:
> On Mon, 24 Jul 2017, Martin Peres wrote:
>>
>> I am contacting you because your patch "bf22ff45b genirq: Avoid unnecessary
>> low level irq function calls" broke suspend/resume on an old Intel platform.
>>
>> This was not caught in linux-next and only got introduced in linux 4.13-rc1.
>>
>> Here is the current place where the bug has been reported[0] but I can move it
>> to bugzilla.kernel.org if you would want that.
>
> Can we please handle this per mail?

Sure, if you prefer.

>
>> Here are the kernel logs[1]. What information would you need?
>
> Output of 'cat /proc/interrupts' and a description what kind of 'old' Intel
> platform that is.

Sorry, I should have repeated the name outside of just the subject of
the email. It is an Intel Eagle Lake Core2Duo.

$ sudo cat /proc/interrupts
CPU0 CPU1


0: 182 0 IO-APIC 2-edge timer


1: 2 1 IO-APIC 1-edge i8042


4: 6 4 IO-APIC 4-edge serial


5: 0 0 IO-APIC 5-edge ehci_hcd:usb1,
uhci_hcd:usb5, uhci_hcd:usb8

8: 1 0 IO-APIC 8-edge rtc0


9: 0 0 IO-APIC 9-fasteoi acpi


10: 0 0 IO-APIC 10-edge ehci_hcd:usb2,
uhci_hcd:usb3, uhci_hcd:usb6

11: 0 0 IO-APIC 11-edge uhci_hcd:usb4,
uhci_hcd:usb7

12: 3 2 IO-APIC 12-edge i8042


24: 63345 0 HPET-MSI 2-edge hpet2


25: 0 70778 HPET-MSI 3-edge hpet3


28: 0 0 PCI-MSI 458752-edge PCIe PME


29: 0 0 PCI-MSI 466944-edge PCIe PME


30: 5349 10818 PCI-MSI 512000-edge
ahci[0000:00:1f.2]

31: 810 10234 PCI-MSI 409600-edge enp0s25


32: 9 9 PCI-MSI 49152-edge mei_me


33: 233 223 PCI-MSI 442368-edge
snd_hda_intel:card0

34: 1 0 PCI-MSI 32768-edge i915


NMI: 38 33 Non-maskable interrupts


LOC: 710 438 Local timer interrupts


SPU: 0 0 Spurious interrupts


PMI: 38 33 Performance monitoring interrupts


IWI: 0 0 IRQ work interrupts


RTR: 0 0 APIC ICR read retries


RES: 2629 2425 Rescheduling interrupts


CAL: 618 722 Function call interrupts


TLB: 321 309 TLB shootdowns


TRM: 0 0 Thermal event interrupts


THR: 0 0 Threshold APIC interrupts


MCE: 0 0 Machine check exceptions


MCP: 7 7 Machine check polls


ERR: 0


MIS: 0


PIN: 0 0 Posted-interrupt notification event


PIW: 0 0 Posted-interrupt wakeup event

Cheers,
Martin

2017-07-24 16:36:02

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Mon, 24 Jul 2017, Martin Peres wrote:
> On 24/07/17 18:28, Thomas Gleixner wrote:
> > Output of 'cat /proc/interrupts' and a description what kind of 'old' Intel
> > platform that is.
>
> Sorry, I should have repeated the name outside of just the subject of the
> email. It is an Intel Eagle Lake Core2Duo.

I hoped you would give me a little bit more details, but I Gurgled it
now. I try to find such a beast and look what's wrong there. If I can't
find one, I'll come back with some debug patches.

Thanks,

tglx

2017-07-24 16:38:11

by Martin Peres

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 24/07/17 19:35, Thomas Gleixner wrote:
> On Mon, 24 Jul 2017, Martin Peres wrote:
>> On 24/07/17 18:28, Thomas Gleixner wrote:
>>> Output of 'cat /proc/interrupts' and a description what kind of 'old' Intel
>>> platform that is.
>>
>> Sorry, I should have repeated the name outside of just the subject of the
>> email. It is an Intel Eagle Lake Core2Duo.
>
> I hoped you would give me a little bit more details, but I Gurgled it
> now. I try to find such a beast and look what's wrong there. If I can't
> find one, I'll come back with some debug patches.

Ha, sorry!

Here is all the info I have:
- Hardware: HP Compaq 8000
- CPU: Eagle Lake Core2Duo E7500

I hope this helps!

Martin

2017-07-25 07:03:10

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 24/07/17 19:37, Martin Peres wrote:
> On 24/07/17 19:35, Thomas Gleixner wrote:
>> On Mon, 24 Jul 2017, Martin Peres wrote:
>>> On 24/07/17 18:28, Thomas Gleixner wrote:
>>>> Output of 'cat /proc/interrupts' and a description what kind of
>>>> 'old' Intel
>>>> platform that is.
>>>
>>> Sorry, I should have repeated the name outside of just the subject of
>>> the
>>> email. It is an Intel Eagle Lake Core2Duo.
>>
>> I hoped you would give me a little bit more details, but I Gurgled it
>> now. I try to find such a beast and look what's wrong there. If I can't
>> find one, I'll come back with some debug patches.
>
> Ha, sorry!
>
> Here is all the info I have:
> - Hardware: HP Compaq 8000
> - CPU: Eagle Lake Core2Duo E7500

As mentioned, it's HP Compaq 8000 Elite, desktop tower in most basic
configuration: one SATA SSD, 4GB memory and so on. Most recent BIOS from
around 2012, no UEFI.

If you have proposed fix for the issue, I'll be happy to test them.

Best regards,

Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-26 13:47:53

by Martin Peres

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo



On 25/07/17 10:01, Tomi Sarvela wrote:
> On 24/07/17 19:37, Martin Peres wrote:
>> On 24/07/17 19:35, Thomas Gleixner wrote:
>>> On Mon, 24 Jul 2017, Martin Peres wrote:
>>>> On 24/07/17 18:28, Thomas Gleixner wrote:
>>>>> Output of 'cat /proc/interrupts' and a description what kind of
>>>>> 'old' Intel
>>>>> platform that is.
>>>>
>>>> Sorry, I should have repeated the name outside of just the subject
>>>> of the
>>>> email. It is an Intel Eagle Lake Core2Duo.
>>>
>>> I hoped you would give me a little bit more details, but I Gurgled it
>>> now. I try to find such a beast and look what's wrong there. If I can't
>>> find one, I'll come back with some debug patches.
>>
>> Ha, sorry!
>>
>> Here is all the info I have:
>> - Hardware: HP Compaq 8000
>> - CPU: Eagle Lake Core2Duo E7500
>
> As mentioned, it's HP Compaq 8000 Elite, desktop tower in most basic
> configuration: one SATA SSD, 4GB memory and so on. Most recent BIOS from
> around 2012, no UEFI.
>
> If you have proposed fix for the issue, I'll be happy to test them.
>
> Best regards,

Thanks for the information Tomi.

Thomas, any update on this? Another user reported the problem:

"I have the same thing on an old cheap laptop.
model name : Intel(R) Core(TM)2 CPU
cpu family : 6
model : 15

Previously, this never happened on the kernel series 4.{10,11,12}.
After reverted commit bf22ff45bed664 on 4.13-rc2, unfortunately it still
hangs when it wakes up.
Difficult to diagnose because it does not happen right away. After some
time of action."

I hope this helps!

Cheers,
Martin

2017-07-26 14:26:30

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Wed, 26 Jul 2017, Martin Peres wrote:
> On 25/07/17 10:01, Tomi Sarvela wrote:
> > On 24/07/17 19:37, Martin Peres wrote:
> > > On 24/07/17 19:35, Thomas Gleixner wrote:
> > > > On Mon, 24 Jul 2017, Martin Peres wrote:
> > > > > On 24/07/17 18:28, Thomas Gleixner wrote:
> > > > > > Output of 'cat /proc/interrupts' and a description what kind of
> > > > > > 'old' Intel
> > > > > > platform that is.
> > > > >
> > > > > Sorry, I should have repeated the name outside of just the subject of
> > > > > the
> > > > > email. It is an Intel Eagle Lake Core2Duo.
> > > >
> > > > I hoped you would give me a little bit more details, but I Gurgled it
> > > > now. I try to find such a beast and look what's wrong there. If I can't
> > > > find one, I'll come back with some debug patches.
> > >
> > > Ha, sorry!
> > >
> > > Here is all the info I have:
> > > - Hardware: HP Compaq 8000
> > > - CPU: Eagle Lake Core2Duo E7500
> >
> > As mentioned, it's HP Compaq 8000 Elite, desktop tower in most basic
> > configuration: one SATA SSD, 4GB memory and so on. Most recent BIOS from
> > around 2012, no UEFI.
> >
> > If you have proposed fix for the issue, I'll be happy to test them.
> >
> > Best regards,
>
> Thanks for the information Tomi.
>
> Thomas, any update on this? Another user reported the problem:

Not yet.

> "I have the same thing on an old cheap laptop.
> model name : Intel(R) Core(TM)2 CPU
> cpu family : 6
> model : 15
>
> Previously, this never happened on the kernel series 4.{10,11,12}.
> After reverted commit bf22ff45bed664 on 4.13-rc2, unfortunately it still hangs
> when it wakes up.

So reverting that commit does not help. Does it help on your machine?

Thanks,

tglx

2017-07-27 07:38:01

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 26/07/17 17:26, Thomas Gleixner wrote:
> On Wed, 26 Jul 2017, Martin Peres wrote:
>> On 25/07/17 10:01, Tomi Sarvela wrote:
>>> On 24/07/17 19:37, Martin Peres wrote:
>>>> On 24/07/17 19:35, Thomas Gleixner wrote:
>>>>> On Mon, 24 Jul 2017, Martin Peres wrote:
>>>>>> On 24/07/17 18:28, Thomas Gleixner wrote:
>>>>>>> Output of 'cat /proc/interrupts' and a description what kind of
>>>>>>> 'old' Intel
>>>>>>> platform that is.
>>>>>>
>>>>>> Sorry, I should have repeated the name outside of just the subject of
>>>>>> the
>>>>>> email. It is an Intel Eagle Lake Core2Duo.
>>>>>
>>>>> I hoped you would give me a little bit more details, but I Gurgled it
>>>>> now. I try to find such a beast and look what's wrong there. If I can't
>>>>> find one, I'll come back with some debug patches.
>>>>
>>>> Ha, sorry!
>>>>
>>>> Here is all the info I have:
>>>> - Hardware: HP Compaq 8000
>>>> - CPU: Eagle Lake Core2Duo E7500
>>>
>>> As mentioned, it's HP Compaq 8000 Elite, desktop tower in most basic
>>> configuration: one SATA SSD, 4GB memory and so on. Most recent BIOS from
>>> around 2012, no UEFI.
>>>
>>> If you have proposed fix for the issue, I'll be happy to test them.
>>>
>>> Best regards,
>>
>> Thanks for the information Tomi.
>>
>> Thomas, any update on this? Another user reported the problem:
>
> Not yet.
>
>> "I have the same thing on an old cheap laptop.
>> model name : Intel(R) Core(TM)2 CPU
>> cpu family : 6
>> model : 15
>>
>> Previously, this never happened on the kernel series 4.{10,11,12}.
>> After reverted commit bf22ff45bed664 on 4.13-rc2, unfortunately it still hangs
>> when it wakes up.
>
> So reverting that commit does not help. Does it help on your machine?

Yes. Reverting it does not cause the machine to lock up on resume.

I haven't tested if the machine locks up later on, but at least it
survives couple of s/r cycles.

Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-27 07:42:30

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Thu, 27 Jul 2017, Tomi Sarvela wrote:
> On 26/07/17 17:26, Thomas Gleixner wrote:
> > So reverting that commit does not help. Does it help on your machine?
>
> Yes. Reverting it does not cause the machine to lock up on resume.
>
> I haven't tested if the machine locks up later on, but at least it survives
> couple of s/r cycles.

Can you please try to add 'nohpet' to the kernel command line?

Thanks,

tglx

2017-07-27 08:14:12

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 27/07/17 10:42, Thomas Gleixner wrote:
> On Thu, 27 Jul 2017, Tomi Sarvela wrote:
>> On 26/07/17 17:26, Thomas Gleixner wrote:
>>> So reverting that commit does not help. Does it help on your machine?
>>
>> Yes. Reverting it does not cause the machine to lock up on resume.
>>
>> I haven't tested if the machine locks up later on, but at least it survives
>> couple of s/r cycles.
>
> Can you please try to add 'nohpet' to the kernel command line?

Option nohpet didn't change anything, still hangs on s/r.

Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-27 19:42:56

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Thu, 27 Jul 2017, Tomi Sarvela wrote:

> On 27/07/17 10:42, Thomas Gleixner wrote:
> > On Thu, 27 Jul 2017, Tomi Sarvela wrote:
> > > On 26/07/17 17:26, Thomas Gleixner wrote:
> > > > So reverting that commit does not help. Does it help on your machine?
> > >
> > > Yes. Reverting it does not cause the machine to lock up on resume.
> > >
> > > I haven't tested if the machine locks up later on, but at least it
> > > survives
> > > couple of s/r cycles.
> >
> > Can you please try to add 'nohpet' to the kernel command line?
>
> Option nohpet didn't change anything, still hangs on s/r.

Ok. Was a shot in the dark. I tried on a similar machine, but that one
resumes fine (except that the AHCI controller plays silly buggers, but
nothing interrupt related). I might have access to another core2duo machine
tomorrow.

I'll send you a debug patch shortly, but can you please first check when
the wreckage happens by testing the states in

/sys/power/pm_test

freezer
devices
platform
processors
core

That might give some first hint to narrow down the issue.

Debug patch follows soon.

Thanks,

tglx

2017-07-27 20:13:36

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Thu, 27 Jul 2017, Thomas Gleixner wrote:
> On Thu, 27 Jul 2017, Tomi Sarvela wrote:
>
> > On 27/07/17 10:42, Thomas Gleixner wrote:
> > > On Thu, 27 Jul 2017, Tomi Sarvela wrote:
> > > > On 26/07/17 17:26, Thomas Gleixner wrote:
> > > > > So reverting that commit does not help. Does it help on your machine?
> > > >
> > > > Yes. Reverting it does not cause the machine to lock up on resume.
> > > >
> > > > I haven't tested if the machine locks up later on, but at least it
> > > > survives
> > > > couple of s/r cycles.
> > >
> > > Can you please try to add 'nohpet' to the kernel command line?
> >
> > Option nohpet didn't change anything, still hangs on s/r.
>
> Ok. Was a shot in the dark. I tried on a similar machine, but that one
> resumes fine (except that the AHCI controller plays silly buggers, but
> nothing interrupt related). I might have access to another core2duo machine
> tomorrow.
>
> I'll send you a debug patch shortly, but can you please first check when
> the wreckage happens by testing the states in
>
> /sys/power/pm_test
>
> freezer
> devices
> platform
> processors
> core

Actually for suspend to ram we only have

freezer, devices, platform

I assume it's platform because that is where the actual interrupt
suspend/resume happens.

If that survives, then it's the low level architecture s/r code which
fiddles with the interrupt controllers and leaves them in a state which is
not known to the core code.

Thanks,

tglx

2017-07-27 21:08:17

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Thu, 27 Jul 2017, Thomas Gleixner wrote:
> On Thu, 27 Jul 2017, Thomas Gleixner wrote:
> > On Thu, 27 Jul 2017, Tomi Sarvela wrote:
> >
> > > On 27/07/17 10:42, Thomas Gleixner wrote:
> > > > On Thu, 27 Jul 2017, Tomi Sarvela wrote:
> > > > > On 26/07/17 17:26, Thomas Gleixner wrote:
> > > > > > So reverting that commit does not help. Does it help on your machine?
> > > > >
> > > > > Yes. Reverting it does not cause the machine to lock up on resume.
> > > > >
> > > > > I haven't tested if the machine locks up later on, but at least it
> > > > > survives
> > > > > couple of s/r cycles.
> > > >
> > > > Can you please try to add 'nohpet' to the kernel command line?
> > >
> > > Option nohpet didn't change anything, still hangs on s/r.
> >
> > Ok. Was a shot in the dark. I tried on a similar machine, but that one
> > resumes fine (except that the AHCI controller plays silly buggers, but
> > nothing interrupt related). I might have access to another core2duo machine
> > tomorrow.
> >
> > I'll send you a debug patch shortly, but can you please first check when
> > the wreckage happens by testing the states in
> >
> > /sys/power/pm_test
> >
> > freezer
> > devices
> > platform
> > processors
> > core
>
> Actually for suspend to ram we only have
>
> freezer, devices, platform
>
> I assume it's platform because that is where the actual interrupt
> suspend/resume happens.
>
> If that survives, then it's the low level architecture s/r code which
> fiddles with the interrupt controllers and leaves them in a state which is
> not known to the core code.

Debug patch below. It should make the machine resume again. Emphasis on
"should". Please provide the output of /sys/kernel/debug/tracing/trace
after resume.

Thanks,

tglx

8<-----------

--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -304,7 +304,10 @@ void irq_shutdown(struct irq_desc *desc)

void irq_enable(struct irq_desc *desc)
{
- if (!irqd_irq_disabled(&desc->irq_data)) {
+ if (irq_suspend_resume)
+ irq_trace_state("preenable", desc);
+
+ if (!irqd_irq_disabled(&desc->irq_data) && !irq_suspend_resume) {
unmask_irq(desc);
} else {
irq_state_clr_disabled(desc);
@@ -315,10 +318,16 @@ void irq_enable(struct irq_desc *desc)
unmask_irq(desc);
}
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postenable", desc);
}

static void __irq_disable(struct irq_desc *desc, bool mask)
{
+ if (irq_suspend_resume)
+ irq_trace_state("predisable", desc);
+
if (irqd_irq_disabled(&desc->irq_data)) {
if (mask)
mask_irq(desc);
@@ -331,6 +340,9 @@ static void __irq_disable(struct irq_des
mask_irq(desc);
}
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postdisable", desc);
}

/**
@@ -390,6 +402,9 @@ static inline void mask_ack_irq(struct i

void mask_irq(struct irq_desc *desc)
{
+ if (irq_suspend_resume)
+ irq_trace_state("premask", desc);
+
if (irqd_irq_masked(&desc->irq_data))
return;

@@ -397,17 +412,26 @@ void mask_irq(struct irq_desc *desc)
desc->irq_data.chip->irq_mask(&desc->irq_data);
irq_state_set_masked(desc);
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postmask", desc);
}

void unmask_irq(struct irq_desc *desc)
{
- if (!irqd_irq_masked(&desc->irq_data))
+ if (irq_suspend_resume)
+ irq_trace_state("preunmask", desc);
+
+ if (!irqd_irq_masked(&desc->irq_data) && !irq_suspend_resume)
return;

if (desc->irq_data.chip->irq_unmask) {
desc->irq_data.chip->irq_unmask(&desc->irq_data);
irq_state_clr_masked(desc);
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postunmask", desc);
}

void unmask_threaded_irq(struct irq_desc *desc)
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -459,3 +459,11 @@ static inline void irq_remove_debugfs_en
{
}
#endif /* CONFIG_GENERIC_IRQ_DEBUGFS */
+
+bool irq_suspend_resume;
+
+static inline void irq_trace_state(const char *what, struct irq_desc *desc)
+{
+ trace_printk("%s %d state %08x\n", what, irq_desc_get_irq(desc),
+ irqd_get(&desc->irq_data));
+}
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -14,6 +14,8 @@

#include "internals.h"

+bool irq_suspend_resume;
+
bool irq_pm_check_wakeup(struct irq_desc *desc)
{
if (irqd_is_wakeup_armed(&desc->irq_data)) {
@@ -120,6 +122,7 @@ void suspend_device_irqs(void)
struct irq_desc *desc;
int irq;

+ irq_suspend_resume = true;
for_each_irq_desc(irq, desc) {
unsigned long flags;
bool sync;
@@ -127,7 +130,9 @@ void suspend_device_irqs(void)
if (irq_settings_is_nested_thread(desc))
continue;
raw_spin_lock_irqsave(&desc->lock, flags);
+ irq_trace_state("presuspend", desc);
sync = suspend_device_irq(desc);
+ irq_trace_state("postsuspend", desc);
raw_spin_unlock_irqrestore(&desc->lock, flags);

if (sync)
@@ -172,9 +177,14 @@ static void resume_irqs(bool want_early)
continue;

raw_spin_lock_irqsave(&desc->lock, flags);
+ irq_trace_state("preresume", desc);
resume_irq(desc);
+ irq_trace_state("postresume", desc);
raw_spin_unlock_irqrestore(&desc->lock, flags);
}
+
+ if (!want_early)
+ irq_suspend_resume = false;
}

/**

2017-07-28 12:31:09

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 28/07/17 00:08, Thomas Gleixner wrote:
> On Thu, 27 Jul 2017, Thomas Gleixner wrote:
>> On Thu, 27 Jul 2017, Thomas Gleixner wrote:
>>> On Thu, 27 Jul 2017, Tomi Sarvela wrote:
>>>> On 27/07/17 10:42, Thomas Gleixner wrote:
>>>>> On Thu, 27 Jul 2017, Tomi Sarvela wrote:
>>>>>> On 26/07/17 17:26, Thomas Gleixner wrote:
>>>>>>> So reverting that commit does not help. Does it help on your machine?
>>>>>>
>>>>>> Yes. Reverting it does not cause the machine to lock up on resume.
>>>>>>
>>>>>> I haven't tested if the machine locks up later on, but at least it
>>>>>> survives
>>>>>> couple of s/r cycles.
>>>>>
>>>>> Can you please try to add 'nohpet' to the kernel command line?
>>>>
>>>> Option nohpet didn't change anything, still hangs on s/r.
>>>
>>> Ok. Was a shot in the dark. I tried on a similar machine, but that one
>>> resumes fine (except that the AHCI controller plays silly buggers, but
>>> nothing interrupt related). I might have access to another core2duo machine
>>> tomorrow.
>>>
>>> I'll send you a debug patch shortly, but can you please first check when
>>> the wreckage happens by testing the states in
>>>
>>> /sys/power/pm_test
>>>
>>> freezer
>>> devices
>>> platform
>>> processors
>>> core
>>
>> Actually for suspend to ram we only have
>>
>> freezer, devices, platform
>>
>> I assume it's platform because that is where the actual interrupt
>> suspend/resume happens.
>>
>> If that survives, then it's the low level architecture s/r code which
>> fiddles with the interrupt controllers and leaves them in a state which is
>> not known to the core code.
>
> Debug patch below. It should make the machine resume again. Emphasis on
> "should". Please provide the output of /sys/kernel/debug/tracing/trace
> after resume.

The patch didn't apply cleanly: can you tell exact commit or tag it has
been created against? I tried to hand-wrangle the changes in, but then I
got compilation errors:

CC ipc/compat.o
+0x0): multiple definition of `irq_suspend_resume'
kernel/irq/irqdesc.o:(.bss+0x0): first defined here
kernel/irq/manage.o:(.bss+0x8): multiple definition of `irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/drm-tip/kernel/irq/irqdesc.c:270:
first defined here
kernel/irq/spurious.o:(.bss+0x0): multiple definition of
`irq_suspend_resume'

Also, the usage of /sys/power/pm_test was not intuitive to me. Can you
explain which kind of combinations do you want to test?

Best regards,

Tomi


> Thanks,
>
> tglx
>
> 8<-----------
>
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -304,7 +304,10 @@ void irq_shutdown(struct irq_desc *desc)
>
> void irq_enable(struct irq_desc *desc)
> {
> - if (!irqd_irq_disabled(&desc->irq_data)) {
> + if (irq_suspend_resume)
> + irq_trace_state("preenable", desc);
> +
> + if (!irqd_irq_disabled(&desc->irq_data) && !irq_suspend_resume) {
> unmask_irq(desc);
> } else {
> irq_state_clr_disabled(desc);
> @@ -315,10 +318,16 @@ void irq_enable(struct irq_desc *desc)
> unmask_irq(desc);
> }
> }
> +
> + if (irq_suspend_resume)
> + irq_trace_state("postenable", desc);
> }
>
> static void __irq_disable(struct irq_desc *desc, bool mask)
> {
> + if (irq_suspend_resume)
> + irq_trace_state("predisable", desc);
> +
> if (irqd_irq_disabled(&desc->irq_data)) {
> if (mask)
> mask_irq(desc);
> @@ -331,6 +340,9 @@ static void __irq_disable(struct irq_des
> mask_irq(desc);
> }
> }
> +
> + if (irq_suspend_resume)
> + irq_trace_state("postdisable", desc);
> }
>
> /**
> @@ -390,6 +402,9 @@ static inline void mask_ack_irq(struct i
>
> void mask_irq(struct irq_desc *desc)
> {
> + if (irq_suspend_resume)
> + irq_trace_state("premask", desc);
> +
> if (irqd_irq_masked(&desc->irq_data))
> return;
>
> @@ -397,17 +412,26 @@ void mask_irq(struct irq_desc *desc)
> desc->irq_data.chip->irq_mask(&desc->irq_data);
> irq_state_set_masked(desc);
> }
> +
> + if (irq_suspend_resume)
> + irq_trace_state("postmask", desc);
> }
>
> void unmask_irq(struct irq_desc *desc)
> {
> - if (!irqd_irq_masked(&desc->irq_data))
> + if (irq_suspend_resume)
> + irq_trace_state("preunmask", desc);
> +
> + if (!irqd_irq_masked(&desc->irq_data) && !irq_suspend_resume)
> return;
>
> if (desc->irq_data.chip->irq_unmask) {
> desc->irq_data.chip->irq_unmask(&desc->irq_data);
> irq_state_clr_masked(desc);
> }
> +
> + if (irq_suspend_resume)
> + irq_trace_state("postunmask", desc);
> }
>
> void unmask_threaded_irq(struct irq_desc *desc)
> --- a/kernel/irq/internals.h
> +++ b/kernel/irq/internals.h
> @@ -459,3 +459,11 @@ static inline void irq_remove_debugfs_en
> {
> }
> #endif /* CONFIG_GENERIC_IRQ_DEBUGFS */
> +
> +bool irq_suspend_resume;
> +
> +static inline void irq_trace_state(const char *what, struct irq_desc *desc)
> +{
> + trace_printk("%s %d state %08x\n", what, irq_desc_get_irq(desc),
> + irqd_get(&desc->irq_data));
> +}
> --- a/kernel/irq/pm.c
> +++ b/kernel/irq/pm.c
> @@ -14,6 +14,8 @@
>
> #include "internals.h"
>
> +bool irq_suspend_resume;
> +
> bool irq_pm_check_wakeup(struct irq_desc *desc)
> {
> if (irqd_is_wakeup_armed(&desc->irq_data)) {
> @@ -120,6 +122,7 @@ void suspend_device_irqs(void)
> struct irq_desc *desc;
> int irq;
>
> + irq_suspend_resume = true;
> for_each_irq_desc(irq, desc) {
> unsigned long flags;
> bool sync;
> @@ -127,7 +130,9 @@ void suspend_device_irqs(void)
> if (irq_settings_is_nested_thread(desc))
> continue;
> raw_spin_lock_irqsave(&desc->lock, flags);
> + irq_trace_state("presuspend", desc);
> sync = suspend_device_irq(desc);
> + irq_trace_state("postsuspend", desc);
> raw_spin_unlock_irqrestore(&desc->lock, flags);
>
> if (sync)
> @@ -172,9 +177,14 @@ static void resume_irqs(bool want_early)
> continue;
>
> raw_spin_lock_irqsave(&desc->lock, flags);
> + irq_trace_state("preresume", desc);
> resume_irq(desc);
> + irq_trace_state("postresume", desc);
> raw_spin_unlock_irqrestore(&desc->lock, flags);
> }
> +
> + if (!want_early)
> + irq_suspend_resume = false;
> }
>
> /**
>
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-28 12:34:09

by Martin Peres

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 27/07/17 10:36, Tomi Sarvela wrote:
> On 26/07/17 17:26, Thomas Gleixner wrote:
>> On Wed, 26 Jul 2017, Martin Peres wrote:
>>> On 25/07/17 10:01, Tomi Sarvela wrote:
>>>> On 24/07/17 19:37, Martin Peres wrote:
>>>>> On 24/07/17 19:35, Thomas Gleixner wrote:
>>>>>> On Mon, 24 Jul 2017, Martin Peres wrote:
>>>>>>> On 24/07/17 18:28, Thomas Gleixner wrote:
>>>>>>>> Output of 'cat /proc/interrupts' and a description what kind of
>>>>>>>> 'old' Intel
>>>>>>>> platform that is.
>>>>>>>
>>>>>>> Sorry, I should have repeated the name outside of just the
>>>>>>> subject of
>>>>>>> the
>>>>>>> email. It is an Intel Eagle Lake Core2Duo.
>>>>>>
>>>>>> I hoped you would give me a little bit more details, but I Gurgled it
>>>>>> now. I try to find such a beast and look what's wrong there. If I
>>>>>> can't
>>>>>> find one, I'll come back with some debug patches.
>>>>>
>>>>> Ha, sorry!
>>>>>
>>>>> Here is all the info I have:
>>>>> - Hardware: HP Compaq 8000
>>>>> - CPU: Eagle Lake Core2Duo E7500
>>>>
>>>> As mentioned, it's HP Compaq 8000 Elite, desktop tower in most basic
>>>> configuration: one SATA SSD, 4GB memory and so on. Most recent BIOS
>>>> from
>>>> around 2012, no UEFI.
>>>>
>>>> If you have proposed fix for the issue, I'll be happy to test them.
>>>>
>>>> Best regards,
>>>
>>> Thanks for the information Tomi.
>>>
>>> Thomas, any update on this? Another user reported the problem:
>>
>> Not yet.
>>
>>> "I have the same thing on an old cheap laptop.
>>> model name : Intel(R) Core(TM)2 CPU
>>> cpu family : 6
>>> model : 15
>>>
>>> Previously, this never happened on the kernel series 4.{10,11,12}.
>>> After reverted commit bf22ff45bed664 on 4.13-rc2, unfortunately it
>>> still hangs
>>> when it wakes up.
>>
>> So reverting that commit does not help. Does it help on your machine?
>
> Yes. Reverting it does not cause the machine to lock up on resume.
>
> I haven't tested if the machine locks up later on, but at least it
> survives couple of s/r cycles

FYI, the user who reported reverting the patch was not sufficient
actually now says he was hit by a second bug which he found.

So, a revert is always possible.

Cheers,
Martin

2017-07-28 12:42:21

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Fri, 28 Jul 2017, Tomi Sarvela wrote:
> On 28/07/17 00:08, Thomas Gleixner wrote:
> The patch didn't apply cleanly: can you tell exact commit or tag it has been
> created against?

Linus head

> I tried to hand-wrangle the changes in, but then I got compilation
> errors:

Sorry, forgot to refresh the patch. Working version below.

> Also, the usage of /sys/power/pm_test was not intuitive to me. Can you explain
> which kind of combinations do you want to test?

# echo freezer >/sys/power/pm_test
# echo mem >/sys/power/state

# echo devices >/sys/power/pm_test
# echo mem >/sys/power/state

# echo platform >/sys/power/pm_test
# echo mem >/sys/power/state

To reset the test stuff do:
# echo none >/sys/power/pm_test

Please try that before applying the debug patch.

Thanks,

tglx

8<-----------

--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -304,7 +304,10 @@ void irq_shutdown(struct irq_desc *desc)

void irq_enable(struct irq_desc *desc)
{
- if (!irqd_irq_disabled(&desc->irq_data)) {
+ if (irq_suspend_resume)
+ irq_trace_state("preenable", desc);
+
+ if (!irqd_irq_disabled(&desc->irq_data) && !irq_suspend_resume) {
unmask_irq(desc);
} else {
irq_state_clr_disabled(desc);
@@ -315,10 +318,16 @@ void irq_enable(struct irq_desc *desc)
unmask_irq(desc);
}
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postenable", desc);
}

static void __irq_disable(struct irq_desc *desc, bool mask)
{
+ if (irq_suspend_resume)
+ irq_trace_state("predisable", desc);
+
if (irqd_irq_disabled(&desc->irq_data)) {
if (mask)
mask_irq(desc);
@@ -331,6 +340,9 @@ static void __irq_disable(struct irq_des
mask_irq(desc);
}
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postdisable", desc);
}

/**
@@ -390,6 +402,9 @@ static inline void mask_ack_irq(struct i

void mask_irq(struct irq_desc *desc)
{
+ if (irq_suspend_resume)
+ irq_trace_state("premask", desc);
+
if (irqd_irq_masked(&desc->irq_data))
return;

@@ -397,17 +412,26 @@ void mask_irq(struct irq_desc *desc)
desc->irq_data.chip->irq_mask(&desc->irq_data);
irq_state_set_masked(desc);
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postmask", desc);
}

void unmask_irq(struct irq_desc *desc)
{
- if (!irqd_irq_masked(&desc->irq_data))
+ if (irq_suspend_resume)
+ irq_trace_state("preunmask", desc);
+
+ if (!irqd_irq_masked(&desc->irq_data) && !irq_suspend_resume)
return;

if (desc->irq_data.chip->irq_unmask) {
desc->irq_data.chip->irq_unmask(&desc->irq_data);
irq_state_clr_masked(desc);
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postunmask", desc);
}

void unmask_threaded_irq(struct irq_desc *desc)
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -459,3 +459,11 @@ static inline void irq_remove_debugfs_en
{
}
#endif /* CONFIG_GENERIC_IRQ_DEBUGFS */
+
+bool irq_suspend_resume;
+
+static inline void irq_trace_state(const char *what, struct irq_desc *desc)
+{
+ trace_printk("%s %d state %08x\n", what, irq_desc_get_irq(desc),
+ irqd_get(&desc->irq_data));
+}
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -14,6 +14,8 @@

#include "internals.h"

+extern bool irq_suspend_resume;
+
bool irq_pm_check_wakeup(struct irq_desc *desc)
{
if (irqd_is_wakeup_armed(&desc->irq_data)) {
@@ -120,6 +122,7 @@ void suspend_device_irqs(void)
struct irq_desc *desc;
int irq;

+ irq_suspend_resume = true;
for_each_irq_desc(irq, desc) {
unsigned long flags;
bool sync;
@@ -127,7 +130,9 @@ void suspend_device_irqs(void)
if (irq_settings_is_nested_thread(desc))
continue;
raw_spin_lock_irqsave(&desc->lock, flags);
+ irq_trace_state("presuspend", desc);
sync = suspend_device_irq(desc);
+ irq_trace_state("postsuspend", desc);
raw_spin_unlock_irqrestore(&desc->lock, flags);

if (sync)
@@ -172,9 +177,14 @@ static void resume_irqs(bool want_early)
continue;

raw_spin_lock_irqsave(&desc->lock, flags);
+ irq_trace_state("preresume", desc);
resume_irq(desc);
+ irq_trace_state("postresume", desc);
raw_spin_unlock_irqrestore(&desc->lock, flags);
}
+
+ if (!want_early)
+ irq_suspend_resume = false;
}

/**

2017-07-28 13:15:25

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Fri, 28 Jul 2017, Thomas Gleixner wrote:
> On Fri, 28 Jul 2017, Tomi Sarvela wrote:
> > On 28/07/17 00:08, Thomas Gleixner wrote:
> > The patch didn't apply cleanly: can you tell exact commit or tag it has been
> > created against?
>
> Linus head
>
> > I tried to hand-wrangle the changes in, but then I got compilation
> > errors:
>
> Sorry, forgot to refresh the patch. Working version below.
>
> > Also, the usage of /sys/power/pm_test was not intuitive to me. Can you explain
> > which kind of combinations do you want to test?
>
> # echo freezer >/sys/power/pm_test
> # echo mem >/sys/power/state
>
> # echo devices >/sys/power/pm_test
> # echo mem >/sys/power/state
>
> # echo platform >/sys/power/pm_test
> # echo mem >/sys/power/state
>
> To reset the test stuff do:
> # echo none >/sys/power/pm_test
>
> Please try that before applying the debug patch.

Another question. Is the machine completely dead or not?

Can you ping it or does the serial console show the start of resume or just
nothing at all?

Thanks,

tglx

2017-07-28 13:26:27

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 28/07/17 16:15, Thomas Gleixner wrote:
> On Fri, 28 Jul 2017, Thomas Gleixner wrote:
>> On Fri, 28 Jul 2017, Tomi Sarvela wrote:
>>> On 28/07/17 00:08, Thomas Gleixner wrote:
>>> The patch didn't apply cleanly: can you tell exact commit or tag it has been
>>> created against?
>>
>> Linus head
>>
>>> I tried to hand-wrangle the changes in, but then I got compilation
>>> errors:
>>
>> Sorry, forgot to refresh the patch. Working version below.
>>
>>> Also, the usage of /sys/power/pm_test was not intuitive to me. Can you explain
>>> which kind of combinations do you want to test?

This tested against just fetched 4.13.0-rc2 origin/master.

>> # echo freezer >/sys/power/pm_test
>> # echo mem >/sys/power/state

Works.

>> # echo devices >/sys/power/pm_test
>> # echo mem >/sys/power/state

Works.

>> # echo platform >/sys/power/pm_test
>> # echo mem >/sys/power/state

Works.

>> To reset the test stuff do:
>> # echo none >/sys/power/pm_test

>> Please try that before applying the debug patch.

Going with the suspend test, and it still leaves host dead.

root@elk:/opt/igt# ./tests/gem_exec_suspend --r basic-S3
IGT-Version: 1.19-gf1454751 (x86_64) (Linux: 4.13.0-rc2+ x86_64)
rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Jul 28 13:21:27 2017

> Another question. Is the machine completely dead or not?

Completely dead. Powerled is on, so host isn't shut down. Serial or
network if don't give any signs of life.

Patch applies cleanly but still getting the same error:
LD vmlinux.o
kernel/irq/handle.o:(.bss+0x0): multiple definition of `irq_suspend_resume'
kernel/irq/irqdesc.o:(.bss+0x0): first defined here
kernel/irq/manage.o:(.bss+0x8): multiple definition of `irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/kernel/kernel/irq/irqdesc.c:270:
first defined here
kernel/irq/spurious.o:(.bss+0x0): multiple definition of
`irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/kernel/kernel/irq/irqdesc.c:270:
first defined here
kernel/irq/resend.o:(.bss+0x0): multiple definition of `irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/kernel/kernel/irq/irqdesc.c:270:
first defined here
kernel/irq/chip.o:(.bss+0x0): multiple definition of `irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/kernel/kernel/irq/irqdesc.c:270:
first defined here
kernel/irq/dummychip.o:(.bss+0x0): multiple definition of
`irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/kernel/kernel/irq/irqdesc.c:270:
first defined here
kernel/irq/devres.o:(.bss+0x0): multiple definition of `irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/kernel/kernel/irq/irqdesc.c:270:
first defined here
kernel/irq/autoprobe.o:(.bss+0x0): multiple definition of
`irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/kernel/kernel/irq/irqdesc.c:270:
first defined here
kernel/irq/proc.o:(.bss+0x4): multiple definition of `irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/kernel/kernel/irq/irqdesc.c:270:
first defined here
kernel/irq/migration.o:(.bss+0x0): multiple definition of
`irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/kernel/kernel/irq/irqdesc.c:270:
first defined here
kernel/irq/cpuhotplug.o:(.bss+0x0): multiple definition of
`irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/kernel/kernel/irq/irqdesc.c:270:
first defined here
kernel/irq/pm.o:(.bss+0x0): multiple definition of `irq_suspend_resume'
kernel/irq/irqdesc.o:/home/testrunner/kernel/kernel/irq/irqdesc.c:270:
first defined here


Best regards,

Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-28 14:13:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Fri, 28 Jul 2017, Tomi Sarvela wrote:
> On 28/07/17 16:15, Thomas Gleixner wrote:
> > Another question. Is the machine completely dead or not?
>
> Completely dead. Powerled is on, so host isn't shut down.

So that means it does not even power the machine down. That's what I
expected least.

> Serial or network if don't give any signs of life.

> Patch applies cleanly but still getting the same error:

Sorry for the noise. I'm an idiot trying to do 10 things at once. This time
it actually compiles and links.

If the machine does still not powerdown with this applied, then please redo
the 'platform' test and grab the trace for that one.

Thanks,

tglx

8<-----------
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -304,7 +304,10 @@ void irq_shutdown(struct irq_desc *desc)

void irq_enable(struct irq_desc *desc)
{
- if (!irqd_irq_disabled(&desc->irq_data)) {
+ if (irq_suspend_resume)
+ irq_trace_state("preenable", desc);
+
+ if (!irqd_irq_disabled(&desc->irq_data) && !irq_suspend_resume) {
unmask_irq(desc);
} else {
irq_state_clr_disabled(desc);
@@ -315,10 +318,16 @@ void irq_enable(struct irq_desc *desc)
unmask_irq(desc);
}
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postenable", desc);
}

static void __irq_disable(struct irq_desc *desc, bool mask)
{
+ if (irq_suspend_resume)
+ irq_trace_state("predisable", desc);
+
if (irqd_irq_disabled(&desc->irq_data)) {
if (mask)
mask_irq(desc);
@@ -331,6 +340,9 @@ static void __irq_disable(struct irq_des
mask_irq(desc);
}
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postdisable", desc);
}

/**
@@ -390,6 +402,9 @@ static inline void mask_ack_irq(struct i

void mask_irq(struct irq_desc *desc)
{
+ if (irq_suspend_resume)
+ irq_trace_state("premask", desc);
+
if (irqd_irq_masked(&desc->irq_data))
return;

@@ -397,17 +412,26 @@ void mask_irq(struct irq_desc *desc)
desc->irq_data.chip->irq_mask(&desc->irq_data);
irq_state_set_masked(desc);
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postmask", desc);
}

void unmask_irq(struct irq_desc *desc)
{
- if (!irqd_irq_masked(&desc->irq_data))
+ if (irq_suspend_resume)
+ irq_trace_state("preunmask", desc);
+
+ if (!irqd_irq_masked(&desc->irq_data) && !irq_suspend_resume)
return;

if (desc->irq_data.chip->irq_unmask) {
desc->irq_data.chip->irq_unmask(&desc->irq_data);
irq_state_clr_masked(desc);
}
+
+ if (irq_suspend_resume)
+ irq_trace_state("postunmask", desc);
}

void unmask_threaded_irq(struct irq_desc *desc)
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -459,3 +459,11 @@ static inline void irq_remove_debugfs_en
{
}
#endif /* CONFIG_GENERIC_IRQ_DEBUGFS */
+
+extern bool irq_suspend_resume;
+
+static inline void irq_trace_state(const char *what, struct irq_desc *desc)
+{
+ trace_printk("%s %d state %08x\n", what, irq_desc_get_irq(desc),
+ irqd_get(&desc->irq_data));
+}
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -14,6 +14,8 @@

#include "internals.h"

+bool irq_suspend_resume;
+
bool irq_pm_check_wakeup(struct irq_desc *desc)
{
if (irqd_is_wakeup_armed(&desc->irq_data)) {
@@ -120,6 +122,7 @@ void suspend_device_irqs(void)
struct irq_desc *desc;
int irq;

+ irq_suspend_resume = true;
for_each_irq_desc(irq, desc) {
unsigned long flags;
bool sync;
@@ -127,7 +130,9 @@ void suspend_device_irqs(void)
if (irq_settings_is_nested_thread(desc))
continue;
raw_spin_lock_irqsave(&desc->lock, flags);
+ irq_trace_state("presuspend", desc);
sync = suspend_device_irq(desc);
+ irq_trace_state("postsuspend", desc);
raw_spin_unlock_irqrestore(&desc->lock, flags);

if (sync)
@@ -172,9 +177,14 @@ static void resume_irqs(bool want_early)
continue;

raw_spin_lock_irqsave(&desc->lock, flags);
+ irq_trace_state("preresume", desc);
resume_irq(desc);
+ irq_trace_state("postresume", desc);
raw_spin_unlock_irqrestore(&desc->lock, flags);
}
+
+ if (!want_early)
+ irq_suspend_resume = false;
}

/**

2017-07-28 14:47:58

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 28/07/17 17:13, Thomas Gleixner wrote:
> On Fri, 28 Jul 2017, Tomi Sarvela wrote:
>> On 28/07/17 16:15, Thomas Gleixner wrote:
>>> Another question. Is the machine completely dead or not?
>>
>> Completely dead. Powerled is on, so host isn't shut down.
>
> So that means it does not even power the machine down. That's what I
> expected least.
>
>> Serial or network if don't give any signs of life.
>
>> Patch applies cleanly but still getting the same error:
>
> Sorry for the noise. I'm an idiot trying to do 10 things at once. This time
> it actually compiles and links.
>
> If the machine does still not powerdown with this applied, then please redo
> the 'platform' test and grab the trace for that one.

This patch fixes the issue. Below is the dmesg from the testrun (sorry
for the spam, we're primarily testing i915 issues).

Tested-by: Tomi Sarvela <[email protected]>

Tomi


[ 1189.061801] [IGT] gem_exec_suspend: executing
[ 1189.071208] Setting dangerous option reset - tainting kernel
[ 1189.071438] [IGT] gem_exec_suspend: starting subtest basic-S3
[ 1189.239034] PM: Syncing filesystems ... done.
[ 1189.245968] PM: Preparing system for sleep (mem)
[ 1189.246209] Freezing user space processes ... (elapsed 0.001 seconds)
done.
[ 1189.247530] OOM killer disabled.
[ 1189.247531] Freezing remaining freezable tasks ... (elapsed 0.001
seconds) done.
[ 1189.248690] PM: Suspending system (mem)
[ 1189.248706] Suspending console(s) (use no_console_suspend to debug)
[ 1189.249177] serial 00:03: disabled
[ 1189.250266] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1189.255064] e1000e: EEE TX LPI TIMER: 00000000
[ 1189.256460] [drm:intel_power_well_enable [i915]] enabling always-on
[ 1189.257541] sd 0:0:0:0: [sda] Stopping disk
[ 1189.560357] PM: suspend of devices complete after 311.444 msecs
[ 1189.560653] [drm:intel_power_well_disable [i915]] disabling always-on
[ 1189.572123] PM: late suspend of devices complete after 11.761 msecs
[ 1189.584066] PM: noirq suspend of devices complete after 11.939 msecs
[ 1189.584580] ACPI: Preparing to enter system sleep state S3
[ 1189.586661] PM: Saving platform NVS memory
[ 1189.586686] Disabling non-boot CPUs ...
[ 1189.596350] IRQ 8: no longer affine to CPU1
[ 1189.596356] IRQ 16: no longer affine to CPU1
[ 1189.596367] IRQ 25: no longer affine to CPU1
[ 1189.596369] IRQ 28: no longer affine to CPU1
[ 1189.597397] smpboot: CPU 1 is now offline
[ 1189.597665] ACPI: Low-level resume complete
[ 1189.597665] PM: Restoring platform NVS memory
[ 1189.597665] Suspended for 14.825 seconds
[ 1189.597665] Delta way too big! 18446743991837909721
ts=18446744056274518328 write stamp = 64436608607
If you just came from a suspend/resume,
please switch to the trace global clock:
echo global > /sys/kernel/debug/tracing/trace_clock
[ 1189.597665] ------------[ cut here ]------------
[ 1189.597665] WARNING: CPU: 0 PID: 1332 at
kernel/trace/ring_buffer.c:2647 rb_handle_timestamp.isra.32+0x71/0x80
[ 1189.597665] Modules linked in: snd_hda_codec_realtek i915
snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core
coretemp snd_pcm e1000e lpc_ich mei_me ptp mei pps_core
[ 1189.597665] CPU: 0 PID: 1332 Comm: rtcwake Tainted: G U
4.13.0-rc2+ #5
[ 1189.597665] Hardware name: Hewlett-Packard HP Compaq 8000 Elite CMT
PC/3647h, BIOS 786G7 v01.13 07/20/2011
[ 1189.597665] task: ffff88010dce9880 task.stack: ffffc900000c8000
[ 1189.597665] RIP: 0010:rb_handle_timestamp.isra.32+0x71/0x80
[ 1189.597665] RSP: 0018:ffffc900000cbab0 EFLAGS: 00010082
[ 1189.597665] RAX: 00000000000000e0 RBX: ffffc900000cbad0 RCX:
0000000000000004
[ 1189.597665] RDX: 0000000080000004 RSI: 0000000000000082 RDI:
00000000ffffffff
[ 1189.597665] RBP: ffffc900000cbac0 R08: 0000000000000000 R09:
00000000000000e0
[ 1189.597665] R10: ffffc900000cbbe0 R11: 00000000000ebc08 R12:
ffff880117008890
[ 1189.597665] R13: ffff8801170088b0 R14: 00000000000003e8 R15:
0000000000000005
[ 1189.597665] FS: 00007f93d957e700(0000) GS:ffff88011bc00000(0000)
knlGS:0000000000000000
[ 1189.597665] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1189.597665] CR2: 000000f00255b068 CR3: 00000000d9063000 CR4:
00000000000406f0
[ 1189.597665] Call Trace:
[ 1189.597665] ring_buffer_lock_reserve+0x1fa/0x350
[ 1189.597665] trace_vbprintk+0xdc/0x260
[ 1189.597665] ? __irq_disable+0x1b/0xc0
[ 1189.597665] __trace_bprintk+0x4a/0x60
[ 1189.597665] ? preempt_count_add+0x9e/0xb0
[ 1189.597665] __irq_disable+0x3f/0xc0
[ 1189.597665] irq_disable+0x17/0x20
[ 1189.597665] __disable_irq_nosync+0x59/0x70
[ 1189.597665] disable_hardirq+0x11/0x30
[ 1189.597665] hpet_msi_resume+0x85/0xd0
[ 1189.597665] clockevents_tick_resume+0x14/0x20
[ 1189.597665] tick_resume_local+0x32/0x60
[ 1189.597665] tick_resume+0x13/0x20
[ 1189.597665] timekeeping_resume+0x149/0x1a0
[ 1189.597665] syscore_resume+0x4b/0x190
[ 1189.597665] ? syscore_resume+0x4b/0x190
[ 1189.597665] suspend_devices_and_enter+0x6b9/0x810
[ 1189.597665] pm_suspend+0x367/0x540
[ 1189.597665] state_store+0x7e/0xf0
[ 1189.597665] kobj_attr_store+0xf/0x20
[ 1189.597665] sysfs_kf_write+0x37/0x40
[ 1189.597665] kernfs_fop_write+0x110/0x1a0
[ 1189.597665] __vfs_write+0x28/0x130
[ 1189.597665] ? __this_cpu_preempt_check+0x13/0x20
[ 1189.597665] ? __sb_start_write+0x55/0xe0
[ 1189.597665] vfs_write+0xb6/0x1a0
[ 1189.597665] SyS_write+0x46/0xb0
[ 1189.597665] entry_SYSCALL_64_fastpath+0x17/0x98
[ 1189.597665] RIP: 0033:0x7f93d90ac8f0
[ 1189.597665] RSP: 002b:00007ffd80a9ca88 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 1189.597665] RAX: ffffffffffffffda RBX: 000000f00255a050 RCX:
00007f93d90ac8f0
[ 1189.597665] RDX: 0000000000000004 RSI: 000000f00255a060 RDI:
0000000000000007
[ 1189.597665] RBP: 00007f93d9375b00 R08: 000000f002557d90 R09:
00007f93d957e700
[ 1189.597665] R10: 00007f93d9375b58 R11: 0000000000000246 R12:
00007f93d9375b58
[ 1189.597665] R13: 00007f93d9375b58 R14: 000000000000270f R15:
0000000000001010
[ 1189.597665] Code: f0 48 8b 73 08 85 c0 48 8b 13 48 c7 c0 88 9a a5 81
49 c7 c0 1a 3b a8 81 4c 0f 44 c0 48 8b 0f 48 c7 c7 10 9b a5 81 e8 c0 c8
fa ff <0f> ff eb a7 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56
[ 1189.597665] ---[ end trace 7d99ac836f161565 ]---
[ 1189.597665] Enabling non-boot CPUs ...
[ 1189.597665] x86: Booting SMP configuration:
[ 1189.597665] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 1189.598576] cache: parent cpu1 should not be sleeping
[ 1189.598651] hpet: hpet3 irq 25 for MSI
[ 1189.598710] CPU1 is up
[ 1189.600116] ACPI: Waking up from system sleep state S3
[ 1189.612296] PM: noirq resume of devices complete after 12.089 msecs
[ 1189.612703] hpet1: lost 615 rtc interrupts
[ 1189.612875] [drm:intel_power_well_enable [i915]] enabling always-on
[ 1189.612971] PM: early resume of devices complete after 0.166 msecs
[ 1189.614754] [drm:i915_redisable_vga_power_on [i915]] Something
enabled VGA plane, disabling it
[ 1189.615088] [drm:intel_opregion_setup [i915]] graphic opregion
physical addr: 0xdd7a1640
[ 1189.615119] [drm:intel_opregion_setup [i915]] SWSCI supported
[ 1189.615143] [drm:swsci [i915]] SWSCI request already in progress
[ 1189.615167] [drm:swsci [i915]] SWSCI request already in progress
[ 1189.615190] [drm:intel_opregion_setup [i915]] SWSCI GBDA callbacks
00000001, SBCB callbacks 00000001
[ 1189.615215] [drm:intel_opregion_setup [i915]] Found valid VBT in ACPI
OpRegion (Mailbox #4)
[ 1189.615255] [drm:init_workarounds_ring [i915]] rcs0: Number of
context specific w/a: 0
[ 1189.615284] [drm:intel_hpll_vco [i915]] HPLL VCO 3200000 kHz
[ 1189.615309] [drm:intel_update_cdclk [i915]] Current CD clock rate:
400000 kHz, VCO: 3200000 kHz, ref: 0 kHz
[ 1189.615340] [drm:intel_set_plane_visible [i915]] pipe A active planes 0x0
[ 1189.615364] [drm:intel_modeset_setup_hw_state [i915]] [CRTC:32:pipe
A] hw state readout: disabled
[ 1189.615389] [drm:intel_set_plane_visible [i915]] pipe B active planes 0x0
[ 1189.615413] [drm:intel_modeset_setup_hw_state [i915]] [CRTC:39:pipe
B] hw state readout: disabled
[ 1189.615439] [drm:intel_modeset_setup_hw_state [i915]]
[ENCODER:41:CRT] hw state readout: disabled, pipe A
[ 1189.615463] [drm:intel_modeset_setup_hw_state [i915]]
[ENCODER:42:HDMI C] hw state readout: disabled, pipe A
[ 1189.615488] [drm:intel_modeset_setup_hw_state [i915]] [ENCODER:47:DP
C] hw state readout: disabled, pipe A
[ 1189.615513] [drm:intel_modeset_setup_hw_state [i915]]
[CONNECTOR:40:VGA-1] hw state readout: disabled
[ 1189.615538] [drm:intel_modeset_setup_hw_state [i915]]
[CONNECTOR:43:HDMI-A-1] hw state readout: disabled
[ 1189.615563] [drm:intel_modeset_setup_hw_state [i915]]
[CONNECTOR:48:DP-1] hw state readout: disabled
[ 1189.615588] [drm:intel_dump_pipe_config [i915]] [CRTC:32:pipe
A][setup_hw_state]
[ 1189.615613] [drm:intel_dump_pipe_config [i915]] cpu_transcoder: A,
pipe bpp: 0, dithering: 0
[ 1189.615637] [drm:intel_dump_pipe_config [i915]] audio: 0, infoframes: 0
[ 1189.615660] [drm:intel_dump_pipe_config [i915]] requested mode:
[ 1189.615664] [drm:drm_mode_debug_printmodeline] Modeline 0:"" 0 0 0 0
0 0 0 0 0 0 0x0 0x0
[ 1189.615688] [drm:intel_dump_pipe_config [i915]] adjusted mode:
[ 1189.615690] [drm:drm_mode_debug_printmodeline] Modeline 0:"" 0 0 0 0
0 0 0 0 0 0 0x0 0x0
[ 1189.615715] [drm:intel_dump_pipe_config [i915]] crtc timings: 0 0 0 0
0 0 0 0 0, type: 0x0 flags: 0x0
[ 1189.615739] [drm:intel_dump_pipe_config [i915]] port clock: 0, pipe
src size: 0x0, pixel rate 0
[ 1189.615763] [drm:intel_dump_pipe_config [i915]] gmch pfit: control:
0x00000000, ratios: 0x00000000, lvds border: 0x00000000
[ 1189.615787] [drm:intel_dump_pipe_config [i915]] ips: 0, double wide: 0
[ 1189.615811] [drm:intel_dpll_dump_hw_state [i915]] dpll_hw_state:
dpll: 0x0, dpll_md: 0x0, fp0: 0x0, fp1: 0x0
[ 1189.615834] [drm:intel_dump_pipe_config [i915]] planes on this crtc
[ 1189.615859] [drm:intel_dump_pipe_config [i915]] [PLANE:26:primary A]
disabled, scaler_id = 0
[ 1189.615883] [drm:intel_dump_pipe_config [i915]] [PLANE:28:sprite A]
disabled, scaler_id = 0
[ 1189.615907] [drm:intel_dump_pipe_config [i915]] [PLANE:30:cursor A]
disabled, scaler_id = 0
[ 1189.615932] [drm:intel_dump_pipe_config [i915]] [CRTC:39:pipe
B][setup_hw_state]
[ 1189.615956] [drm:intel_dump_pipe_config [i915]] cpu_transcoder: B,
pipe bpp: 0, dithering: 0
[ 1189.615979] [drm:intel_dump_pipe_config [i915]] audio: 0, infoframes: 0
[ 1189.616003] [drm:intel_dump_pipe_config [i915]] requested mode:
[ 1189.616012] [drm:drm_mode_debug_printmodeline] Modeline 0:"" 0 0 0 0
0 0 0 0 0 0 0x0 0x0
[ 1189.616036] [drm:intel_dump_pipe_config [i915]] adjusted mode:
[ 1189.616038] [drm:drm_mode_debug_printmodeline] Modeline 0:"" 0 0 0 0
0 0 0 0 0 0 0x0 0x0
[ 1189.616062] [drm:intel_dump_pipe_config [i915]] crtc timings: 0 0 0 0
0 0 0 0 0, type: 0x0 flags: 0x0
[ 1189.616086] [drm:intel_dump_pipe_config [i915]] port clock: 0, pipe
src size: 0x0, pixel rate 0
[ 1189.616110] [drm:intel_dump_pipe_config [i915]] gmch pfit: control:
0x00000000, ratios: 0x00000000, lvds border: 0x00000000
[ 1189.616134] [drm:intel_dump_pipe_config [i915]] ips: 0, double wide: 0
[ 1189.616158] [drm:intel_dpll_dump_hw_state [i915]] dpll_hw_state:
dpll: 0x0, dpll_md: 0x0, fp0: 0x0, fp1: 0x0
[ 1189.616182] [drm:intel_dump_pipe_config [i915]] planes on this crtc
[ 1189.616206] [drm:intel_dump_pipe_config [i915]] [PLANE:33:primary B]
disabled, scaler_id = 0
[ 1189.616230] [drm:intel_dump_pipe_config [i915]] [PLANE:35:sprite B]
disabled, scaler_id = 0
[ 1189.616254] [drm:intel_dump_pipe_config [i915]] [PLANE:37:cursor B]
disabled, scaler_id = 0
[ 1189.616276] [drm:g4x_wm_get_hw_state [i915]] Initial watermarks: pipe
A, plane=15, cursor=15, sprite=15
[ 1189.616296] [drm:g4x_wm_get_hw_state [i915]] Initial watermarks: pipe
B, plane=15, cursor=15, sprite=15
[ 1189.616316] [drm:g4x_wm_get_hw_state [i915]] Initial SR watermarks:
plane=127, cursor=0 fbc=1
[ 1189.616335] [drm:g4x_wm_get_hw_state [i915]] Initial HPLL watermarks:
plane=0, SR cursor=0 fbc=5
[ 1189.616355] [drm:g4x_wm_get_hw_state [i915]] Initial SR=no HPLL=no FBC=no
[ 1189.616377] [drm:intel_power_well_disable [i915]] disabling always-on
[ 1189.616397] [drm:intel_power_well_enable [i915]] enabling always-on
[ 1189.616417] [drm:intel_power_well_disable [i915]] disabling always-on
[ 1189.616512] [drm:intel_power_well_enable [i915]] enabling always-on
[ 1189.616538] [drm:intel_atomic_commit_tail [i915]] [ENCODER:41:CRT]
[ 1189.616562] [drm:intel_atomic_commit_tail [i915]] [ENCODER:42:HDMI C]
[ 1189.616586] [drm:intel_atomic_commit_tail [i915]] [ENCODER:47:DP C]
[ 1189.616610] [drm:verify_connector_state.isra.72 [i915]]
[CONNECTOR:40:VGA-1]
[ 1189.616635] [drm:verify_connector_state.isra.72 [i915]]
[CONNECTOR:43:HDMI-A-1]
[ 1189.616659] [drm:verify_connector_state.isra.72 [i915]]
[CONNECTOR:48:DP-1]
[ 1189.616685] [drm:intel_atomic_commit_tail [i915]] [CRTC:32:pipe A]
[ 1189.616710] [drm:intel_atomic_commit_tail [i915]] [CRTC:39:pipe B]
[ 1189.616731] [drm:intel_power_well_disable [i915]] disabling always-on
[ 1189.616764] [drm:intel_crt_detect [i915]] [CONNECTOR:40:VGA-1] force=0
[ 1189.616784] [drm:intel_power_well_enable [i915]] enabling always-on
[ 1189.617427] usb usb3: root hub lost power or was reset
[ 1189.617462] usb usb4: root hub lost power or was reset
[ 1189.617494] usb usb5: root hub lost power or was reset
[ 1189.617629] usb usb6: root hub lost power or was reset
[ 1189.617662] usb usb7: root hub lost power or was reset
[ 1189.617695] usb usb8: root hub lost power or was reset
[ 1189.618430] sd 0:0:0:0: [sda] Starting disk
[ 1189.618599] serial 00:03: activated
[ 1189.628872] [drm:intel_crt_detect [i915]] CRT not detected via hotplug
[ 1189.630945] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus vga] NAK for
addr: 0050 w(1)
[ 1189.630993] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus vga] NAK on
first message, retry
[ 1189.633063] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus vga] NAK for
addr: 0050 w(1)
[ 1189.633067] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent
adapter i915 gmbus vga
[ 1189.633110] [drm:intel_crt_get_edid [i915]] CRT GMBUS EDID read
failed, retry using GPIO bit-banging
[ 1189.633154] [drm:intel_gmbus_force_bit [i915]] enabling bit-banging
on i915 gmbus vga. force bit now 1
[ 1189.633663] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent
adapter i915 gmbus vga
[ 1189.633687] [drm:intel_gmbus_force_bit [i915]] disabling bit-banging
on i915 gmbus vga. force bit now 0
[ 1189.633711] [drm:intel_crt_detect_ddc [i915]] CRT not detected via
DDC:0x50 [no valid EDID found]
[ 1189.633731] [drm:intel_power_well_disable [i915]] disabling always-on
[ 1189.633734] [drm:drm_helper_hpd_irq_event] [CONNECTOR:40:VGA-1]
status updated from disconnected to disconnected
[ 1189.633758] [drm:intel_hdmi_detect [i915]] [CONNECTOR:43:HDMI-A-1]
[ 1189.633778] [drm:intel_power_well_enable [i915]] enabling always-on
[ 1189.635849] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK for
addr: 0050 w(1)
[ 1189.635892] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK on
first message, retry
[ 1189.637962] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK for
addr: 0050 w(1)
[ 1189.637965] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent
adapter i915 gmbus dpc
[ 1189.640036] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK for
addr: 0040 w(1)
[ 1189.640092] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK on
first message, retry
[ 1189.642161] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK for
addr: 0040 w(1)
[ 1189.642166] [drm:drm_dp_dual_mode_detect] DP dual mode HDMI ID: (err -6)
[ 1189.642203] [drm:intel_power_well_disable [i915]] disabling always-on
[ 1189.642205] [drm:drm_helper_hpd_irq_event] [CONNECTOR:43:HDMI-A-1]
status updated from disconnected to disconnected
[ 1189.642229] [drm:intel_dp_detect [i915]] [CONNECTOR:48:DP-1]
[ 1189.642249] [drm:intel_power_well_enable [i915]] enabling always-on
[ 1189.642270] [drm:intel_power_well_disable [i915]] disabling always-on
[ 1189.642272] [drm:drm_helper_hpd_irq_event] [CONNECTOR:48:DP-1] status
updated from disconnected to disconnected
[ 1189.726634] [drm:intel_get_hpd_pins [i915]] hotplug event received,
stat 0x00100000, dig 0x00100000, pins 0x00000040
[ 1189.726678] [drm:intel_hpd_irq_handler [i915]] digital hpd port C - long
[ 1189.726722] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt
on PIN 6 - cnt: 0
[ 1189.726776] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port C - long
[ 1189.726810] [drm:i915_hotplug_work_func [i915]] running encoder
hotplug functions
[ 1189.726835] [drm:i915_hotplug_work_func [i915]] Connector HDMI-A-1
(pin 6) received hotplug event.
[ 1189.726859] [drm:intel_hdmi_detect [i915]] [CONNECTOR:43:HDMI-A-1]
[ 1189.726879] [drm:intel_power_well_enable [i915]] enabling always-on
[ 1189.728946] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK for
addr: 0050 w(1)
[ 1189.728989] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK on
first message, retry
[ 1189.730071] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK for
addr: 0050 w(1)
[ 1189.730075] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent
adapter i915 gmbus dpc
[ 1189.732142] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK for
addr: 0040 w(1)
[ 1189.732185] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK on
first message, retry
[ 1189.734069] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpc] NAK for
addr: 0040 w(1)
[ 1189.734073] [drm:drm_dp_dual_mode_detect] DP dual mode HDMI ID: (err -6)
[ 1189.734111] [drm:intel_power_well_disable [i915]] disabling always-on
[ 1189.734155] [drm:i915_hotplug_work_func [i915]] Connector DP-1 (pin
6) received hotplug event.
[ 1189.734201] [drm:intel_dp_detect [i915]] [CONNECTOR:48:DP-1]
[ 1189.734221] [drm:intel_power_well_enable [i915]] enabling always-on
[ 1189.734241] [drm:intel_power_well_disable [i915]] disabling always-on
[ 1189.924416] ata5: SATA link down (SStatus 0 SControl 300)
[ 1189.924432] ata3: SATA link down (SStatus 0 SControl 300)
[ 1189.924447] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1189.924461] ata4: SATA link down (SStatus 0 SControl 300)
[ 1189.924477] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1189.924857] ata2.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES)
filtered out
[ 1189.924860] ata2.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES)
filtered out
[ 1189.924862] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
filtered out
[ 1189.933311] ata2.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES)
filtered out
[ 1189.933313] ata2.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES)
filtered out
[ 1189.933316] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
filtered out
[ 1189.933320] ata2.00: configured for UDMA/100
[ 1189.935151] ata1.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES)
filtered out
[ 1189.935153] ata1.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES)
filtered out
[ 1189.936958] ata1.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE
MODE) succeeded
[ 1189.946040] ata1.00: ACPI cmd e3/00:00:00:00:00:a0 (IDLE) succeeded
[ 1189.946043] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE
LOCK) filtered out
[ 1189.946045] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
filtered out
[ 1189.986668] ata1.00: Security Log not supported
[ 1189.996656] ata1.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES)
filtered out
[ 1189.996659] ata1.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES)
filtered out
[ 1189.998508] ata1.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE
MODE) succeeded
[ 1190.007519] ata1.00: ACPI cmd e3/00:00:00:00:00:a0 (IDLE) succeeded
[ 1190.007522] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE
LOCK) filtered out
[ 1190.007524] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
filtered out
[ 1190.048147] ata1.00: Security Log not supported
[ 1190.048151] ata1.00: configured for UDMA/133
[ 1190.182093] PM: resume of devices complete after 569.117 msecs
[ 1190.182296] PM: Finishing wakeup.
[ 1190.182297] OOM killer enabled.
[ 1190.182297] Restarting tasks ... done.
[ 1190.300710] [IGT] gem_exec_suspend: exiting, ret=0

--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-28 14:51:07

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Fri, 28 Jul 2017, Tomi Sarvela wrote:
> On 28/07/17 17:13, Thomas Gleixner wrote:
> > On Fri, 28 Jul 2017, Tomi Sarvela wrote:
> > > On 28/07/17 16:15, Thomas Gleixner wrote:
> > > > Another question. Is the machine completely dead or not?
> > >
> > > Completely dead. Powerled is on, so host isn't shut down.
> >
> > So that means it does not even power the machine down. That's what I
> > expected least.
> >
> > > Serial or network if don't give any signs of life.
> >
> > > Patch applies cleanly but still getting the same error:
> >
> > Sorry for the noise. I'm an idiot trying to do 10 things at once. This time
> > it actually compiles and links.
> >
> > If the machine does still not powerdown with this applied, then please redo
> > the 'platform' test and grab the trace for that one.
>
> This patch fixes the issue. Below is the dmesg from the testrun (sorry for the
> spam, we're primarily testing i915 issues).

Can you please retrieve the trace data from:

/sys/kernel/debug/tracing/trace

and provide that. The dmesg does not help much.

Thanks,

tglx

2017-07-28 14:56:21

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 28/07/17 17:50, Thomas Gleixner wrote:
> On Fri, 28 Jul 2017, Tomi Sarvela wrote:
>> On 28/07/17 17:13, Thomas Gleixner wrote:
>>> On Fri, 28 Jul 2017, Tomi Sarvela wrote:
>>>> On 28/07/17 16:15, Thomas Gleixner wrote:
>>>>> Another question. Is the machine completely dead or not?
>>>>
>>>> Completely dead. Powerled is on, so host isn't shut down.
>>>
>>> So that means it does not even power the machine down. That's what I
>>> expected least.
>>>
>>>> Serial or network if don't give any signs of life.
>>>
>>>> Patch applies cleanly but still getting the same error:
>>>
>>> Sorry for the noise. I'm an idiot trying to do 10 things at once. This time
>>> it actually compiles and links.
>>>
>>> If the machine does still not powerdown with this applied, then please redo
>>> the 'platform' test and grab the trace for that one.
>>
>> This patch fixes the issue. Below is the dmesg from the testrun (sorry for the
>> spam, we're primarily testing i915 issues).
>
> Can you please retrieve the trace data from:
>
> /sys/kernel/debug/tracing/trace
>
> and provide that. The dmesg does not help much.

Right, here you go.

$ sudo cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
rtcwake-1332 [000] d..1 64.411098: suspend_device_irqs:
presuspend 0 state 00400600
rtcwake-1332 [000] d..1 64.411101: suspend_device_irqs:
postsuspend 0 state 00400600
rtcwake-1332 [000] d..1 64.411102: suspend_device_irqs:
presuspend 1 state 00030000
rtcwake-1332 [000] d..1 64.411103: suspend_device_irqs:
postsuspend 1 state 00030000
rtcwake-1332 [000] d..1 64.411104: suspend_device_irqs:
presuspend 2 state 00030000
rtcwake-1332 [000] d..1 64.411104: suspend_device_irqs:
postsuspend 2 state 00030000
rtcwake-1332 [000] d..1 64.411105: suspend_device_irqs:
presuspend 3 state 00030000
rtcwake-1332 [000] d..1 64.411106: suspend_device_irqs:
postsuspend 3 state 00030000
rtcwake-1332 [000] d..1 64.411107: suspend_device_irqs:
presuspend 4 state 00031000
rtcwake-1332 [000] d..1 64.411107: suspend_device_irqs:
postsuspend 4 state 00031000
rtcwake-1332 [000] d..1 64.411108: suspend_device_irqs:
presuspend 5 state 00030000
rtcwake-1332 [000] d..1 64.411108: suspend_device_irqs:
postsuspend 5 state 00030000
rtcwake-1332 [000] d..1 64.411109: suspend_device_irqs:
presuspend 6 state 00030000
rtcwake-1332 [000] d..1 64.411110: suspend_device_irqs:
postsuspend 6 state 00030000
rtcwake-1332 [000] d..1 64.411110: suspend_device_irqs:
presuspend 7 state 00030000
rtcwake-1332 [000] d..1 64.411111: suspend_device_irqs:
postsuspend 7 state 00030000
rtcwake-1332 [000] d..1 64.411112: suspend_device_irqs:
presuspend 8 state 00401200
rtcwake-1332 [000] d..1 64.411112: __irq_disable:
predisable 8 state 00401200
rtcwake-1332 [000] d..1 64.411113: __irq_disable:
postdisable 8 state 00411200
rtcwake-1332 [000] d..1 64.411114: suspend_device_irqs:
postsuspend 8 state 00411200
rtcwake-1332 [000] d..1 64.411115: suspend_device_irqs:
presuspend 9 state 00403300
rtcwake-1332 [000] d..1 64.411115: __irq_disable:
predisable 9 state 00403300
rtcwake-1332 [000] d..1 64.411116: __irq_disable:
postdisable 9 state 00413300
rtcwake-1332 [000] d..1 64.411116: suspend_device_irqs:
postsuspend 9 state 00413300
rtcwake-1332 [000] d..1 64.411117: suspend_device_irqs:
presuspend 10 state 00030000
rtcwake-1332 [000] d..1 64.411118: suspend_device_irqs:
postsuspend 10 state 00030000
rtcwake-1332 [000] d..1 64.411119: suspend_device_irqs:
presuspend 11 state 00030000
rtcwake-1332 [000] d..1 64.411119: suspend_device_irqs:
postsuspend 11 state 00030000
rtcwake-1332 [000] d..1 64.411120: suspend_device_irqs:
presuspend 12 state 00030000
rtcwake-1332 [000] d..1 64.411120: suspend_device_irqs:
postsuspend 12 state 00030000
rtcwake-1332 [000] d..1 64.411121: suspend_device_irqs:
presuspend 13 state 00030000
rtcwake-1332 [000] d..1 64.411122: suspend_device_irqs:
postsuspend 13 state 00030000
rtcwake-1332 [000] d..1 64.411122: suspend_device_irqs:
presuspend 14 state 00030000
rtcwake-1332 [000] d..1 64.411123: suspend_device_irqs:
postsuspend 14 state 00030000
rtcwake-1332 [000] d..1 64.411124: suspend_device_irqs:
presuspend 15 state 00030000
rtcwake-1332 [000] d..1 64.411124: suspend_device_irqs:
postsuspend 15 state 00030000
rtcwake-1332 [000] d..1 64.411125: suspend_device_irqs:
presuspend 16 state 00403200
rtcwake-1332 [000] d..1 64.411126: __irq_disable:
predisable 16 state 00403200
rtcwake-1332 [000] d..1 64.411126: __irq_disable:
postdisable 16 state 00413200
rtcwake-1332 [000] d..1 64.411127: suspend_device_irqs:
postsuspend 16 state 00413200
rtcwake-1332 [000] d..1 64.411128: suspend_device_irqs:
presuspend 17 state 00033000
rtcwake-1332 [000] d..1 64.411128: suspend_device_irqs:
postsuspend 17 state 00033000
rtcwake-1332 [000] d..1 64.411129: suspend_device_irqs:
presuspend 18 state 00032000
rtcwake-1332 [000] d..1 64.411130: suspend_device_irqs:
postsuspend 18 state 00032000
rtcwake-1332 [000] d..1 64.411130: suspend_device_irqs:
presuspend 19 state 00032000
rtcwake-1332 [000] d..1 64.411131: suspend_device_irqs:
postsuspend 19 state 00032000
rtcwake-1332 [000] d..1 64.411132: suspend_device_irqs:
presuspend 20 state 00403300
rtcwake-1332 [000] d..1 64.411132: __irq_disable:
predisable 20 state 00403300
rtcwake-1332 [000] d..1 64.411133: __irq_disable:
postdisable 20 state 00413300
rtcwake-1332 [000] d..1 64.411133: suspend_device_irqs:
postsuspend 20 state 00413300
rtcwake-1332 [000] d..1 64.411134: suspend_device_irqs:
presuspend 21 state 00403300
rtcwake-1332 [000] d..1 64.411134: __irq_disable:
predisable 21 state 00403300
rtcwake-1332 [000] d..1 64.411135: __irq_disable:
postdisable 21 state 00413300
rtcwake-1332 [000] d..1 64.411136: suspend_device_irqs:
postsuspend 21 state 00413300
rtcwake-1332 [000] d..1 64.411136: suspend_device_irqs:
presuspend 22 state 00403300
rtcwake-1332 [000] d..1 64.411137: __irq_disable:
predisable 22 state 00403300
rtcwake-1332 [000] d..1 64.411137: __irq_disable:
postdisable 22 state 00413300
rtcwake-1332 [000] d..1 64.411138: suspend_device_irqs:
postsuspend 22 state 00413300
rtcwake-1332 [000] d..1 64.411139: suspend_device_irqs:
presuspend 24 state 00409600
rtcwake-1332 [000] d..1 64.411139: suspend_device_irqs:
postsuspend 24 state 00409600
rtcwake-1332 [000] d..1 64.411140: suspend_device_irqs:
presuspend 25 state 00409600
rtcwake-1332 [000] d..1 64.411141: suspend_device_irqs:
postsuspend 25 state 00409600
rtcwake-1332 [000] d..1 64.411142: suspend_device_irqs:
presuspend 26 state 00038000
rtcwake-1332 [000] d..1 64.411142: suspend_device_irqs:
postsuspend 26 state 00038000
rtcwake-1332 [000] d..1 64.411143: suspend_device_irqs:
presuspend 27 state 00038000
rtcwake-1332 [000] d..1 64.411143: suspend_device_irqs:
postsuspend 27 state 00038000
rtcwake-1332 [000] d..1 64.411144: suspend_device_irqs:
presuspend 28 state 00401200
rtcwake-1332 [000] d..1 64.411145: __irq_disable:
predisable 28 state 00401200
rtcwake-1332 [000] d..1 64.411145: __irq_disable:
postdisable 28 state 00411200
rtcwake-1332 [000] d..1 64.411146: suspend_device_irqs:
postsuspend 28 state 00411200
rtcwake-1332 [001] d.H1 64.425561: mask_irq: premask 8
state 00411200
rtcwake-1332 [001] d.H1 64.425565: mask_irq: postmask 8
state 00431200
rtcwake-1332 [000] dN.1 64.436605: __irq_disable:
predisable 25 state 00409600
rtcwake-1332 [000] dN.1 64.436607: mask_irq: premask 25
state 00419600
rtcwake-1332 [000] dN.1 64.436608: mask_irq: postmask 25
state 00439600
rtcwake-1332 [000] dN.1 64.436609: __irq_disable:
postdisable 25 state 00439600
rtcwake-1332 [000] d..1 576460734.868390: __irq_disable:
predisable 24 state 00409600
rtcwake-1332 [000] d..1 576460734.868508: __irq_disable:
postdisable 24 state 00419600
rtcwake-1332 [000] d..1 576460734.868511: irq_enable:
preenable 24 state 00419600
rtcwake-1332 [000] d..1 576460734.868511: unmask_irq:
preunmask 24 state 00409600
rtcwake-1332 [000] d..1 576460734.868512: unmask_irq:
postunmask 24 state 00409600
rtcwake-1332 [000] d..1 576460734.868512: irq_enable:
postenable 24 state 00409600
rtcwake-1332 [000] dNh1 576460734.868533: mask_irq: premask 9
state 00413200
rtcwake-1332 [000] dNh1 576460734.868535: mask_irq: postmask
9 state 00433200
kworker/1:1-1039 [001] d..1 576460734.869322: irq_enable:
preenable 25 state 00039600
kworker/1:1-1039 [001] d..1 576460734.869324: unmask_irq:
preunmask 25 state 00029600
kworker/1:1-1039 [001] d..1 576460734.869325: unmask_irq:
postunmask 25 state 00009600
kworker/1:1-1039 [001] d..1 576460734.869326: irq_enable:
postenable 25 state 00009600
kworker/1:1-1039 [001] d..1 576460734.869329: __irq_disable:
predisable 25 state 00409600
kworker/1:1-1039 [001] d..1 576460734.869329: __irq_disable:
postdisable 25 state 00419600
kworker/1:1-1039 [001] d..1 576460734.869332: irq_enable:
preenable 25 state 00419600
kworker/1:1-1039 [001] d..1 576460734.869332: unmask_irq:
preunmask 25 state 00409600
kworker/1:1-1039 [001] d..1 576460734.869333: unmask_irq:
postunmask 25 state 00409600
kworker/1:1-1039 [001] d..1 576460734.869333: irq_enable:
postenable 25 state 00409600
rtcwake-1332 [000] d..1 576460734.882983: resume_irqs:
preresume 0 state 00400600
rtcwake-1332 [000] d..1 18446744056.289114: resume_irqs:
postresume 0 state 00400600
rtcwake-1332 [000] d..1 18446744056.289116: resume_irqs:
preresume 1 state 00030000
rtcwake-1332 [000] d..1 18446744056.289116: resume_irqs:
postresume 1 state 00030000
rtcwake-1332 [000] d..1 18446744056.289117: resume_irqs:
preresume 2 state 00030000
rtcwake-1332 [000] d..1 18446744056.289118: resume_irqs:
postresume 2 state 00030000
rtcwake-1332 [000] d..1 18446744056.289118: resume_irqs:
preresume 3 state 00030000
rtcwake-1332 [000] d..1 18446744056.289119: resume_irqs:
postresume 3 state 00030000
rtcwake-1332 [000] d..1 18446744056.289120: resume_irqs:
preresume 4 state 00031000
rtcwake-1332 [000] d..1 18446744056.289120: resume_irqs:
postresume 4 state 00031000
rtcwake-1332 [000] d..1 18446744056.289121: resume_irqs:
preresume 5 state 00030000
rtcwake-1332 [000] d..1 18446744056.289122: resume_irqs:
postresume 5 state 00030000
rtcwake-1332 [000] d..1 18446744056.289122: resume_irqs:
preresume 6 state 00030000
rtcwake-1332 [000] d..1 18446744056.289123: resume_irqs:
postresume 6 state 00030000
rtcwake-1332 [000] d..1 18446744056.289124: resume_irqs:
preresume 7 state 00030000
rtcwake-1332 [000] d..1 18446744056.289124: resume_irqs:
postresume 7 state 00030000
rtcwake-1332 [000] d..1 18446744056.289125: resume_irqs:
preresume 8 state 00431200
rtcwake-1332 [000] d..1 18446744056.289126: irq_enable:
preenable 8 state 00431200
rtcwake-1332 [000] d..1 18446744056.289126: unmask_irq:
preunmask 8 state 00421200
rtcwake-1332 [000] d..1 18446744056.289128: unmask_irq:
postunmask 8 state 00401200
rtcwake-1332 [000] d..1 18446744056.289128: irq_enable:
postenable 8 state 00401200
rtcwake-1332 [000] d..1 18446744056.289129: resume_irqs:
postresume 8 state 00401200
rtcwake-1332 [000] d..1 18446744056.289570: resume_irqs:
preresume 9 state 00433200
rtcwake-1332 [000] d..1 18446744056.289571: irq_enable:
preenable 9 state 00433200
rtcwake-1332 [000] d..1 18446744056.289571: unmask_irq:
preunmask 9 state 00423200
rtcwake-1332 [000] d..1 18446744056.289572: unmask_irq:
postunmask 9 state 00403200
rtcwake-1332 [000] d..1 18446744056.289572: irq_enable:
postenable 9 state 00403200
rtcwake-1332 [000] d..1 18446744056.289572: resume_irqs:
postresume 9 state 00403200
rtcwake-1332 [000] d..1 18446744056.289573: resume_irqs:
preresume 10 state 00030000
rtcwake-1332 [000] d..1 18446744056.289573: resume_irqs:
postresume 10 state 00030000
rtcwake-1332 [000] d..1 18446744056.289588: resume_irqs:
preresume 11 state 00030000
rtcwake-1332 [000] d..1 18446744056.289588: resume_irqs:
postresume 11 state 00030000
rtcwake-1332 [000] d..1 18446744056.289589: resume_irqs:
preresume 12 state 00030000
rtcwake-1332 [000] d..1 18446744056.289589: resume_irqs:
postresume 12 state 00030000
rtcwake-1332 [000] d..1 18446744056.289589: resume_irqs:
preresume 13 state 00030000
rtcwake-1332 [000] d..1 18446744056.289590: resume_irqs:
postresume 13 state 00030000
rtcwake-1332 [000] d..1 18446744056.289590: resume_irqs:
preresume 14 state 00030000
rtcwake-1332 [000] d..1 18446744056.289591: resume_irqs:
postresume 14 state 00030000
rtcwake-1332 [000] d..1 18446744056.289591: resume_irqs:
preresume 15 state 00030000
rtcwake-1332 [000] d..1 18446744056.289591: resume_irqs:
postresume 15 state 00030000
rtcwake-1332 [000] d..1 18446744056.289592: resume_irqs:
preresume 16 state 00413200
rtcwake-1332 [000] d..1 18446744056.289592: irq_enable:
preenable 16 state 00413200
rtcwake-1332 [000] d..1 18446744056.289593: unmask_irq:
preunmask 16 state 00403200
rtcwake-1332 [000] d..1 18446744056.289593: unmask_irq:
postunmask 16 state 00403200
rtcwake-1332 [000] d..1 18446744056.289594: irq_enable:
postenable 16 state 00403200
rtcwake-1332 [000] d..1 18446744056.289594: resume_irqs:
postresume 16 state 00403200
rtcwake-1332 [000] d..1 18446744056.289594: resume_irqs:
preresume 17 state 00033000
rtcwake-1332 [000] d..1 18446744056.289595: resume_irqs:
postresume 17 state 00033000
rtcwake-1332 [000] d..1 18446744056.289595: resume_irqs:
preresume 18 state 00032000
rtcwake-1332 [000] d..1 18446744056.289596: resume_irqs:
postresume 18 state 00032000
rtcwake-1332 [000] d..1 18446744056.289596: resume_irqs:
preresume 19 state 00032000
rtcwake-1332 [000] d..1 18446744056.289596: resume_irqs:
postresume 19 state 00032000
rtcwake-1332 [000] d..1 18446744056.289597: resume_irqs:
preresume 20 state 00413200
rtcwake-1332 [000] d..1 18446744056.289597: irq_enable:
preenable 20 state 00413200
rtcwake-1332 [000] d..1 18446744056.289598: unmask_irq:
preunmask 20 state 00403200
rtcwake-1332 [000] d..1 18446744056.289598: unmask_irq:
postunmask 20 state 00403200
rtcwake-1332 [000] d..1 18446744056.289598: irq_enable:
postenable 20 state 00403200
rtcwake-1332 [000] d..1 18446744056.289599: resume_irqs:
postresume 20 state 00403200
rtcwake-1332 [000] d..1 18446744056.289599: resume_irqs:
preresume 21 state 00413200
rtcwake-1332 [000] d..1 18446744056.289600: irq_enable:
preenable 21 state 00413200
rtcwake-1332 [000] d..1 18446744056.289600: unmask_irq:
preunmask 21 state 00403200
rtcwake-1332 [000] d..1 18446744056.289600: unmask_irq:
postunmask 21 state 00403200
rtcwake-1332 [000] d..1 18446744056.289601: irq_enable:
postenable 21 state 00403200
rtcwake-1332 [000] d..1 18446744056.289601: resume_irqs:
postresume 21 state 00403200
rtcwake-1332 [000] d..1 18446744056.289602: resume_irqs:
preresume 22 state 00413200
rtcwake-1332 [000] d..1 18446744056.289602: irq_enable:
preenable 22 state 00413200
rtcwake-1332 [000] d..1 18446744056.289602: unmask_irq:
preunmask 22 state 00403200
rtcwake-1332 [000] d..1 18446744056.289603: unmask_irq:
postunmask 22 state 00403200
rtcwake-1332 [000] d..1 18446744056.289603: irq_enable:
postenable 22 state 00403200
rtcwake-1332 [000] d..1 18446744056.289603: resume_irqs:
postresume 22 state 00403200
rtcwake-1332 [000] d..1 18446744056.289604: resume_irqs:
preresume 24 state 00409600
rtcwake-1332 [000] d..1 18446744056.289604: resume_irqs:
postresume 24 state 00409600
rtcwake-1332 [000] d..1 18446744056.289604: resume_irqs:
preresume 25 state 00409600
rtcwake-1332 [000] d..1 18446744056.289605: resume_irqs:
postresume 25 state 00409600
rtcwake-1332 [000] d..1 18446744056.289605: resume_irqs:
preresume 26 state 00038000
rtcwake-1332 [000] d..1 18446744056.289606: resume_irqs:
postresume 26 state 00038000
rtcwake-1332 [000] d..1 18446744056.289606: resume_irqs:
preresume 27 state 00038000
rtcwake-1332 [000] d..1 18446744056.289606: resume_irqs:
postresume 27 state 00038000
rtcwake-1332 [000] d..1 18446744056.289607: resume_irqs:
preresume 28 state 00411200
rtcwake-1332 [000] d..1 18446744056.289607: irq_enable:
preenable 28 state 00411200
rtcwake-1332 [000] d..1 18446744056.289608: unmask_irq:
preunmask 28 state 00401200
rtcwake-1332 [000] d..1 18446744056.289608: unmask_irq:
postunmask 28 state 00401200
rtcwake-1332 [000] d..1 18446744056.289608: irq_enable:
postenable 28 state 00401200
rtcwake-1332 [000] d..1 18446744056.289609: resume_irqs:
postresume 28 state 00401200


Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-28 16:26:33

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Fri, 28 Jul 2017, Tomi Sarvela wrote:
> On 28/07/17 17:50, Thomas Gleixner wrote:
> > On Fri, 28 Jul 2017, Tomi Sarvela wrote:
> > > On 28/07/17 17:13, Thomas Gleixner wrote:
> > > > On Fri, 28 Jul 2017, Tomi Sarvela wrote:
> > > > > On 28/07/17 16:15, Thomas Gleixner wrote:
> > > > > > Another question. Is the machine completely dead or not?
> > > > >
> > > > > Completely dead. Powerled is on, so host isn't shut down.
> > > >
> > > > So that means it does not even power the machine down. That's what I
> > > > expected least.
> > > >
> > > > > Serial or network if don't give any signs of life.
> > > >
> > > > > Patch applies cleanly but still getting the same error:
> > > >
> > > > Sorry for the noise. I'm an idiot trying to do 10 things at once. This
> > > > time
> > > > it actually compiles and links.
> > > >
> > > > If the machine does still not powerdown with this applied, then please
> > > > redo
> > > > the 'platform' test and grab the trace for that one.
> > >
> > > This patch fixes the issue. Below is the dmesg from the testrun (sorry for
> > > the spam, we're primarily testing i915 issues).
> >
> > Can you please retrieve the trace data from:
> >
> > /sys/kernel/debug/tracing/trace
> >
> > and provide that. The dmesg does not help much.
>
> Right, here you go.

Thanks for providing the data. Just to be sure, that data was from a real
suspend, not the 'platform' test, right?

If so, that does not make any sense at all. The patch merily changes the
enable/resume path and adds the debug trace printks which have no influence
on the disable logic. But you said that the machine does not power off in
the bad case. That does not make any sense at all as the enable logic is
not involved at all in the suspend path.

Did you change anything else compared to the tests before ?

Thanks,

tglx


2017-07-31 07:22:30

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 28/07/17 19:26, Thomas Gleixner wrote:
> On Fri, 28 Jul 2017, Tomi Sarvela wrote:
>> On 28/07/17 17:50, Thomas Gleixner wrote:
>>> On Fri, 28 Jul 2017, Tomi Sarvela wrote:
>>>> On 28/07/17 17:13, Thomas Gleixner wrote:
>>>>> On Fri, 28 Jul 2017, Tomi Sarvela wrote:
>>>>>> On 28/07/17 16:15, Thomas Gleixner wrote:
>>>>>>> Another question. Is the machine completely dead or not?
>>>>>>
>>>>>> Completely dead. Powerled is on, so host isn't shut down.
>>>>>
>>>>> So that means it does not even power the machine down. That's what I
>>>>> expected least.
>>>>>
>>>>>> Serial or network if don't give any signs of life.
>>>>>
>>>>>> Patch applies cleanly but still getting the same error:
>>>>>
>>>>> Sorry for the noise. I'm an idiot trying to do 10 things at once. This
>>>>> time
>>>>> it actually compiles and links.
>>>>>
>>>>> If the machine does still not powerdown with this applied, then please
>>>>> redo
>>>>> the 'platform' test and grab the trace for that one.
>>>>
>>>> This patch fixes the issue. Below is the dmesg from the testrun (sorry for
>>>> the spam, we're primarily testing i915 issues).
>>>
>>> Can you please retrieve the trace data from:
>>>
>>> /sys/kernel/debug/tracing/trace
>>>
>>> and provide that. The dmesg does not help much.
>>
>> Right, here you go.
>
> Thanks for providing the data. Just to be sure, that data was from a real
> suspend, not the 'platform' test, right?
>
> If so, that does not make any sense at all. The patch merily changes the
> enable/resume path and adds the debug trace printks which have no influence
> on the disable logic. But you said that the machine does not power off in
> the bad case. That does not make any sense at all as the enable logic is
> not involved at all in the suspend path.
>
> Did you change anything else compared to the tests before ?

I did check that the problem persisted in linus-HEAD before testing your
patch. The testing was done in order (reading from console logs I happen
to still have in one window):

- reboot to patched kernel

[ 0.000000] Linux version 4.13.0-rc2+ (testrunner@elk) (gcc version
6.3.0 20170406 (Ubuntu 6.3.0-12ubuntu2)) #5 SMP PREEMPT Fri Jul 28
17:15:47 EEST 2017
[ 0.000000] Command line: BOOT_IMAGE=/boot/tsa.efi root=/dev/sda1
console=ttyS0,115200n8 console=tty0 intel_iommu=igfx_off drm.debug=0xe
nmi_watchdog=panic,auto panic=1 softdog.soft_panic=1
scsi_mod.use_blk_mq=0 rootwait ro 3

- "real" 15sec suspend test through IGT/piglit and rtcwake

$ ./scripts/run-tests.sh -vt igt@gem_exec_suspend@basic-s3 -x devices

- dmesg to go with suspend

[ 1189.597665] Suspended for 14.825 seconds
[ 1189.597665] Delta way too big! 18446743991837909721
ts=18446744056274518328 write stamp = 64436608607
If you just came from a suspend/resume,
please switch to the trace global clock:
echo global > /sys/kernel/debug/tracing/trace_clock
[ 1189.597665] ------------[ cut here ]------------
[ 1189.597665] WARNING: CPU: 0 PID: 1332 at
kernel/trace/ring_buffer.c:2647 rb_handle_timestamp.isra.32+0x71/0x80
[ 1189.597665] Modules linked in: snd_hda_codec_realtek i915
snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core
coretemp snd_pcm e1000e lpc_ich mei_me ptp mei pps_core
[ 1189.597665] CPU: 0 PID: 1332 Comm: rtcwake Tainted: G U
4.13.0-rc2+ #5
[ 1189.597665] Hardware name: Hewlett-Packard HP Compaq 8000 Elite CMT
PC/3647h, BIOS 786G7 v01.13 07/20/2011
[ 1189.597665] task: ffff88010dce9880 task.stack: ffffc900000c8000
[ 1189.597665] RIP: 0010:rb_handle_timestamp.isra.32+0x71/0x80
[ 1189.597665] RSP: 0018:ffffc900000cbab0 EFLAGS: 00010082
[ 1189.597665] RAX: 00000000000000e0 RBX: ffffc900000cbad0 RCX:
0000000000000004
[ 1189.597665] RDX: 0000000080000004 RSI: 0000000000000082 RDI:
00000000ffffffff
[ 1189.597665] RBP: ffffc900000cbac0 R08: 0000000000000000 R09:
00000000000000e0
[ 1189.597665] R10: ffffc900000cbbe0 R11: 00000000000ebc08 R12:
ffff880117008890
[ 1189.597665] R13: ffff8801170088b0 R14: 00000000000003e8 R15:
0000000000000005
[ 1189.597665] FS: 00007f93d957e700(0000) GS:ffff88011bc00000(0000)
knlGS:0000000000000000
[ 1189.597665] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1189.597665] CR2: 000000f00255b068 CR3: 00000000d9063000 CR4:
00000000000406f0
[ 1189.597665] Call Trace:
[ 1189.597665] ring_buffer_lock_reserve+0x1fa/0x350
[ 1189.597665] trace_vbprintk+0xdc/0x260
[ 1189.597665] ? __irq_disable+0x1b/0xc0
[ 1189.597665] __trace_bprintk+0x4a/0x60
[ 1189.597665] ? preempt_count_add+0x9e/0xb0
[ 1189.597665] __irq_disable+0x3f/0xc0
[ 1189.597665] irq_disable+0x17/0x20
[ 1189.597665] __disable_irq_nosync+0x59/0x70
[ 1189.597665] disable_hardirq+0x11/0x30
[ 1189.597665] hpet_msi_resume+0x85/0xd0
[ 1189.597665] clockevents_tick_resume+0x14/0x20
[ 1189.597665] tick_resume_local+0x32/0x60
[ 1189.597665] tick_resume+0x13/0x20
[ 1189.597665] timekeeping_resume+0x149/0x1a0
[ 1189.597665] syscore_resume+0x4b/0x190
[ 1189.597665] ? syscore_resume+0x4b/0x190
[ 1189.597665] suspend_devices_and_enter+0x6b9/0x810
[ 1189.597665] pm_suspend+0x367/0x540
[ 1189.597665] state_store+0x7e/0xf0
[ 1189.597665] kobj_attr_store+0xf/0x20
[ 1189.597665] sysfs_kf_write+0x37/0x40
[ 1189.597665] kernfs_fop_write+0x110/0x1a0
[ 1189.597665] __vfs_write+0x28/0x130
[ 1189.597665] ? __this_cpu_preempt_check+0x13/0x20
[ 1189.597665] ? __sb_start_write+0x55/0xe0
[ 1189.597665] vfs_write+0xb6/0x1a0
[ 1189.597665] SyS_write+0x46/0xb0
[ 1189.597665] entry_SYSCALL_64_fastpath+0x17/0x98
[ 1189.597665] RIP: 0033:0x7f93d90ac8f0
[ 1189.597665] RSP: 002b:00007ffd80a9ca88 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 1189.597665] RAX: ffffffffffffffda RBX: 000000f00255a050 RCX:
00007f93d90ac8f0
[ 1189.597665] RDX: 0000000000000004 RSI: 000000f00255a060 RDI:
0000000000000007
[ 1189.597665] RBP: 00007f93d9375b00 R08: 000000f002557d90 R09:
00007f93d957e700
[ 1189.597665] R10: 00007f93d9375b58 R11: 0000000000000246 R12:
00007f93d9375b58
[ 1189.597665] R13: 00007f93d9375b58 R14: 000000000000270f R15:
0000000000001010
[ 1189.597665] Code: f0 48 8b 73 08 85 c0 48 8b 13 48 c7 c0 88 9a a5 81
49 c7 c0 1a 3b a8 81 4c 0f 44 c0 48 8b 0f 48 c7 c7 10 9b a5 81 e8 c0 c8
fa ff <0f> ff eb a7 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56
[ 1189.597665] ---[ end trace 7d99ac836f161565 ]---

- printing out the trace from /sys/kernel/debug/tracing/trace


Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-31 07:46:08

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Mon, 31 Jul 2017, Tomi Sarvela wrote:
> On 28/07/17 19:26, Thomas Gleixner wrote:
> > Did you change anything else compared to the tests before ?
>
> I did check that the problem persisted in linus-HEAD before testing your
> patch. The testing was done in order (reading from console logs I happen to
> still have in one window):

What I still do not understand is why this would affect the suspend path in
any way.

Can you remove the previous patch and apply the one below. If it resumes,
please provide the data from the trace buffer again.

Thanks,

tglx

8<--------------
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -459,3 +459,11 @@ static inline void irq_remove_debugfs_en
{
}
#endif /* CONFIG_GENERIC_IRQ_DEBUGFS */
+
+extern bool irq_suspend_resume;
+
+static inline void irq_trace_state(const char *what, struct irq_desc *desc)
+{
+ trace_printk("%s %d state %08x\n", what, irq_desc_get_irq(desc),
+ irqd_get(&desc->irq_data));
+}
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -14,6 +14,8 @@

#include "internals.h"

+bool irq_suspend_resume;
+
bool irq_pm_check_wakeup(struct irq_desc *desc)
{
if (irqd_is_wakeup_armed(&desc->irq_data)) {
@@ -120,6 +122,7 @@ void suspend_device_irqs(void)
struct irq_desc *desc;
int irq;

+ irq_suspend_resume = true;
for_each_irq_desc(irq, desc) {
unsigned long flags;
bool sync;
@@ -127,7 +130,9 @@ void suspend_device_irqs(void)
if (irq_settings_is_nested_thread(desc))
continue;
raw_spin_lock_irqsave(&desc->lock, flags);
+ irq_trace_state("presuspend", desc);
sync = suspend_device_irq(desc);
+ irq_trace_state("postsuspend", desc);
raw_spin_unlock_irqrestore(&desc->lock, flags);

if (sync)
@@ -150,8 +155,9 @@ static void resume_irq(struct irq_desc *
/* Pretend that it got disabled ! */
desc->depth++;
irq_state_set_disabled(desc);
- irq_state_set_masked(desc);
+
resume:
+ irq_state_set_masked(desc);
desc->istate &= ~IRQS_SUSPENDED;
__enable_irq(desc);
}
@@ -172,9 +178,14 @@ static void resume_irqs(bool want_early)
continue;

raw_spin_lock_irqsave(&desc->lock, flags);
+ irq_trace_state("preresume", desc);
resume_irq(desc);
+ irq_trace_state("postresume", desc);
raw_spin_unlock_irqrestore(&desc->lock, flags);
}
+
+ if (!want_early)
+ irq_suspend_resume = false;
}

/**


2017-07-31 07:57:13

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 31/07/17 10:45, Thomas Gleixner wrote:
> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>> On 28/07/17 19:26, Thomas Gleixner wrote:
>>> Did you change anything else compared to the tests before ?
>>
>> I did check that the problem persisted in linus-HEAD before testing your
>> patch. The testing was done in order (reading from console logs I happen to
>> still have in one window):
>
> What I still do not understand is why this would affect the suspend path in
> any way.
>
> Can you remove the previous patch and apply the one below. If it resumes,
> please provide the data from the trace buffer again.

No such luck. ELK hangs in the suspend-test with earlier patch removed,
this added. Checked again that the power-led is on, no serial output.

Tree not pulled: still testing against the previous head -rc2, not
current 4.13.0-rc3

Best regards,

Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-31 08:29:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Mon, 31 Jul 2017, Tomi Sarvela wrote:
> On 31/07/17 10:45, Thomas Gleixner wrote:
> > On Mon, 31 Jul 2017, Tomi Sarvela wrote:
> > > On 28/07/17 19:26, Thomas Gleixner wrote:
> > > > Did you change anything else compared to the tests before ?
> > >
> > > I did check that the problem persisted in linus-HEAD before testing your
> > > patch. The testing was done in order (reading from console logs I happen
> > > to
> > > still have in one window):
> >
> > What I still do not understand is why this would affect the suspend path in
> > any way.
> >
> > Can you remove the previous patch and apply the one below. If it resumes,
> > please provide the data from the trace buffer again.
>
> No such luck. ELK hangs in the suspend-test with earlier patch removed, this
> added. Checked again that the power-led is on, no serial output.
>
> Tree not pulled: still testing against the previous head -rc2, not current
> 4.13.0-rc3

Shouldn't make a difference. Can you please try the following:

Offline CPU1 before invoking suspend.

# echo 0 >/sys/devices/system/cpus/cpu1/offline

Thanks,

tglx

2017-07-31 08:35:07

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

Rafael,

On Fri, 28 Jul 2017, Tomi Sarvela wrote:
> rtcwake-1332 [000] d..1 64.411112: suspend_device_irqs:
> presuspend 8 state 00401200
> rtcwake-1332 [000] d..1 64.411112: __irq_disable: predisable 8
> state 00401200
> rtcwake-1332 [000] d..1 64.411113: __irq_disable: postdisable 8
> state 00411200
> rtcwake-1332 [000] d..1 64.411114: suspend_device_irqs:
> postsuspend 8 state 00411200
> rtcwake-1332 [000] d..1 64.411115: suspend_device_irqs:
> presuspend 9 state 00403300
> rtcwake-1332 [000] d..1 64.411115: __irq_disable: predisable 9
> state 00403300
> rtcwake-1332 [000] d..1 64.411116: __irq_disable: postdisable 9
> state 00413300
> rtcwake-1332 [000] d..1 64.411116: suspend_device_irqs:
> postsuspend 9 state 00413300
> rtcwake-1332 [000] d..1 64.411117: suspend_device_irqs:

I'm trying to analyze that suspend/resume problem. While staring at the
trace Tomi provided I noticed, that neither the RTC irq nor the ACPI irq
have the wakeup armed bit set, but the system resumes from an RTC
interrupt. That's inconsistent at least, or am I missing something
essential?

Thanks,

tglx

2017-07-31 08:43:10

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 31/07/17 11:29, Thomas Gleixner wrote:
> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>> On 31/07/17 10:45, Thomas Gleixner wrote:
>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>> On 28/07/17 19:26, Thomas Gleixner wrote:
>>>>> Did you change anything else compared to the tests before ?
>>>>
>>>> I did check that the problem persisted in linus-HEAD before testing your
>>>> patch. The testing was done in order (reading from console logs I happen
>>>> to
>>>> still have in one window):
>>>
>>> What I still do not understand is why this would affect the suspend path in
>>> any way.
>>>
>>> Can you remove the previous patch and apply the one below. If it resumes,
>>> please provide the data from the trace buffer again.
>>
>> No such luck. ELK hangs in the suspend-test with earlier patch removed, this
>> added. Checked again that the power-led is on, no serial output.
>>
>> Tree not pulled: still testing against the previous head -rc2, not current
>> 4.13.0-rc3
>
> Shouldn't make a difference. Can you please try the following:
>
> Offline CPU1 before invoking suspend.
>
> # echo 0 >/sys/devices/system/cpus/cpu1/offline

Tested with your latest patch (irq_trace_state):

echo 0 >/sys/devices/system/cpu/cpu1/online

./scripts/run-tests.sh -vt igt@gem_exec_suspend@basic-s3 -x devices

No change, no wakey at all.

Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-31 14:05:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Mon, 31 Jul 2017, Tomi Sarvela wrote:
> On 31/07/17 11:29, Thomas Gleixner wrote:
> > On Mon, 31 Jul 2017, Tomi Sarvela wrote:
> > > On 31/07/17 10:45, Thomas Gleixner wrote:
> > > > On Mon, 31 Jul 2017, Tomi Sarvela wrote:
> > > > > On 28/07/17 19:26, Thomas Gleixner wrote:
> > > > > > Did you change anything else compared to the tests before ?
> > > > >
> > > > > I did check that the problem persisted in linus-HEAD before testing
> > > > > your
> > > > > patch. The testing was done in order (reading from console logs I
> > > > > happen
> > > > > to
> > > > > still have in one window):
> > > >
> > > > What I still do not understand is why this would affect the suspend path
> > > > in
> > > > any way.
> > > >
> > > > Can you remove the previous patch and apply the one below. If it
> > > > resumes,
> > > > please provide the data from the trace buffer again.
> > >
> > > No such luck. ELK hangs in the suspend-test with earlier patch removed,
> > > this
> > > added. Checked again that the power-led is on, no serial output.
> > >
> > > Tree not pulled: still testing against the previous head -rc2, not current
> > > 4.13.0-rc3
> >
> > Shouldn't make a difference. Can you please try the following:
> >
> > Offline CPU1 before invoking suspend.
> >
> > # echo 0 >/sys/devices/system/cpus/cpu1/offline
>
> Tested with your latest patch (irq_trace_state):
>
> echo 0 >/sys/devices/system/cpu/cpu1/online
>
> ./scripts/run-tests.sh -vt igt@gem_exec_suspend@basic-s3 -x devices

So this "igt@gem_exec_suspend@basic-s3" thingy is that executing anything
extra aside of 'echo mem > /sys/power/state'?

Thanks,

tglx

2017-07-31 14:14:57

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 31/07/17 17:04, Thomas Gleixner wrote:
> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>> On 31/07/17 11:29, Thomas Gleixner wrote:
>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>> On 31/07/17 10:45, Thomas Gleixner wrote:
>>>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>>>> On 28/07/17 19:26, Thomas Gleixner wrote:
>>>>>>> Did you change anything else compared to the tests before ?
>>>>>>
>>>>>> I did check that the problem persisted in linus-HEAD before testing
>>>>>> your
>>>>>> patch. The testing was done in order (reading from console logs I
>>>>>> happen
>>>>>> to
>>>>>> still have in one window):
>>>>>
>>>>> What I still do not understand is why this would affect the suspend path
>>>>> in
>>>>> any way.
>>>>>
>>>>> Can you remove the previous patch and apply the one below. If it
>>>>> resumes,
>>>>> please provide the data from the trace buffer again.
>>>>
>>>> No such luck. ELK hangs in the suspend-test with earlier patch removed,
>>>> this
>>>> added. Checked again that the power-led is on, no serial output.
>>>>
>>>> Tree not pulled: still testing against the previous head -rc2, not current
>>>> 4.13.0-rc3
>>>
>>> Shouldn't make a difference. Can you please try the following:
>>>
>>> Offline CPU1 before invoking suspend.
>>>
>>> # echo 0 >/sys/devices/system/cpus/cpu1/offline
>>
>> Tested with your latest patch (irq_trace_state):
>>
>> echo 0 >/sys/devices/system/cpu/cpu1/online
>>
>> ./scripts/run-tests.sh -vt igt@gem_exec_suspend@basic-s3 -x devices
>
> So this "igt@gem_exec_suspend@basic-s3" thingy is that executing anything
> extra aside of 'echo mem > /sys/power/state'?

It's setting wakeup with rtcwake to +15 seconds, then suspending.

Complete information glanceable from sources:

https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/tree/tests/gem_exec_suspend.c
->

https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/tree/lib/igt_aux.c:void
igt_system_suspend_autoresume

Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-31 15:07:11

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Mon, 31 Jul 2017, Tomi Sarvela wrote:
> On 31/07/17 10:45, Thomas Gleixner wrote:
> > On Mon, 31 Jul 2017, Tomi Sarvela wrote:
> > > On 28/07/17 19:26, Thomas Gleixner wrote:
> > > > Did you change anything else compared to the tests before ?
> > >
> > > I did check that the problem persisted in linus-HEAD before testing your
> > > patch. The testing was done in order (reading from console logs I happen
> > > to
> > > still have in one window):
> >
> > What I still do not understand is why this would affect the suspend path in
> > any way.
> >
> > Can you remove the previous patch and apply the one below. If it resumes,
> > please provide the data from the trace buffer again.
>
> No such luck. ELK hangs in the suspend-test with earlier patch removed, this
> added. Checked again that the power-led is on, no serial output.

Can you please remove the patch. And try the following:

# echo N > /sys/module/printk/parameters/console_suspend

# echo mem > /sys/power/state

and log the output of the serial console. That way we might get a clue
where it gets stuck.

Thanks,

tglx


2017-07-31 15:45:30

by Sarvela, Tomi P

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 31/07/17 18:06, Thomas Gleixner wrote:
> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>> On 31/07/17 10:45, Thomas Gleixner wrote:
>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>> On 28/07/17 19:26, Thomas Gleixner wrote:
>>>>> Did you change anything else compared to the tests before ?
>>>>
>>>> I did check that the problem persisted in linus-HEAD before testing your
>>>> patch. The testing was done in order (reading from console logs I happen
>>>> to
>>>> still have in one window):
>>>
>>> What I still do not understand is why this would affect the suspend path in
>>> any way.
>>>
>>> Can you remove the previous patch and apply the one below. If it resumes,
>>> please provide the data from the trace buffer again.
>>
>> No such luck. ELK hangs in the suspend-test with earlier patch removed, this
>> added. Checked again that the power-led is on, no serial output.
>
> Can you please remove the patch. And try the following:
>
> # echo N > /sys/module/printk/parameters/console_suspend
>
> # echo mem > /sys/power/state
>
> and log the output of the serial console. That way we might get a clue
> where it gets stuck.

I'm afraid it hangs right away. No response from SSH, no output to serial.

Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

2017-07-31 15:55:25

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Mon, 31 Jul 2017, Tomi Sarvela wrote:
> On 31/07/17 18:06, Thomas Gleixner wrote:
> > Can you please remove the patch. And try the following:
> >
> > # echo N > /sys/module/printk/parameters/console_suspend
> >
> > # echo mem > /sys/power/state
> >
> > and log the output of the serial console. That way we might get a clue
> > where it gets stuck.
>
> I'm afraid it hangs right away. No response from SSH, no output to serial.

What means hangs right away? Is there no output at all on the serial
console? Or does it just stop at some point?

Thanks,

tglx

2017-08-03 07:32:16

by Masahiro Yamada

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

Hi.

2017-08-01 0:55 GMT+09:00 Thomas Gleixner <[email protected]>:
> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>> On 31/07/17 18:06, Thomas Gleixner wrote:
>> > Can you please remove the patch. And try the following:
>> >
>> > # echo N > /sys/module/printk/parameters/console_suspend
>> >
>> > # echo mem > /sys/power/state
>> >
>> > and log the output of the serial console. That way we might get a clue
>> > where it gets stuck.
>>
>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>
> What means hangs right away? Is there no output at all on the serial
> console? Or does it just stop at some point?
>
> Thanks,
>
> tglx
>

Sorry for jumping in.
Finally, I found this thread.


My environment is completely different (ARM64 board),
I am also suffering from a hibernation problem
since this commit.


I get no response on the serial console
after "Restarting tasks ... done." log message.


By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
irq function calls", I can get hibernation working again.


SW info:
defconfig: arch/arm64/configs/defconfig
DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
PSCI : ARM Trusted Firmware


SoC info:
CPU : Cortex-A72 * 2 + Cortex-A53 * 2
irqchip : GICv3 (drivers/irq/irq-gic-v3.c)





--
Best Regards
Masahiro Yamada

2017-08-03 08:23:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On Thu, Aug 03, 2017 at 04:32:09PM +0900, Masahiro Yamada wrote:

> My environment is completely different (ARM64 board),
> I am also suffering from a hibernation problem
> since this commit.
>
>
> I get no response on the serial console
> after "Restarting tasks ... done." log message.
>
>
> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
> irq function calls", I can get hibernation working again.

https://lkml.kernel.org/r/alpine.DEB.2.20.1707312158590.2287@nanos

Is the patch that cured the x86 issue, but maybe that gives a clue as
what to look for in your platform.

2017-08-03 08:41:57

by Marc Zyngier

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

Hi Masahiro,

On 03/08/17 08:32, Masahiro Yamada wrote:
> Hi.
>
> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner <[email protected]>:
>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>> Can you please remove the patch. And try the following:
>>>>
>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>
>>>> # echo mem > /sys/power/state
>>>>
>>>> and log the output of the serial console. That way we might get a clue
>>>> where it gets stuck.
>>>
>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>
>> What means hangs right away? Is there no output at all on the serial
>> console? Or does it just stop at some point?
>>
>> Thanks,
>>
>> tglx
>>
>
> Sorry for jumping in.
> Finally, I found this thread.
>
>
> My environment is completely different (ARM64 board),
> I am also suffering from a hibernation problem
> since this commit.
>
>
> I get no response on the serial console
> after "Restarting tasks ... done." log message.
>
>
> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
> irq function calls", I can get hibernation working again.
>
>
> SW info:
> defconfig: arch/arm64/configs/defconfig
> DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
> PSCI : ARM Trusted Firmware
>
>
> SoC info:
> CPU : Cortex-A72 * 2 + Cortex-A53 * 2
> irqchip : GICv3 (drivers/irq/irq-gic-v3.c)

Let me take an educated guess: It feels like your firmware doesn't
save/restore the GIC context across suspend/resume. Is that something
you could check, assuming you have access to the firmware source code?

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2017-08-03 12:52:11

by Masahiro Yamada

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

Hi Marc,

2017-08-03 17:41 GMT+09:00 Marc Zyngier <[email protected]>:
> Hi Masahiro,
>
> On 03/08/17 08:32, Masahiro Yamada wrote:
>> Hi.
>>
>> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner <[email protected]>:
>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>>> Can you please remove the patch. And try the following:
>>>>>
>>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>>
>>>>> # echo mem > /sys/power/state
>>>>>
>>>>> and log the output of the serial console. That way we might get a clue
>>>>> where it gets stuck.
>>>>
>>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>>
>>> What means hangs right away? Is there no output at all on the serial
>>> console? Or does it just stop at some point?
>>>
>>> Thanks,
>>>
>>> tglx
>>>
>>
>> Sorry for jumping in.
>> Finally, I found this thread.
>>
>>
>> My environment is completely different (ARM64 board),
>> I am also suffering from a hibernation problem
>> since this commit.
>>
>>
>> I get no response on the serial console
>> after "Restarting tasks ... done." log message.
>>
>>
>> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
>> irq function calls", I can get hibernation working again.
>>
>>
>> SW info:
>> defconfig: arch/arm64/configs/defconfig
>> DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
>> PSCI : ARM Trusted Firmware
>>
>>
>> SoC info:
>> CPU : Cortex-A72 * 2 + Cortex-A53 * 2
>> irqchip : GICv3 (drivers/irq/irq-gic-v3.c)
>
> Let me take an educated guess: It feels like your firmware doesn't
> save/restore the GIC context across suspend/resume. Is that something
> you could check, assuming you have access to the firmware source code?

Thanks for your comments.


I do not know much about the manner of preserving GICv3 context.

I can see this patch (rejected?) :
https://patchwork.kernel.org/patch/9343061/


Is it something that should be completely cared by firmware
instead of kernel?


ARM Trusted Firmware (https://github.com/ARM-software/arm-trusted-firmware)
is open source software, and I pushed my platform code to the upstream.

So, yes, I (and everybody) can have access to the firmware source code.


I am not sure how ATF saves the context during hibernation, though.



--
Best Regards
Masahiro Yamada

2017-08-03 13:30:21

by Marc Zyngier

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 03/08/17 13:52, Masahiro Yamada wrote:
> Hi Marc,
>
> 2017-08-03 17:41 GMT+09:00 Marc Zyngier <[email protected]>:
>> Hi Masahiro,
>>
>> On 03/08/17 08:32, Masahiro Yamada wrote:
>>> Hi.
>>>
>>> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner <[email protected]>:
>>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>>>> Can you please remove the patch. And try the following:
>>>>>>
>>>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>>>
>>>>>> # echo mem > /sys/power/state
>>>>>>
>>>>>> and log the output of the serial console. That way we might get a clue
>>>>>> where it gets stuck.
>>>>>
>>>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>>>
>>>> What means hangs right away? Is there no output at all on the serial
>>>> console? Or does it just stop at some point?
>>>>
>>>> Thanks,
>>>>
>>>> tglx
>>>>
>>>
>>> Sorry for jumping in.
>>> Finally, I found this thread.
>>>
>>>
>>> My environment is completely different (ARM64 board),
>>> I am also suffering from a hibernation problem
>>> since this commit.
>>>
>>>
>>> I get no response on the serial console
>>> after "Restarting tasks ... done." log message.
>>>
>>>
>>> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
>>> irq function calls", I can get hibernation working again.
>>>
>>>
>>> SW info:
>>> defconfig: arch/arm64/configs/defconfig
>>> DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
>>> PSCI : ARM Trusted Firmware
>>>
>>>
>>> SoC info:
>>> CPU : Cortex-A72 * 2 + Cortex-A53 * 2
>>> irqchip : GICv3 (drivers/irq/irq-gic-v3.c)
>>
>> Let me take an educated guess: It feels like your firmware doesn't
>> save/restore the GIC context across suspend/resume. Is that something
>> you could check, assuming you have access to the firmware source code?
>
> Thanks for your comments.
>
>
> I do not know much about the manner of preserving GICv3 context.
>
> I can see this patch (rejected?) :
> https://patchwork.kernel.org/patch/9343061/
>
>
> Is it something that should be completely cared by firmware
> instead of kernel?

That was definitely the intention, but it looks like something that ATF
has only started supporting very recently:

https://github.com/ARM-software/arm-trusted-firmware/pull/1047

> ARM Trusted Firmware (https://github.com/ARM-software/arm-trusted-firmware)
> is open source software, and I pushed my platform code to the upstream.
>
> So, yes, I (and everybody) can have access to the firmware source code.
>
>
> I am not sure how ATF saves the context during hibernation, though.

See the above link. Is there any chance of you trying this into your
firmware?

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2017-08-07 04:45:53

by Masahiro Yamada

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

Hi Marc,


2017-08-03 22:30 GMT+09:00 Marc Zyngier <[email protected]>:
> On 03/08/17 13:52, Masahiro Yamada wrote:
>> Hi Marc,
>>
>> 2017-08-03 17:41 GMT+09:00 Marc Zyngier <[email protected]>:
>>> Hi Masahiro,
>>>
>>> On 03/08/17 08:32, Masahiro Yamada wrote:
>>>> Hi.
>>>>
>>>> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner <[email protected]>:
>>>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>>>>> Can you please remove the patch. And try the following:
>>>>>>>
>>>>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>>>>
>>>>>>> # echo mem > /sys/power/state
>>>>>>>
>>>>>>> and log the output of the serial console. That way we might get a clue
>>>>>>> where it gets stuck.
>>>>>>
>>>>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>>>>
>>>>> What means hangs right away? Is there no output at all on the serial
>>>>> console? Or does it just stop at some point?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> tglx
>>>>>
>>>>
>>>> Sorry for jumping in.
>>>> Finally, I found this thread.
>>>>
>>>>
>>>> My environment is completely different (ARM64 board),
>>>> I am also suffering from a hibernation problem
>>>> since this commit.
>>>>
>>>>
>>>> I get no response on the serial console
>>>> after "Restarting tasks ... done." log message.
>>>>
>>>>
>>>> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
>>>> irq function calls", I can get hibernation working again.
>>>>
>>>>
>>>> SW info:
>>>> defconfig: arch/arm64/configs/defconfig
>>>> DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
>>>> PSCI : ARM Trusted Firmware
>>>>
>>>>
>>>> SoC info:
>>>> CPU : Cortex-A72 * 2 + Cortex-A53 * 2
>>>> irqchip : GICv3 (drivers/irq/irq-gic-v3.c)
>>>
>>> Let me take an educated guess: It feels like your firmware doesn't
>>> save/restore the GIC context across suspend/resume. Is that something
>>> you could check, assuming you have access to the firmware source code?
>>
>> Thanks for your comments.
>>
>>
>> I do not know much about the manner of preserving GICv3 context.
>>
>> I can see this patch (rejected?) :
>> https://patchwork.kernel.org/patch/9343061/
>>
>>
>> Is it something that should be completely cared by firmware
>> instead of kernel?
>
> That was definitely the intention, but it looks like something that ATF
> has only started supporting very recently:
>
> https://github.com/ARM-software/arm-trusted-firmware/pull/1047
>
>> ARM Trusted Firmware (https://github.com/ARM-software/arm-trusted-firmware)
>> is open source software, and I pushed my platform code to the upstream.
>>
>> So, yes, I (and everybody) can have access to the firmware source code.
>>
>>
>> I am not sure how ATF saves the context during hibernation, though.
>
> See the above link. Is there any chance of you trying this into your
> firmware?
>
> Thanks,

Thanks for the pointer.


Yes. I will try that once GIC-v3 context save/restore is supported in ATF.

I think that will basically work for suspend-to-ram
because all contexts including both non-secure and secure worlds will
be retained in the main memory.

However, I still do not understand how the context is preserved during
the hibernation (suspend-to-disk).


If my understanding is correct, hibernation on Linux works like follows:

[1] Freeze all tasks
[2] CPU_OFF for non-boot CPUs
[3] Create a hibernation image
[4] CPU_ON for non-boot CPUs
[5] Write the hibernation image to the disk (=swap area)
[6] SYSTEM_OFF


IIUC, [5] only writes the context Linux takes care of (only non-secure).

If so, where and how does the firmware write the GIC-v3 context
to the disk?


--
Best Regards
Masahiro Yamada

2017-08-07 08:17:41

by Marc Zyngier

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 07/08/17 05:45, Masahiro Yamada wrote:
> Hi Marc,
>
>
> 2017-08-03 22:30 GMT+09:00 Marc Zyngier <[email protected]>:
>> On 03/08/17 13:52, Masahiro Yamada wrote:
>>> Hi Marc,
>>>
>>> 2017-08-03 17:41 GMT+09:00 Marc Zyngier <[email protected]>:
>>>> Hi Masahiro,
>>>>
>>>> On 03/08/17 08:32, Masahiro Yamada wrote:
>>>>> Hi.
>>>>>
>>>>> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner <[email protected]>:
>>>>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>>>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>>>>>> Can you please remove the patch. And try the following:
>>>>>>>>
>>>>>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>>>>>
>>>>>>>> # echo mem > /sys/power/state
>>>>>>>>
>>>>>>>> and log the output of the serial console. That way we might get a clue
>>>>>>>> where it gets stuck.
>>>>>>>
>>>>>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>>>>>
>>>>>> What means hangs right away? Is there no output at all on the serial
>>>>>> console? Or does it just stop at some point?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> tglx
>>>>>>
>>>>>
>>>>> Sorry for jumping in.
>>>>> Finally, I found this thread.
>>>>>
>>>>>
>>>>> My environment is completely different (ARM64 board),
>>>>> I am also suffering from a hibernation problem
>>>>> since this commit.
>>>>>
>>>>>
>>>>> I get no response on the serial console
>>>>> after "Restarting tasks ... done." log message.
>>>>>
>>>>>
>>>>> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
>>>>> irq function calls", I can get hibernation working again.
>>>>>
>>>>>
>>>>> SW info:
>>>>> defconfig: arch/arm64/configs/defconfig
>>>>> DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
>>>>> PSCI : ARM Trusted Firmware
>>>>>
>>>>>
>>>>> SoC info:
>>>>> CPU : Cortex-A72 * 2 + Cortex-A53 * 2
>>>>> irqchip : GICv3 (drivers/irq/irq-gic-v3.c)
>>>>
>>>> Let me take an educated guess: It feels like your firmware doesn't
>>>> save/restore the GIC context across suspend/resume. Is that something
>>>> you could check, assuming you have access to the firmware source code?
>>>
>>> Thanks for your comments.
>>>
>>>
>>> I do not know much about the manner of preserving GICv3 context.
>>>
>>> I can see this patch (rejected?) :
>>> https://patchwork.kernel.org/patch/9343061/
>>>
>>>
>>> Is it something that should be completely cared by firmware
>>> instead of kernel?
>>
>> That was definitely the intention, but it looks like something that ATF
>> has only started supporting very recently:
>>
>> https://github.com/ARM-software/arm-trusted-firmware/pull/1047
>>
>>> ARM Trusted Firmware (https://github.com/ARM-software/arm-trusted-firmware)
>>> is open source software, and I pushed my platform code to the upstream.
>>>
>>> So, yes, I (and everybody) can have access to the firmware source code.
>>>
>>>
>>> I am not sure how ATF saves the context during hibernation, though.
>>
>> See the above link. Is there any chance of you trying this into your
>> firmware?
>>
>> Thanks,
>
> Thanks for the pointer.
>
>
> Yes. I will try that once GIC-v3 context save/restore is supported in ATF.
>
> I think that will basically work for suspend-to-ram
> because all contexts including both non-secure and secure worlds will
> be retained in the main memory.
>
> However, I still do not understand how the context is preserved during
> the hibernation (suspend-to-disk).
>
>
> If my understanding is correct, hibernation on Linux works like follows:
>
> [1] Freeze all tasks
> [2] CPU_OFF for non-boot CPUs
> [3] Create a hibernation image
> [4] CPU_ON for non-boot CPUs
> [5] Write the hibernation image to the disk (=swap area)
> [6] SYSTEM_OFF
>
>
> IIUC, [5] only writes the context Linux takes care of (only non-secure).
>
> If so, where and how does the firmware write the GIC-v3 context
> to the disk?

Gah, I completely missed the fact that you were talking about suspend to
disk, sorry about that.

It is likely that some driver doesn't restore its state properly. Is
there any chance that you could pinpoint which device creates the issue?

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2017-08-08 01:30:40

by Masahiro Yamada

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

Hi Marc,

2017-08-07 17:17 GMT+09:00 Marc Zyngier <[email protected]>:
> On 07/08/17 05:45, Masahiro Yamada wrote:
>> Hi Marc,
>>
>>
>> 2017-08-03 22:30 GMT+09:00 Marc Zyngier <[email protected]>:
>>> On 03/08/17 13:52, Masahiro Yamada wrote:
>>>> Hi Marc,
>>>>
>>>> 2017-08-03 17:41 GMT+09:00 Marc Zyngier <[email protected]>:
>>>>> Hi Masahiro,
>>>>>
>>>>> On 03/08/17 08:32, Masahiro Yamada wrote:
>>>>>> Hi.
>>>>>>
>>>>>> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner <[email protected]>:
>>>>>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>>>>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>>>>>>> Can you please remove the patch. And try the following:
>>>>>>>>>
>>>>>>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>>>>>>
>>>>>>>>> # echo mem > /sys/power/state
>>>>>>>>>
>>>>>>>>> and log the output of the serial console. That way we might get a clue
>>>>>>>>> where it gets stuck.
>>>>>>>>
>>>>>>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>>>>>>
>>>>>>> What means hangs right away? Is there no output at all on the serial
>>>>>>> console? Or does it just stop at some point?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> tglx
>>>>>>>
>>>>>>
>>>>>> Sorry for jumping in.
>>>>>> Finally, I found this thread.
>>>>>>
>>>>>>
>>>>>> My environment is completely different (ARM64 board),
>>>>>> I am also suffering from a hibernation problem
>>>>>> since this commit.
>>>>>>
>>>>>>
>>>>>> I get no response on the serial console
>>>>>> after "Restarting tasks ... done." log message.
>>>>>>
>>>>>>
>>>>>> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
>>>>>> irq function calls", I can get hibernation working again.
>>>>>>
>>>>>>
>>>>>> SW info:
>>>>>> defconfig: arch/arm64/configs/defconfig
>>>>>> DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
>>>>>> PSCI : ARM Trusted Firmware
>>>>>>
>>>>>>
>>>>>> SoC info:
>>>>>> CPU : Cortex-A72 * 2 + Cortex-A53 * 2
>>>>>> irqchip : GICv3 (drivers/irq/irq-gic-v3.c)
>>>>>
>>>>> Let me take an educated guess: It feels like your firmware doesn't
>>>>> save/restore the GIC context across suspend/resume. Is that something
>>>>> you could check, assuming you have access to the firmware source code?
>>>>
>>>> Thanks for your comments.
>>>>
>>>>
>>>> I do not know much about the manner of preserving GICv3 context.
>>>>
>>>> I can see this patch (rejected?) :
>>>> https://patchwork.kernel.org/patch/9343061/
>>>>
>>>>
>>>> Is it something that should be completely cared by firmware
>>>> instead of kernel?
>>>
>>> That was definitely the intention, but it looks like something that ATF
>>> has only started supporting very recently:
>>>
>>> https://github.com/ARM-software/arm-trusted-firmware/pull/1047
>>>
>>>> ARM Trusted Firmware (https://github.com/ARM-software/arm-trusted-firmware)
>>>> is open source software, and I pushed my platform code to the upstream.
>>>>
>>>> So, yes, I (and everybody) can have access to the firmware source code.
>>>>
>>>>
>>>> I am not sure how ATF saves the context during hibernation, though.
>>>
>>> See the above link. Is there any chance of you trying this into your
>>> firmware?
>>>
>>> Thanks,
>>
>> Thanks for the pointer.
>>
>>
>> Yes. I will try that once GIC-v3 context save/restore is supported in ATF.
>>
>> I think that will basically work for suspend-to-ram
>> because all contexts including both non-secure and secure worlds will
>> be retained in the main memory.
>>
>> However, I still do not understand how the context is preserved during
>> the hibernation (suspend-to-disk).
>>
>>
>> If my understanding is correct, hibernation on Linux works like follows:
>>
>> [1] Freeze all tasks
>> [2] CPU_OFF for non-boot CPUs
>> [3] Create a hibernation image
>> [4] CPU_ON for non-boot CPUs
>> [5] Write the hibernation image to the disk (=swap area)
>> [6] SYSTEM_OFF
>>
>>
>> IIUC, [5] only writes the context Linux takes care of (only non-secure).
>>
>> If so, where and how does the firmware write the GIC-v3 context
>> to the disk?
>
> Gah, I completely missed the fact that you were talking about suspend to
> disk, sorry about that.
>
> It is likely that some driver doesn't restore its state properly. Is
> there any chance that you could pinpoint which device creates the issue?
>

I use eMMC to store the hibernation image, but
I do not think eMMC driver is the cause of the issue.

I guess the cause of the issue is GIC-v3 context is lost.


I am not an expert in this, so I will ask the ATF community
about how ATF can support suspend-to-disk.


--
Best Regards
Masahiro Yamada

2017-08-08 07:39:38

by Marc Zyngier

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

On 08/08/17 02:30, Masahiro Yamada wrote:
> Hi Marc,
>
> 2017-08-07 17:17 GMT+09:00 Marc Zyngier <[email protected]>:
>> On 07/08/17 05:45, Masahiro Yamada wrote:
>>> Hi Marc,
>>>
>>>
>>> 2017-08-03 22:30 GMT+09:00 Marc Zyngier <[email protected]>:
>>>> On 03/08/17 13:52, Masahiro Yamada wrote:
>>>>> Hi Marc,
>>>>>
>>>>> 2017-08-03 17:41 GMT+09:00 Marc Zyngier <[email protected]>:
>>>>>> Hi Masahiro,
>>>>>>
>>>>>> On 03/08/17 08:32, Masahiro Yamada wrote:
>>>>>>> Hi.
>>>>>>>
>>>>>>> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner <[email protected]>:
>>>>>>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>>>>>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>>>>>>>> Can you please remove the patch. And try the following:
>>>>>>>>>>
>>>>>>>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>>>>>>>
>>>>>>>>>> # echo mem > /sys/power/state
>>>>>>>>>>
>>>>>>>>>> and log the output of the serial console. That way we might get a clue
>>>>>>>>>> where it gets stuck.
>>>>>>>>>
>>>>>>>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>>>>>>>
>>>>>>>> What means hangs right away? Is there no output at all on the serial
>>>>>>>> console? Or does it just stop at some point?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> tglx
>>>>>>>>
>>>>>>>
>>>>>>> Sorry for jumping in.
>>>>>>> Finally, I found this thread.
>>>>>>>
>>>>>>>
>>>>>>> My environment is completely different (ARM64 board),
>>>>>>> I am also suffering from a hibernation problem
>>>>>>> since this commit.
>>>>>>>
>>>>>>>
>>>>>>> I get no response on the serial console
>>>>>>> after "Restarting tasks ... done." log message.
>>>>>>>
>>>>>>>
>>>>>>> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
>>>>>>> irq function calls", I can get hibernation working again.
>>>>>>>
>>>>>>>
>>>>>>> SW info:
>>>>>>> defconfig: arch/arm64/configs/defconfig
>>>>>>> DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
>>>>>>> PSCI : ARM Trusted Firmware
>>>>>>>
>>>>>>>
>>>>>>> SoC info:
>>>>>>> CPU : Cortex-A72 * 2 + Cortex-A53 * 2
>>>>>>> irqchip : GICv3 (drivers/irq/irq-gic-v3.c)
>>>>>>
>>>>>> Let me take an educated guess: It feels like your firmware doesn't
>>>>>> save/restore the GIC context across suspend/resume. Is that something
>>>>>> you could check, assuming you have access to the firmware source code?
>>>>>
>>>>> Thanks for your comments.
>>>>>
>>>>>
>>>>> I do not know much about the manner of preserving GICv3 context.
>>>>>
>>>>> I can see this patch (rejected?) :
>>>>> https://patchwork.kernel.org/patch/9343061/
>>>>>
>>>>>
>>>>> Is it something that should be completely cared by firmware
>>>>> instead of kernel?
>>>>
>>>> That was definitely the intention, but it looks like something that ATF
>>>> has only started supporting very recently:
>>>>
>>>> https://github.com/ARM-software/arm-trusted-firmware/pull/1047
>>>>
>>>>> ARM Trusted Firmware (https://github.com/ARM-software/arm-trusted-firmware)
>>>>> is open source software, and I pushed my platform code to the upstream.
>>>>>
>>>>> So, yes, I (and everybody) can have access to the firmware source code.
>>>>>
>>>>>
>>>>> I am not sure how ATF saves the context during hibernation, though.
>>>>
>>>> See the above link. Is there any chance of you trying this into your
>>>> firmware?
>>>>
>>>> Thanks,
>>>
>>> Thanks for the pointer.
>>>
>>>
>>> Yes. I will try that once GIC-v3 context save/restore is supported in ATF.
>>>
>>> I think that will basically work for suspend-to-ram
>>> because all contexts including both non-secure and secure worlds will
>>> be retained in the main memory.
>>>
>>> However, I still do not understand how the context is preserved during
>>> the hibernation (suspend-to-disk).
>>>
>>>
>>> If my understanding is correct, hibernation on Linux works like follows:
>>>
>>> [1] Freeze all tasks
>>> [2] CPU_OFF for non-boot CPUs
>>> [3] Create a hibernation image
>>> [4] CPU_ON for non-boot CPUs
>>> [5] Write the hibernation image to the disk (=swap area)
>>> [6] SYSTEM_OFF
>>>
>>>
>>> IIUC, [5] only writes the context Linux takes care of (only non-secure).
>>>
>>> If so, where and how does the firmware write the GIC-v3 context
>>> to the disk?
>>
>> Gah, I completely missed the fact that you were talking about suspend to
>> disk, sorry about that.
>>
>> It is likely that some driver doesn't restore its state properly. Is
>> there any chance that you could pinpoint which device creates the issue?
>>
>
> I use eMMC to store the hibernation image, but
> I do not think eMMC driver is the cause of the issue.
>
> I guess the cause of the issue is GIC-v3 context is lost.

It is not lost. The boot kernel has re-initialized its state. What is
missing is that one driver in your system fails to restore its own state
correctly, and relies on doing something such as enabling/disabling the
interrupt in its PM handler, things will start working again (probably
because they use the same PM callbacks functions for both suspend/resume
and hibernation). This is in no way a guarantee.

Please try:

> swapon -a
> echo test_resume > /sys/power/disk
> echo disk > /sys/power/state

and let me know how this fares.

> I am not an expert in this, so I will ask the ATF community
> about how ATF can support suspend-to-disk.

As you pointed out, ARF is not involved at all in that context, so
that's pretty pointless.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2017-08-09 04:03:50

by Masahiro Yamada

[permalink] [raw]
Subject: Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

Hi Marc,


2017-08-08 16:39 GMT+09:00 Marc Zyngier <[email protected]>:
> On 08/08/17 02:30, Masahiro Yamada wrote:
>> Hi Marc,
>>
>> 2017-08-07 17:17 GMT+09:00 Marc Zyngier <[email protected]>:
>>> On 07/08/17 05:45, Masahiro Yamada wrote:
>>>> Hi Marc,
>>>>
>>>>
>>>> 2017-08-03 22:30 GMT+09:00 Marc Zyngier <[email protected]>:
>>>>> On 03/08/17 13:52, Masahiro Yamada wrote:
>>>>>> Hi Marc,
>>>>>>
>>>>>> 2017-08-03 17:41 GMT+09:00 Marc Zyngier <[email protected]>:
>>>>>>> Hi Masahiro,
>>>>>>>
>>>>>>> On 03/08/17 08:32, Masahiro Yamada wrote:
>>>>>>>> Hi.
>>>>>>>>
>>>>>>>> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner <[email protected]>:
>>>>>>>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>>>>>>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>>>>>>>>> Can you please remove the patch. And try the following:
>>>>>>>>>>>
>>>>>>>>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>>>>>>>>
>>>>>>>>>>> # echo mem > /sys/power/state
>>>>>>>>>>>
>>>>>>>>>>> and log the output of the serial console. That way we might get a clue
>>>>>>>>>>> where it gets stuck.
>>>>>>>>>>
>>>>>>>>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>>>>>>>>
>>>>>>>>> What means hangs right away? Is there no output at all on the serial
>>>>>>>>> console? Or does it just stop at some point?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> tglx
>>>>>>>>>
>>>>>>>>
>>>>>>>> Sorry for jumping in.
>>>>>>>> Finally, I found this thread.
>>>>>>>>
>>>>>>>>
>>>>>>>> My environment is completely different (ARM64 board),
>>>>>>>> I am also suffering from a hibernation problem
>>>>>>>> since this commit.
>>>>>>>>
>>>>>>>>
>>>>>>>> I get no response on the serial console
>>>>>>>> after "Restarting tasks ... done." log message.
>>>>>>>>
>>>>>>>>
>>>>>>>> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
>>>>>>>> irq function calls", I can get hibernation working again.
>>>>>>>>
>>>>>>>>
>>>>>>>> SW info:
>>>>>>>> defconfig: arch/arm64/configs/defconfig
>>>>>>>> DT : arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
>>>>>>>> PSCI : ARM Trusted Firmware
>>>>>>>>
>>>>>>>>
>>>>>>>> SoC info:
>>>>>>>> CPU : Cortex-A72 * 2 + Cortex-A53 * 2
>>>>>>>> irqchip : GICv3 (drivers/irq/irq-gic-v3.c)
>>>>>>>
>>>>>>> Let me take an educated guess: It feels like your firmware doesn't
>>>>>>> save/restore the GIC context across suspend/resume. Is that something
>>>>>>> you could check, assuming you have access to the firmware source code?
>>>>>>
>>>>>> Thanks for your comments.
>>>>>>
>>>>>>
>>>>>> I do not know much about the manner of preserving GICv3 context.
>>>>>>
>>>>>> I can see this patch (rejected?) :
>>>>>> https://patchwork.kernel.org/patch/9343061/
>>>>>>
>>>>>>
>>>>>> Is it something that should be completely cared by firmware
>>>>>> instead of kernel?
>>>>>
>>>>> That was definitely the intention, but it looks like something that ATF
>>>>> has only started supporting very recently:
>>>>>
>>>>> https://github.com/ARM-software/arm-trusted-firmware/pull/1047
>>>>>
>>>>>> ARM Trusted Firmware (https://github.com/ARM-software/arm-trusted-firmware)
>>>>>> is open source software, and I pushed my platform code to the upstream.
>>>>>>
>>>>>> So, yes, I (and everybody) can have access to the firmware source code.
>>>>>>
>>>>>>
>>>>>> I am not sure how ATF saves the context during hibernation, though.
>>>>>
>>>>> See the above link. Is there any chance of you trying this into your
>>>>> firmware?
>>>>>
>>>>> Thanks,
>>>>
>>>> Thanks for the pointer.
>>>>
>>>>
>>>> Yes. I will try that once GIC-v3 context save/restore is supported in ATF.
>>>>
>>>> I think that will basically work for suspend-to-ram
>>>> because all contexts including both non-secure and secure worlds will
>>>> be retained in the main memory.
>>>>
>>>> However, I still do not understand how the context is preserved during
>>>> the hibernation (suspend-to-disk).
>>>>
>>>>
>>>> If my understanding is correct, hibernation on Linux works like follows:
>>>>
>>>> [1] Freeze all tasks
>>>> [2] CPU_OFF for non-boot CPUs
>>>> [3] Create a hibernation image
>>>> [4] CPU_ON for non-boot CPUs
>>>> [5] Write the hibernation image to the disk (=swap area)
>>>> [6] SYSTEM_OFF
>>>>
>>>>
>>>> IIUC, [5] only writes the context Linux takes care of (only non-secure).
>>>>
>>>> If so, where and how does the firmware write the GIC-v3 context
>>>> to the disk?
>>>
>>> Gah, I completely missed the fact that you were talking about suspend to
>>> disk, sorry about that.
>>>
>>> It is likely that some driver doesn't restore its state properly. Is
>>> there any chance that you could pinpoint which device creates the issue?
>>>
>>
>> I use eMMC to store the hibernation image, but
>> I do not think eMMC driver is the cause of the issue.
>>
>> I guess the cause of the issue is GIC-v3 context is lost.
>
> It is not lost. The boot kernel has re-initialized its state. What is
> missing is that one driver in your system fails to restore its own state
> correctly, and relies on doing something such as enabling/disabling the
> interrupt in its PM handler, things will start working again (probably
> because they use the same PM callbacks functions for both suspend/resume
> and hibernation). This is in no way a guarantee.
>
> Please try:
>
>> swapon -a
>> echo test_resume > /sys/power/disk
>> echo disk > /sys/power/state
>
> and let me know how this fares.
>
>> I am not an expert in this, so I will ask the ATF community
>> about how ATF can support suspend-to-disk.
>
> As you pointed out, ARF is not involved at all in that context, so
> that's pretty pointless.
>



I tried it based on Linus' tree.



Commit: bfa738cf3dfa (Today's Linus tree HEAD)
defconfig: arch/arm64/configs/defconfig
DT: arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts


I am using eMMC (/dev/mmcblk0) as a disk.

The eMMC driver is in upstream (drivers/mmc/host/sdhci-cadence.c)



"test_resume" seems to work.


The full log is as follows:





Starting kernel ...

WARNING: BL31: cortex_a53: errata workaround for 835769 was missing!
WARNING: BL31: cortex_a53: errata workaround for 843419 was missing!
INFO: BL31: cortex_a53: errata workaround for 855873 was applied
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.13.0-rc4-00052-gbfa738cf3dfa
(masahiro@pug) (gcc version 6.1.1 20160711 (Linaro GCC 6.1-2016.08))
#445 SMP PREEMPT Wed Aug 9 12:48:33 JST 2017
[ 0.000000] Boot CPU: AArch64 Processor [410fd082]
[ 0.000000] Machine model: UniPhier LD20 Reference Board
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 16 MiB at 0x00000000fec00000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem
0x0000000000000000-0x000000013fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x13ffe2400-0x13ffe3eff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000080000000-0x00000000ffffffff]
[ 0.000000] Normal [mem 0x0000000100000000-0x000000013fffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000080000000-0x000000013fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000013fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: MIGRATE_INFO_TYPE not supported.
[ 0.000000] percpu: Embedded 26 pages/cpu @ffff8000bff78000 s67352
r8192 d30952 u106496
[ 0.000000] Detected PIPT I-cache on CPU0
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on.
Total pages: 774144
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: resume=/dev/mmcblk0 no_console_suspend
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] software IO TLB [mem 0xfac00000-0xfec00000] (64MB)
mapped at [ffff80007ac00000-ffff80007ebfffff]
[ 0.000000] Memory: 2963680K/3145728K available (9212K kernel code,
1130K rwdata, 4200K rodata, 1088K init, 400K bss, 165664K reserved,
16384K cma-reserved)
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] modules : 0xffff000000000000 - 0xffff000008000000
( 128 MB)
[ 0.000000] vmalloc : 0xffff000008000000 - 0xffff7dffbfff0000
(129022 GB)
[ 0.000000] .text : 0xffff000008080000 - 0xffff000008980000
( 9216 KB)
[ 0.000000] .rodata : 0xffff000008980000 - 0xffff000008da0000
( 4224 KB)
[ 0.000000] .init : 0xffff000008da0000 - 0xffff000008eb0000
( 1088 KB)
[ 0.000000] .data : 0xffff000008eb0000 - 0xffff000008fcaa00
( 1131 KB)
[ 0.000000] .bss : 0xffff000008fcaa00 - 0xffff00000902ece0
( 401 KB)
[ 0.000000] fixed : 0xffff7dfffe7fd000 - 0xffff7dfffec00000
( 4108 KB)
[ 0.000000] PCI I/O : 0xffff7dfffee00000 - 0xffff7dffffe00000
( 16 MB)
[ 0.000000] vmemmap : 0xffff7e0000000000 - 0xffff800000000000
( 2048 GB maximum)
[ 0.000000] 0xffff7e0000000000 - 0xffff7e0003000000
( 48 MB actual)
[ 0.000000] memory : 0xffff800000000000 - 0xffff8000c0000000
( 3072 MB)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=4.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x000000005fe80000
[ 0.000000] arch_timer: cp15 timer(s) running at 50.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff
max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
[ 0.000003] sched_clock: 56 bits at 50MHz, resolution 20ns, wraps
every 4398046511100ns
[ 0.000131] Console: colour dummy device 80x25
[ 0.000587] console [tty0] enabled
[ 0.000628] Calibrating delay loop (skipped), value calculated
using timer frequency.. 100.00 BogoMIPS (lpj=200000)
[ 0.000651] pid_max: default: 32768 minimum: 301
[ 0.000708] Security Framework initialized
[ 0.003117] Dentry cache hash table entries: 524288 (order: 10,
4194304 bytes)
[ 0.004482] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 0.004561] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.004617] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.020049] ASID allocator initialised with 65536 entries
[ 0.028039] Hierarchical SRCU implementation.
[ 0.036324] EFI services will not be available.
[ 0.044056] smp: Bringing up secondary CPUs ...
[ 0.072170] Detected PIPT I-cache on CPU1
[ 0.072196] GICv3: CPU1: found redistributor 1 region 0:0x000000005fea0000
[ 0.072217] CPU1: Booted secondary processor [410fd082]
[ 0.118125] CPU features: enabling workaround for ARM erratum 845719
[ 0.118132] Detected VIPT I-cache on CPU2
[ 0.118155] GICv3: CPU2: found redistributor 100 region 0:0x000000005fec0000
[ 0.118178] CPU2: Booted secondary processor [410fd034]
[ 0.144235] Detected VIPT I-cache on CPU3
[ 0.144251] GICv3: CPU3: found redistributor 101 region 0:0x000000005fee0000
[ 0.144264] CPU3: Booted secondary processor [410fd034]
[ 0.144338] smp: Brought up 1 node, 4 CPUs
[ 0.144437] SMP: Total of 4 processors activated.
[ 0.144449] CPU features: detected feature: GIC system register CPU interface
[ 0.144461] CPU features: detected feature: 32-bit EL0 Support
[ 0.144639] CPU: All CPU(s) started at EL1
[ 0.144667] alternatives: patching kernel code
[ 0.145650] devtmpfs: initialized
[ 0.146949] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.147003] futex hash table entries: 1024 (order: 5, 131072 bytes)
[ 0.147364] pinctrl core: initialized pinctrl subsystem
[ 0.147896] DMI not present or invalid.
[ 0.148088] random: get_random_u32 called from
bucket_table_alloc+0xf8/0x280 with crng_init=0
[ 0.148214] NET: Registered protocol family 16
[ 0.148830] random: fast init done
[ 0.152312] cpuidle: using governor menu
[ 0.152645] vdso: 2 pages (1 code @ ffff000008986000, 1 data @
ffff000008eb5000)
[ 0.152680] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[ 0.153493] DMA: preallocated 256 KiB pool for atomic allocations
[ 0.153823] Serial: AMBA PL011 UART driver
[ 0.164789] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.174604] ACPI: Interpreter disabled.
[ 0.175163] vgaarb: loaded
[ 0.175378] SCSI subsystem initialized
[ 0.176629] usbcore: registered new interface driver usbfs
[ 0.176686] usbcore: registered new interface driver hub
[ 0.176762] usbcore: registered new device driver usb
[ 0.177205] pps_core: LinuxPPS API ver. 1 registered
[ 0.177223] pps_core: Software ver. 5.3.6 - Copyright 2005-2007
Rodolfo Giometti <[email protected]>
[ 0.177266] PTP clock support registered
[ 0.177370] EDAC MC: Ver: 3.0.0
[ 0.177638] dmi: Firmware registration failed.
[ 0.177822] Advanced Linux Sound Architecture Driver Initialized.
[ 0.178276] clocksource: Switched to clocksource arch_sys_counter
[ 0.178375] VFS: Disk quotas dquot_6.6.0
[ 0.178421] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.178554] pnp: PnP ACPI: disabled
[ 0.182345] NET: Registered protocol family 2
[ 0.182740] TCP established hash table entries: 32768 (order: 6,
262144 bytes)
[ 0.182937] TCP bind hash table entries: 32768 (order: 7, 524288 bytes)
[ 0.183486] TCP: Hash tables configured (established 32768 bind 32768)
[ 0.183644] UDP hash table entries: 2048 (order: 4, 65536 bytes)
[ 0.183692] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes)
[ 0.183818] NET: Registered protocol family 1
[ 0.184138] RPC: Registered named UNIX socket transport module.
[ 0.184163] RPC: Registered udp transport module.
[ 0.184178] RPC: Registered tcp transport module.
[ 0.184192] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 0.184389] Unpacking initramfs...
[ 0.240364] Freeing initrd memory: 1108K
[ 0.240832] kvm [1]: HYP mode not available
[ 0.241653] audit: initializing netlink subsys (disabled)
[ 0.241824] audit: type=2000 audit(0.239:1): state=initialized
audit_enabled=0 res=1
[ 0.242045] workingset: timestamp_bits=44 max_order=20 bucket_order=0
[ 0.247990] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 0.248572] NFS: Registering the id_resolver key type
[ 0.248612] Key type id_resolver registered
[ 0.248628] Key type id_legacy registered
[ 0.248649] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[ 0.248812] 9p: Installing v9fs 9p2000 file system support
[ 0.250783] Block layer SCSI generic (bsg) driver version 0.4
loaded (major 247)
[ 0.250815] io scheduler noop registered
[ 0.250988] io scheduler cfq registered (default)
[ 0.251005] io scheduler mq-deadline registered
[ 0.251021] io scheduler kyber registered
[ 0.251698] uniphier-system-bus 58c00000.system-bus: could not find
pctldev for node /soc@0/soc-glue@5f800000/pinctrl/system_bus_grp,
deferring probe
[ 0.261491] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.263260] 54006800.serial: ttyS0 at MMIO 0x54006800 (irq = 6,
base_baud = 3676470) is a 16550A
[ 1.091069] console [ttyS0] enabled
[ 1.095058] SuperH (H)SCI(F) driver initialized
[ 1.099835] msm_serial: driver initialized
[ 1.104419] cacheinfo: Unable to detect cache hierarchy for CPU 0
[ 1.115648] loop: module loaded
[ 1.119515] hisi_sas: driver version v1.6
[ 1.125404] libphy: Fixed MDIO Bus: probed
[ 1.129973] tun: Universal TUN/TAP device driver, 1.6
[ 1.135741] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[ 1.141610] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 1.147598] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
[ 1.154592] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 1.160222] igbvf: Intel(R) Gigabit Virtual Function Network Driver
- version 2.4.0-k
[ 1.168090] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[ 1.174225] sky2: driver version 1.30
[ 1.178248] VFIO - User Level meta-driver version: 0.3
[ 1.184273] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 1.190836] ehci-pci: EHCI PCI platform driver
[ 1.195329] ehci-platform: EHCI generic platform driver
[ 1.200642] ehci-orion: EHCI orion driver
[ 1.204724] ehci-exynos: EHCI EXYNOS driver
[ 1.208981] ehci-msm: Qualcomm On-Chip EHCI Host Controller
[ 1.214636] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 1.220887] ohci-pci: OHCI PCI platform driver
[ 1.225387] ohci-platform: OHCI generic platform driver
[ 1.230697] ohci-exynos: OHCI EXYNOS driver
[ 1.235177] usbcore: registered new interface driver usb-storage
[ 1.242531] i2c /dev entries driver
[ 1.250270] sdhci: Secure Digital Host Controller Interface driver
[ 1.256518] sdhci: Copyright(c) Pierre Ossman
[ 1.261074] Synopsys Designware Multimedia Card Interface Driver
[ 1.267620] sdhci-pltfm: SDHCI platform and OF driver helper
[ 1.318318] mmc0: SDHCI controller on 5a000000.sdhc [5a000000.sdhc]
using ADMA 64-bit
[ 1.330068] ledtrig-cpu: registered to indicate activity on CPUs
[ 1.336862] usbcore: registered new interface driver usbhid
[ 1.342482] usbhid: USB HID core driver
[ 1.348120] NET: Registered protocol family 17
[ 1.352689] 9pnet: Installing 9P2000 support
[ 1.357074] Key type dns_resolver registered
[ 1.362787] registered taskstats version 1
[ 1.370774] smsc911x 43f00000.ethernet: 43f00000.ethernet supply
vdd33a not found, using dummy regulator
[ 1.380376] smsc911x 43f00000.ethernet: 43f00000.ethernet supply
vddvario not found, using dummy regulator
[ 1.399995] mmc0: new HS200 MMC card at address 0001
[ 1.405310] mmcblk0: mmc0:0001 004G60 3.69 GiB
[ 1.409971] mmcblk0boot0: mmc0:0001 004G60 partition 1 2.00 MiB
[ 1.412556] libphy: smsc911x-mdio: probed
[ 1.412947] smsc911x 43f00000.ethernet eth0: MAC Address: 6a:fe:ad:31:b1:51
[ 1.414166] 43fb0000.uart: ttyS1 at MMIO 0x43fb0000 (irq = 0,
base_baud = 768000) is a 16550A
[ 1.414785] hctosys: unable to open rtc device (rtc0)
[ 1.440767] mmcblk0boot1: mmc0:0001 004G60 partition 2 2.00 MiB
[ 1.446817] mmcblk0rpmb: mmc0:0001 004G60 partition 3 512 KiB
[ 1.454926] ALSA device list:
[ 1.457909] No soundcards found.
[ 1.462026] Freeing unused kernel memory: 1088K
Starting logging: OK
Initializing random number generator... done.
Starting network: OK

Welcome to Buildroot
buildroot login: root
# mkswap /dev/mmcblk0
Setting up swapspace version 1, size = 3959418880 bytes
# swapon /dev/mmcblk0
[ 36.214903] Adding 3866620k swap on /dev/mmcblk0. Priority:-1
extents:1 across:3866620k SS
# echo test_resume > /sys/power/disk
# cat /sys/power/disk
shutdown reboot suspend [test_resume]
# echo disk > /sys/power/state
[ 71.137589] PM: Syncing filesystems ...
[ 71.141579] PM: done.
[ 71.143871] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 71.151948] OOM killer disabled.
[ 71.155295] PM: Preallocating image memory... done (allocated 47630 pages)
[ 71.360608] PM: Allocated 190520 kbytes in 0.19 seconds (1002.73 MB/s)
[ 71.367149] Freezing remaining freezable tasks ... (elapsed 0.001
seconds) done.
[ 71.391858] Disabling non-boot CPUs ...
[ 71.414571] CPU1: shutdown
[ 71.417281] psci: CPU1 killed.
[ 71.442560] CPU2: shutdown
[ 71.445271] psci: CPU2 killed.
[ 71.470963] CPU3: shutdown
[ 71.473674] psci: CPU3 killed.
[ 71.477083] PM: Creating hibernation image:
[ 71.477083] PM: Need to copy 46899 pages
[ 71.477083] PM: Hibernation image created (46899 pages copied)
[ 71.477083] Suspended for 0.569 seconds
[ 71.477110] Enabling non-boot CPUs ...
[ 71.484903] Detected PIPT I-cache on CPU1
[ 71.488941] GICv3: CPU1: found redistributor 1 region 0:0x000000005fea0000
[ 71.495853] CPU1: Booted secondary processor [410fd082]
[ 71.501314] CPU1 is up
[ 71.503769] Detected VIPT I-cache on CPU2
[ 71.507802] GICv3: CPU2: found redistributor 100 region 0:0x000000005fec0000
[ 71.514882] CPU2: Booted secondary processor [410fd034]
[ 71.520791] CPU2 is up
[ 71.523242] Detected VIPT I-cache on CPU3
[ 71.527266] GICv3: CPU3: found redistributor 101 region 0:0x000000005fee0000
[ 71.534334] CPU3: Booted secondary processor [410fd034]
[ 71.539807] CPU3 is up
[ 71.543493] hibernate: Hibernating on CPU 0 [mpidr:0x0]
[ 71.621511] PM: Using 3 thread(s) for compression.
[ 71.621511] PM: Compressing and saving image data (46991 pages)...
[ 71.632559] PM: Image saving progress: 0%
[ 71.729828] PM: Image saving progress: 10%
[ 71.844900] PM: Image saving progress: 20%
[ 71.958230] PM: Image saving progress: 30%
[ 72.036647] PM: Image saving progress: 40%
[ 72.099759] PM: Image saving progress: 50%
[ 72.160172] PM: Image saving progress: 60%
[ 72.233590] PM: Image saving progress: 70%
[ 72.299996] PM: Image saving progress: 80%
[ 72.369865] PM: Image saving progress: 90%
[ 72.454932] PM: Image saving progress: 100%
[ 74.105739] PM: Image saving done.
[ 74.109213] PM: Wrote 187964 kbytes in 2.47 seconds (76.09 MB/s)
[ 74.115637] PM: S
[ 74.116848] |
[ 74.182660] PM: Using 3 thread(s) for decompression.
[ 74.182660] PM: Loading and decompressing image data (46991 pages)...
[ 74.194105] hibernate: Hibernated on CPU 0 [mpidr:0x0]
[ 74.416966] PM: Image loading progress: 0%
[ 74.667431] PM: Image loading progress: 10%
[ 74.768269] PM: Image loading progress: 20%
[ 74.863927] PM: Image loading progress: 30%
[ 74.957324] PM: Image loading progress: 40%
[ 75.050884] PM: Image loading progress: 50%
[ 75.144371] PM: Image loading progress: 60%
[ 75.230387] PM: Image loading progress: 70%
[ 75.313445] PM: Image loading progress: 80%
[ 75.403118] PM: Image loading progress: 90%
[ 75.494369] PM: Image loading progress: 100%
[ 75.499042] PM: Image loading done.
[ 75.502549] PM: Read 187964 kbytes in 1.30 seconds (144.58 MB/s)
[ 75.527824] Disabling non-boot CPUs ...
[ 75.558505] CPU1: shutdown
[ 75.561219] psci: CPU1 killed.
[ 75.586483] CPU2: shutdown
[ 75.589193] psci: CPU2 killed.
[ 75.614940] CPU3: shutdown
[ 75.617654] psci: CPU3 killed.
[ 71.477083] Suspended for 9.134 seconds
[ 71.477111] Enabling non-boot CPUs ...
[ 71.484909] Detected PIPT I-cache on CPU1
[ 71.488946] GICv3: CPU1: found redistributor 1 region 0:0x000000005fea0000
[ 71.495857] CPU1: Booted secondary processor [410fd082]
[ 71.501316] CPU1 is up
[ 71.503777] Detected VIPT I-cache on CPU2
[ 71.507817] GICv3: CPU2: found redistributor 100 region 0:0x000000005fec0000
[ 71.514895] CPU2: Booted secondary processor [410fd034]
[ 71.520841] CPU2 is up
[ 71.523293] Detected VIPT I-cache on CPU3
[ 71.527315] GICv3: CPU3: found redistributor 101 region 0:0x000000005fee0000
[ 71.534384] CPU3: Booted secondary processor [410fd034]
[ 71.539852] CPU3 is up
[ 71.596208] OOM killer enabled.
[ 71.599374] Restarting tasks ... done.
# ls
# uname -r
4.13.0-rc4-00052-gbfa738cf3dfa









--
Best Regards
Masahiro Yamada