2014-10-07 23:32:04

by Wilmer van der Gaast

Subject: Machine crashes right *after* ~successful resume

Hello,

Rafael, including you on this since
http://linuxconcloudopenna2013.sched.org/event/d708f47d07cd44b9669610778c024708#.VDRzTDS_EUF
mentions you as the maintainer for Linux + power management. I hope this
is still accurate.

Since Linux 3.12 (Debian version 3.12.9-1~bpo70+1) and all the way up to
3.16 (Debian version 3.16.3-2), I'm having suspend-resume issues on my
machine (Intel Z68, i7-3770K) that are somewhat less obvious.

After every boot, I get two successful suspend+resume cycles, but after
the third suspend the machine won't resume successfully. The VGA console
has never logged anything useful, but the serial console has been more
helpful. I seem to get as far as:

[ 153.787678] PM: resume of devices complete after 3797.737 msecs
[ 153.787775] PM: resume devices took 3.796 seconds
[ 154.238612] Restarting tasks ... done.

And indeed, while testing I was running a "ping -i0.01" to a host on my
network, and it managed to get a few packets out. Timing already seems
quite off though:

22:11:49.515489 IP 192.168.44.101 > 192.168.44.100: ICMP echo request, id 3074, seq 894, length 64
22:11:49.982265 IP 192.168.44.101 > 192.168.44.100: ICMP echo request, id 3074, seq 895, length 64
22:11:50.986779 IP 192.168.44.101 > 192.168.44.100: ICMP echo request, id 3074, seq 896, length 64

Note the gaps that are 0.4-1.0s instead of the 0.01s they should've
been. To me these pings going *out* sound like userland's definitely
waking up for a while, or at least some processes are. Also, for several
seconds even during earlier stages of the resume, the machine is already
responding to echo requests.
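
A minimal sketch of how the gaps between those echo requests can be computed from the capture timestamps. The function name `gaps` is an assumption made for illustration; the three timestamps are the ones shown in the tcpdump output above, and 0.01 s is the interval requested with "ping -i0.01".

```python
from datetime import datetime

# Timestamps copied from the three tcpdump lines in this message.
STAMPS = ["22:11:49.515489", "22:11:49.982265", "22:11:50.986779"]

REQUESTED_INTERVAL = 0.01  # seconds, from "ping -i0.01"


def gaps(stamps):
    """Return the seconds elapsed between consecutive HH:MM:SS.ffffff stamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps(STAMPS):
        # Show each gap and how far it is from the requested interval.
        factor = gap / REQUESTED_INTERVAL
        print(f"{gap:.6f} s, about {factor:.0f}x the requested interval")
```

This only handles timestamps within a single day, which is enough for a short capture like this one.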

Sadly after this message to my serial console and these few ICMP
packets, the machine locks up quite hard, to the point that SysRq
doesn't respond anymore. :-(

This has been happening for a while already and makes suspend+resume
mostly useless on my machine. What other debugging info can I provide to
help with getting this fixed?

I've found out about pm_trace, which always points at the same line (and
no device):

/var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780503] Magic number: 0:52:740
/var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780599] hash matches /tmp/linux-3.16.3/drivers/base/power/main.c:812

In my source tree that line is:

TRACE_RESUME(error);

Right at the end of device_resume(), under the Complete: label. Note
that I might have to redo this though, as I now realise I had only
recompiled my *kernel* with the PM_TRACE_RTC flag set, not all my
modules, which I assume is not enough. (I'm thinking of filing a Debian
bug requesting this flag to be enabled by default..) However since the
kernel seems to declare the resume as complete I'm not sure whether
pm_trace is still of any use?

With kernels 3.10 and older I have no such problems, I can
suspend+resume as often as I want.

I've already tried to skip the NVidia + VMware modules at boot time (as
you can see from the logs they're not loaded at any point), but it
didn't help. I could try omitting more modules.

I'm attaching a full dmesg of boot + a few suspend+resume cycles in 3.10
and 3.16, and a dump of the serial console showing the last resume cycle
(which I couldn't get from dmesg of course).

You might notice the message about s2ram segfaulting. I've looked at
that and it seems to be VBE-related code, but this problem occurs even
when I just echo mem to /sys/power/state directly without using s2ram,
so I assume it's not related.

Sorry for the long message. I'd love some ideas for troubleshooting an
issue like this.

"Attachments" in http://roy.gaast.net/~wilmer/.lkml/ since I just
realised >200KB of attachments might not be appreciated. :-)


Cheers,

Wilmer van der Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +


2014-10-12 14:30:31

by Pavel Machek

Subject: Re: Machine crashes right *after* ~successful resume

Hi!

> Rafael, including you on this since http://linuxconcloudopenna2013.sched.org/event/d708f47d07cd44b9669610778c024708#.VDRzTDS_EUF
> mentions you as the maintainer for Linux + power management. I hope this is
> still accurate.
>
> Since Linux 3.12 (Debian version 3.12.9-1~bpo70+1) and all the way up to
> 3.16 (Debian version 3.16.3-2), I'm having suspend-resume issues on my
> machine (Intel Z68, i7-3770K) that are somewhat less obvious.
>
> After every boot, I get two successful suspend+resume cycles, but after the
> third suspend, it won't resume successfully. On the VGA console I've never
> had anything useful logged, luckily over the serial console I've had more
> luck. I seem to get as far as:

Has it ever worked ok? ...aha, in 3.10, ok.

> I've found out about pm_trace, which always points at the same line (and no
> device):
>
> /var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780503] Magic
> number: 0:52:740
> /var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780599] hash matches
> /tmp/linux-3.16.3/drivers/base/power/main.c:812
>
> In my source tree that line is:
>
> TRACE_RESUME(error);


If it resumes OK, this kind of tracing will not help.

> With kernels 3.10 and older I have no such problems, I can suspend+resume as
> often as I want.

Is there a chance to bisect?

> I've already tried to skip the NVidia + VMware modules at boot time (as you
> can see from the logs they're not loaded at any point), but it didn't help.
> I could try omitting more modules.

Yes, trying with minimal modules (and no s2ram) would be nice.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-10-12 15:49:28

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello,

Many thanks for your response!

On 12-10-14 15:30, Pavel Machek wrote:
>
> Has it ever worked ok? ...aha, in 3.10, ok.
>
Correct. And I've tried a few more kernels now, compiled on my own. 3.17
still has this issue, 3.10 is completely fine all the way up to 3.10.57
(I've tested just under 50 cycles last night). I tried 3.11 as well, but
it seems to have other suspend-resume stability issues that are no
longer present in later kernels, so I have mostly disregarded those
results.

git bisect: I've finally succeeded! I've tried automating it completely,
but sadly Gigabyte couldn't be bothered wiring up the motherboard to
make the watchdog work. :-(

The culprit appears to be this one: 2e8b5f621dbe29425906852c6079afb6b28720cb

Merge: 07f2daa fed2451
Author: Bjorn Helgaas <[email protected]>
Date: Wed Aug 28 20:55:41 2013 -0600

    Merge branch 'pci/misc' into next

    * pci/misc:
      PCI: Remove pcie_cap_has_devctl()
      PCI: Support PCIe Capability Slot registers only for ports with slots
      PCI: Remove PCIe Capability version checks
      PCI: Allow PCIe Capability link-related register access for switches
      PCI: Add offsets of PCIe capability registers
      PCI: Tidy bitmasks and spacing of PCIe capability definitions
      PCI: Remove obsolete comment reference to pci_pcie_cap2()
      PCI: Clarify PCI_EXP_TYPE_PCI_BRIDGE comment
      PCI: Rename PCIe capability definitions to follow convention
      PCI: Disable decoding for BAR sizing only when it was actually enabled
      PCI: Add comment about needing pci_msi_off() even when CONFIG_PCI_MSI=n
      PCI: Add pcibios_pm_ops for optional arch-specific hibernate functionality

I've then tried to narrow down which of the merged changes is my issue
but with no luck, possibly because there's a problem with a combination
of one of these changes, and a change that was not in the pci/misc
branch at the time. I could do a manual test instead.

>> I've already tried to skip the NVidia + VMware modules at boot time (as you
>> can see from the logs they're not loaded at any point), but it didn't help.
>> I could try omitting more modules.
> Yes, try with minimal modules (and no s2ram) would be nice.
>
I've tried unloading a bunch of modules (sound and NIC IIRC), same
results. I can try this again with an even more minimal set. If this
improves the situation, I'll post again.


Wilmer van der Gaast.


2014-10-12 20:40:40

by Pavel Machek

Subject: Re: Machine crashes right *after* ~successful resume

Bjorn, any ideas?

Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?

Thanks,
Pavel



2014-10-12 23:47:35

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

On 12-10-14 21:40, Pavel Machek wrote:
> Bjorn, any ideas?
>
> Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
>
I've tried this, too many conflicts unfortunately.

Just noticed this message appear during failing resumes by the way:

[ 54.203072] Clocksource tsc unstable (delta = -499956111 ns)
[ 54.203151] Switched to clocksource hpet
[ 54.203166] PM: resume of devices complete after 2142.341 msecs

Though not all the time. Feels like it's more another symptom of the
same problem. In my original e-mail I already noted timing strangeness,
with a 0.01s ping interval growing to 0.4s+.

Anyway, my previous bisect result appears to be wrong. :-( I've done
another bisect on a narrow range around it, now
928bea964827d7824b548c1f8e06eccbbc4d0d7d is considered guilty. I've
rerun the test twice with that revision and the one before it
(55ed83a615730c2578da155bc99b68f4417ffe20), and the result seems
consistent now; 928bea gets me just two clean suspend+resumes, 55ed83 more.

I have tried to revert this change in a 3.17 tree but it didn't apply
cleanly. One issue was an "Unreversed patch detected!" message, which
suggests to me that some of this work has already been changed. I get
the same message even against a 3.12 tree.

Just to be sure, I've tried ignoring the unreversed patch warning and
tweaked the patch in two more places to make it apply, but indeed that
does not solve my problem.

A Google search for the revision number shows that there has been quite
a discussion about it already. Maybe my machine has found another issue
(though I suppose my machine's more guilty than the kernel! :-/).

>> I've tried unloading a bunch of modules (sound and NIC IIRC), same results.
>> I can try this again with an even more minimal set. If this improves the
>> situation, I'll post again.
>>
This is done: Still seeing the same issue. (And I'm using raw echo
mem>/proc/... for all testing now.) Same for a "make defconfig" kernel.


Wilmer van der Gaast.

--

2014-10-13 14:46:22

by Rafael J. Wysocki

Subject: Re: Machine crashes right *after* ~successful resume

On Sunday, October 12, 2014 10:40:32 PM Pavel Machek wrote:
> Bjorn, any ideas?
>
> Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?

That's a merge, isn't it?

I'd rather check what the pci/misc branch was based on and then bisect that
branch.

If you do

$ git show fed2451

you'll see (among other things) that this indeed is the PCI branch merged
by that commit and that it is based on

3b2f64d00c46 Linux 3.11-rc2

So, you can do

$ git bisect start fed2451 3b2f64d00c46

and see which of the commits in there introduced the problem you're seeing.

Note: Test fed2451 itself *first* and if that is bad already, then the merge
itself was problematic, in which case please let me know.
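
The search that git bisect performs over such a range is a binary search for the first bad commit. The sketch below shows only that idea and is not git's implementation. The commit names, the `SURVIVED` table and the helper `make_test` are hypothetical, chosen to mirror the symptom reported in this thread (a bad kernel survives exactly two suspend+resume cycles).

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    `commits` is ordered oldest to newest. The newest commit is assumed
    to be bad, and every commit before the first bad one is assumed good.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is (assumed) bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is mid or something older
        else:
            lo = mid + 1  # the first bad commit is newer than mid
    return commits[lo]


# Hypothetical linear history c0..c7 in which c5 introduces the regression.
COMMITS = [f"c{i}" for i in range(8)]

# Hypothetical number of suspend+resume cycles each build survives.
SURVIVED = {c: (2 if i >= 5 else 50) for i, c in enumerate(COMMITS)}


def make_test(cycles):
    """Build an is_bad() that attempts `cycles` suspend+resume cycles."""
    return lambda commit: SURVIVED[commit] < cycles


if __name__ == "__main__":
    # Three or more cycles per commit are needed to see the failure.
    print(first_bad(COMMITS, make_test(3)))
    # With only two cycles every commit looks good and the result is wrong.
    print(first_bad(COMMITS, make_test(2)))
```

The second call shows why the per-commit test has to be thorough: a test that stops after two cycles never observes the failure, so the search drifts to the newest commit instead of the real culprit.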



--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-10-15 11:16:50

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello Rafael,

Rafael J. Wysocki ([email protected]) wrote:
> > Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
> That's a merge, isn't it?
>
Correct, it was, and I did try to figure out which of its parents was
the guilty one, but then I found out the real problem is
928bea964827d7824b548c1f8e06eccbbc4d0d7d.

Not sure why 2e8b... was initially found guilty by git bisect, I fear
that my testing was not thorough enough. I've verified a couple of times
now that 928bea96... does cause crashes and the previous revision does not.

928bea... seems to reshuffle PCI initialisation a little bit and has
caused more troubles, judging from a Google query for it. Some changes
were made already as a result, and this unfortunately makes a revert on
a later kernel tree (to see if that fixes the problem for me) much less
straight-forward. :-(

I can look at the code and see how to revert this now, but I'm
definitely not very proficient outside userland.


Wilmer v/d Gaast.




2014-10-15 13:58:42

by Bjorn Helgaas

Subject: Re: Machine crashes right *after* ~successful resume

[+cc Yinghai, author of 928bea964827 ("PCI: Delay enabling bridges
until they're needed")]

On Wed, Oct 15, 2014 at 5:16 AM, Wilmer van der Gaast <[email protected]> wrote:
> Hello Rafael,
>
> Rafael J. Wysocki ([email protected]) wrote:
>> > Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
>> That's a merge, isn't it?
>>
> Correct, it was, and I did try to figure out which of its parents was
> the guilty one, but then I found out the real problem is
> 928bea964827d7824b548c1f8e06eccbbc4d0d7d.
>
> Not sure why 2e8b... was initially found guilty by git bisect, I fear
> that my testing was not thorough enough. I've verified a couple of times
> now that 928bea96... does cause crashes and the previous revision does not.
>
> 928bea... seems to reshuffle PCI initialisation a little bit and has
> caused more troubles, judging from a Google query for it. Some changes
> were made already as a result, and this unfortunately makes a revert on
> a later kernel tree (to see if that fixes the problem for me) much less
> straight-forward. :-(

More details (from initial post) here: http://roy.gaast.net/~wilmer/.lkml/

Can you open a report at http://bugzilla.kernel.org, please? Please
also attach the complete "lspci -vv" output.

Bjorn

2014-10-15 18:39:19

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Wed, Oct 15, 2014 at 6:58 AM, Bjorn Helgaas <[email protected]> wrote:
> [+cc Yinghai, author of 928bea964827 ("PCI: Delay enabling bridges
> until they're needed")]
>
> On Wed, Oct 15, 2014 at 5:16 AM, Wilmer van der Gaast <[email protected]>
>> Not sure why 2e8b... was initially found guilty by git bisect, I fear
>> that my testing was not thorough enough. I've verified a couple of times
>> now that 928bea96... does cause crashes and the previous revision does not.

so third resume will not work? that is strange.
second and third should not use same code path...

>>
>> 928bea... seems to reshuffle PCI initialisation a little bit and has
>> caused more troubles, judging from a Google query for it. Some changes
>> were made already as a result, and this unfortunately makes a revert on
>> a later kernel tree (to see if that fixes the problem for me) much less
>> straight-forward. :-(
>
> More details (from initial post) here: http://roy.gaast.net/~wilmer/.lkml/

Please check whether the attached revert patch works on 3.17.

Yinghai


Attachments:
revert_928bea9_from_3.17.patch (7.02 kB)

2014-10-15 23:35:40

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello Yinghai,

On 15-10-14 19:39, Yinghai Lu wrote:
>
> so third resume will not work? that is strange.
> second and third should not use same code path...
>
Always exactly the third time, yes. Seems strange indeed. :-( I was
under the impression that the completion time of device resumes was
growing on each resume, and wondered whether that could be related.
However, looking back at my logs, this is not consistent; in some cases
the time is constant.
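
Whether the device resume time really grows from cycle to cycle can be checked mechanically. This is a minimal sketch, assuming the log contains the "PM: resume of devices complete after ... msecs" lines seen in this thread; the function names are made up for illustration, and the sample lines come from different boots, so they only exercise the parser.

```python
import re

# Matches the kernel message quoted in this thread.
RESUME_RE = re.compile(r"PM: resume of devices complete after ([0-9.]+) msecs")


def resume_times(log_text):
    """Return every device resume duration found in the log, in msecs."""
    return [float(m.group(1)) for m in RESUME_RE.finditer(log_text)]


def strictly_growing(values):
    """True if every value is larger than the one before it."""
    return all(b > a for a, b in zip(values, values[1:]))


# Lines quoted elsewhere in this thread, taken from different runs.
SAMPLE = """\
[ 153.787678] PM: resume of devices complete after 3797.737 msecs
[ 54.203166] PM: resume of devices complete after 2142.341 msecs
[ 373.897536] PM: resume of devices complete after 2143.535 msecs
"""

if __name__ == "__main__":
    times = resume_times(SAMPLE)
    print(times, "growing" if strictly_growing(times) else "not growing")
```

Feeding it the serial console capture of a single boot would give one duration per resume, in order.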

Anyway, your patch works! Had to tweak it slightly to apply cleanly to
the 3.17 tarball I have, but my machine now went through eleven
successful suspend+resume cycles again.

Is there anything I can do now to find out why your change is causing my
machine to crash?

Thank you!


Wilmer v/d Gaast.

--

2014-10-16 04:32:39

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Wed, Oct 15, 2014 at 4:34 PM, Wilmer van der Gaast <[email protected]> wrote:
>
> Is there anything I can do now to find out why your change is causing my
> machine to crash?

Can you please try the attached patch? That should work around the problem.

Some driver is using pci_enable_device() in .resume instead of
pci_reenable_device()...

We should skip pci_enable_bridge() in those pci_enable_device() calls to
avoid contention between async device_resume() calls.

Thanks

Yinghai


Attachments:
skip_enable_bridge_on_resume_path.patch (1.06 kB)

2014-10-16 09:36:10

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello,

On 16-10-14 05:32, Yinghai Lu wrote:
>
> Can you please try attached patch? that should workaround the problem.
>
Sadly, no luck. (I do assume you meant me to use the patch against a
clean 3.17 tree *without* yesterday's revert patch applied.) Back to a
crash at/after the third resume:

[ 372.502897] usb 3-1.1: reset high-speed USB device number 3 using ehci-pci
[ 372.678765] usb 2-1.5: reset low-speed USB device number 3 using ehci-pci
[ 373.398437] Clocksource tsc unstable (delta = -136457848 ns)
[ 373.897503] Switched to clocksource hpet
[ 373.897536] PM: resume of devices complete after 2143.535 msecs
[ 373.898225] r8169 0000:07:00.0 eth0: link up
[ 374.319311] Restarting tasks ... done.
(And then nothing.)

Interestingly, I did see the "resume of devices" time grow on each
resume again this time. I'll put the full dmesg dump in the same place
as before: http://gaast.net/~wilmer/.lkml/

There's a lspci -vv dump there as well, as Bjorn asked for. I'll file a
bug on bugzilla tonight.

> as some driver is using pci_enable_device in .resume instead of
> pci_reenable_device....
>
Maybe this doesn't matter, but I could reproduce this issue even with no
modules loaded at all (so barebone that I couldn't even mount my rootfs
and had to do this testing in the initrd), so with only mainline kernel
code running.


Thanks,

Wilmer v/d Gaast.

--

2014-10-16 16:36:37

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Thu, Oct 16, 2014 at 2:36 AM, Wilmer van der Gaast <[email protected]> wrote:
> Hello,
>
> On 16-10-14 05:32, Yinghai Lu wrote:
>>
>>
>> Can you please try attached patch? that should workaround the problem.
>>
> Sadly, no luck. (I do assume you meant me to use the patch against a clean
> 3.17 tree *without* yesterday's revert patch applied.) Back to a crash
> at/after the third resume:
>
> [ 372.502897] usb 3-1.1: reset high-speed USB device number 3 using
> ehci-pci
> [ 372.678765] usb 2-1.5: reset low-speed USB device number 3 using ehci-pci
> [ 373.398437] Clocksource tsc unstable (delta = -136457848 ns)
> [ 373.897503] Switched to clocksource hpet
> [ 373.897536] PM: resume of devices complete after 2143.535 msecs
> [ 373.898225] r8169 0000:07:00.0 eth0: link up
> [ 374.319311] Restarting tasks ... done.
> (And then nothing.)
>
> Interestingly I did see the "resume of devices" time grow on each resume
> again this time. I'll put the full dmesg dump in the same place like before:
> http://gaast.net/~wilmer/.lkml/

I checked that dmesg and console output; the last resume looks OK.

Can you put "debug ignore_loglevel" on the boot command line?
Then we can compare the serial console output of a good and a bad
resume directly.

Also, did you try removing r8169 every time before suspend?

Thanks

Yinghai

2014-10-16 21:08:38

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello,

I have filed a bug now:
https://bugzilla.kernel.org/show_bug.cgi?id=86421
We should probably continue the discussion there. I've added just you to
the CC field, as I'm not sure who else on this thread is still
interested at this point.

On 16-10-14 17:36, Yinghai Lu wrote:
>
> Can you put "debug ignore_loglevel" in boot command line?
> So we can compare output from serial console between good one and bad
> one directly.
>
Did that, will throw the output in the same log dir. Those arguments
resulted in very little extra output. :-/

> Also did you try to remove r8169 every time before suspend?
>
Did that on this run, no difference either. For full completeness, I
reproduced this problem with no modules loaded (done from initramfs) at
all, with a kernel with your workaround included, logs are here:
http://gaast.net/~wilmer/.lkml/bad3.17-patched-debug-initramfs.txt


Wilmer v/d Gaast.

--

2014-10-18 21:28:59

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Thu, Oct 16, 2014 at 2:08 PM, Wilmer van der Gaast <[email protected]> wrote:
> Did that on this run, no difference either. For full completeness, I
> reproduced this problem with no modules loaded (done from initramfs) at all,
> with a kernel with your workaround included, logs are here:
> http://gaast.net/~wilmer/.lkml/bad3.17-patched-debug-initramfs.txt

Yes, that output is good.

Please apply the attached debug patch on top of v3.17 and boot with
"debug ignore_loglevel initcall_debug no_console_suspend".

Hopefully we can find out which nb notifier causes the problem.

Thanks

Yinghai


Attachments:
debug_suspend_resume_x.patch (1.81 kB)

2014-10-18 23:57:18

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

(Resending, forgot to hit reply-to-all.)

Hello Yinghai,

On 18-10-14 22:28, Yinghai Lu wrote:
>
> Please apply attached debug patch on top of v3.17 and boot with
> "debug ignore_loglevel initcall_debug no_console_suspend".
>
> Hope we can find out which nb notifier cause problem.
>
Did that. Strangely, or better said, quite annoyingly, I'm now getting
no output anymore at all on the third resume! :-(

I could try non-serial instead if you think that's worth a shot, but the
most annoying thing is that my video doesn't get initialised properly
after resume unless I have the tainting nvidia driver loaded. I could
check whether nouveau helps.

I've dropped all the debugging output in the same directory as before,
look for files named like
http://roy.gaast.net/~wilmer/.lkml/bad3.17-patched-initcall.txt


Thanks,

Wilmer v/d Gaast.

--

2014-10-19 04:29:23

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Sat, Oct 18, 2014 at 4:57 PM, Wilmer van der Gaast <[email protected]> wrote:
> On 18-10-14 22:28, Yinghai Lu wrote:
>>
>> Please apply attached debug patch on top of v3.17 and boot with
>> "debug ignore_loglevel initcall_debug no_console_suspend".
>>
>> Hope we can find out which nb notifier cause problem.
>>
> Did that. Strangely, or better said, quite annoyingly, I'm now getting no
> output anymore at all on the third resume! :-(
>
> I could try non-serial instead if you think that's worth a shot, but the
> most annoying thing is that my video doesn't get initialised properly after
> resume unless I have the tainting nvidia driver loaded. I could try if
> nouveau helps.

oh no.

Please try booting with "debug ignore_loglevel no_console_suspend".

Thanks

Yinghai

2014-10-19 08:08:11

by Pavel Machek

Subject: Re: Machine crashes right *after* ~successful resume

On Sun 2014-10-19 00:57:12, Wilmer van der Gaast wrote:
> (Resending, forgot to hit reply-to-all.)
>
> Hello Yinghai,
>
> On 18-10-14 22:28, Yinghai Lu wrote:
> >
> > Please apply attached debug patch on top of v3.17 and boot with
> > "debug ignore_loglevel initcall_debug no_console_suspend".
> >
> > Hope we can find out which nb notifier cause problem.
> >
> Did that. Strangely, or better said, quite annoyingly, I'm now getting no
> output anymore at all on the third resume! :-(
>
> I could try non-serial instead if you think that's worth a shot, but the
> most annoying thing is that my video doesn't get initialised properly after
> resume unless I have the tainting nvidia driver loaded. I could try if
> nouveau helps.

Tainting should not be a problem. If it works for you, it works...

Pavel

2014-10-19 10:48:59

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello,

On 19-10-14 05:29, Yinghai Lu wrote:
>
> Please try to "debug ignore_loglevel no_console_suspend".
>
Same thing. :-(

[ 72.572354] Restarting tasks ... done.
[ 72.576554] PM: calling nb rcu_pm_notify+0x0/0x60
[ 72.581277] PM: ... nb rcu_pm_notify+0x0/0x60 done
[ 72.586115] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[ 72.591692] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[ 72.597345] PM: calling nb fw_pm_notify+0x0/0x150
[ 72.602047] PM: ... nb fw_pm_notify+0x0/0x150 done
[ 72.606839] PM: calling nb bsp_pm_callback+0x0/0x50
[ 72.611711] PM: ... nb bsp_pm_callback+0x0/0x50 done
[ 73.382175] r8169 0000:07:00.0 eth0: link up
[ 78.857526] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 79.025718] ata3.00: configured for UDMA/133
[ 81.379533] ata4: softreset failed (device not ready)
[ 82.623212] PM: Syncing filesystems ... done.
[ 82.661564] PM: Preparing system for mem sleep
[ 82.669405] Freezing user space processes ... (elapsed 0.001 seconds)
done.
[ 82.677729] Freezing remaining freezable tasks ... (elapsed 0.001
seconds) done.
[ 82.686338] PM: Entering mem sleep

And nothing related to resume. :-(

Is there any point of me retrying with the initcall_debug flag but
without your patch?

Looking at your patch again, it seems pretty mad that this would cause
such a big difference. Overnight I remembered how my machine has TSC
issues at the time this bug shows, so I tried setting hpet as the
clocksource. (hpet=force on the cmdline did not seem to have that effect
so I used sysfs instead) No effect either.

I need to go now, can experiment a little more tonight.


Thanks,

Wilmer v/d Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-21 21:40:51

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello,

Sorry for the delay, finally poked at this again. It looks like the
no_console_suspend flag was causing troubles, which I didn't really need
anyway with logging going to my serial port.

This is what I get now on the failing resume:

[ 112.879390] PM: resume of devices complete after 2239.905 msecs
[ 112.880068] r8169 0000:07:00.0 eth0: link up
[ 112.880078] Switched to clocksource hpet
[ 116.069248] PM: Finishing wakeup.
[ 116.072574] Restarting tasks ... done.
[ 116.076664] PM: calling nb rcu_pm_notify+0x0/0x60
[ 116.081439] PM: ... nb rcu_pm_notify+0x0/0x60 done
[ 116.086267] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[ 116.088526] systemd[1]: Got notification message for unit
systemd-journald.service
[ 116.099442] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[ 116.105099] PM: calling nb fw_pm_notify+0x0/0x150
[ 116.109812] PM: ... nb fw_pm_notify+0x0/0x150 done
[ 116.114623] PM: calling nb bsp_pm_callback+0x0/0x50
[ 116.119504] PM: ... nb bsp_pm_callback+0x0/0x50 done

And then nothing, and it's hung. Looks the same to me (apart from the
tsc issues + hpet switch) as a successful resume:

[ 95.499513] PM: resume of devices complete after 1240.115 msecs
[ 96.368940] r8169 0000:07:00.0 eth0: link up
[ 98.676455] PM: Finishing wakeup.
[ 98.679765] Restarting tasks ... done.
[ 98.683821] PM: calling nb rcu_pm_notify+0x0/0x60
[ 98.688524] PM: ... nb rcu_pm_notify+0x0/0x60 done
[ 98.692044] systemd[1]: Got notification message for unit
systemd-journald.service
[ 98.700897] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[ 98.706470] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[ 98.712132] PM: calling nb fw_pm_notify+0x0/0x150
[ 98.716848] PM: ... nb fw_pm_notify+0x0/0x150 done
[ 98.721644] PM: calling nb bsp_pm_callback+0x0/0x50
[ 98.726536] PM: ... nb bsp_pm_callback+0x0/0x50 done

Full logs in http://gaast.net/~wilmer/.lkml/bad3.17-patched-megadebug.txt
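
One way to compare the failing and the successful resume mechanically is to drop the printk timestamps and list the messages that show up in only one of the two logs. A minimal Python sketch, assuming logs in the format quoted above; the helper names are made up for illustration and the sample lines are copied from the two excerpts:

```python
import re

# Matches a printk timestamp prefix such as "[ 116.076664] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

def strip_timestamps(log_text):
    """Return the non-empty log lines with the leading timestamp removed."""
    return [TIMESTAMP.sub("", line).strip()
            for line in log_text.splitlines() if line.strip()]

def only_in_first(first, second):
    """Messages that occur in `first` but nowhere in `second`."""
    seen = set(strip_timestamps(second))
    return [msg for msg in strip_timestamps(first) if msg not in seen]

bad = """\
[ 112.880068] r8169 0000:07:00.0 eth0: link up
[ 112.880078] Switched to clocksource hpet
[ 116.069248] PM: Finishing wakeup.
[ 116.072574] Restarting tasks ... done.
"""
good = """\
[ 96.368940] r8169 0000:07:00.0 eth0: link up
[ 98.676455] PM: Finishing wakeup.
[ 98.679765] Restarting tasks ... done.
"""

print(only_in_first(bad, good))
```

This only compares which messages are present; it ignores ordering and timing, so the interleaving of the systemd line would not show up as a difference.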


Wilmer v/d Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-21 23:15:10

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Tue, Oct 21, 2014 at 2:40 PM, Wilmer van der Gaast <[email protected]> wrote:
> Hello,
>
> Sorry for the delay, finally poked at this again. It looks like the
> no_console_suspend flag was causing troubles, which I didn't really need
> anyway with logging going to my serial port.
>
> This is what I get now on the failing resume:
>
> [ 112.879390] PM: resume of devices complete after 2239.905 msecs
> [ 112.880068] r8169 0000:07:00.0 eth0: link up
> [ 112.880078] Switched to clocksource hpet
> [ 116.069248] PM: Finishing wakeup.
> [ 116.072574] Restarting tasks ... done.
> [ 116.076664] PM: calling nb rcu_pm_notify+0x0/0x60
> [ 116.081439] PM: ... nb rcu_pm_notify+0x0/0x60 done
> [ 116.086267] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
> [ 116.088526] systemd[1]: Got notification message for unit
> systemd-journald.service
> [ 116.099442] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
> [ 116.105099] PM: calling nb fw_pm_notify+0x0/0x150
> [ 116.109812] PM: ... nb fw_pm_notify+0x0/0x150 done
> [ 116.114623] PM: calling nb bsp_pm_callback+0x0/0x50
> [ 116.119504] PM: ... nb bsp_pm_callback+0x0/0x50 done
>
> And then nothing, and it's hung. Looks the same to me (apart from the tsc
> issues + hpet switch) as a successful resume:

then it stuck in pm_restore_console()?

Please check attached debug patch.

Thanks

Yinghai


Attachments:
debug_suspend_resume_y.patch (1.73 kB)

2014-10-22 12:54:04

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello Yinghai,

This looks more promising!

Yinghai Lu ([email protected]) wrote:
> >
> > And then nothing, and it's hung. Looks the same to me (apart from the tsc
> > issues + hpet switch) as a successful resume:
>
> then it stuck in pm_restore_console()?
>
That seems to be the case yes:

[ 106.661152] PM: ... nb fw_pm_notify+0x0/0x150 done
[ 106.665939] PM: calling nb bsp_pm_callback+0x0/0x50
[ 106.670814] PM: ... nb bsp_pm_callback+0x0/0x50 done
[ 106.675775] pm_restore_console() before move

Then nothing, during the third resume.

http://gaast.net/~wilmer/.lkml/bad3.17-patched-console-restore.txt has
the full log.

(Some of your other debug lines in your patch don't seem to be logging
anything during my repro BTW.)


Wilmer v/d Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-26 21:53:10

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Wed, Oct 22, 2014 at 5:53 AM, Wilmer van der Gaast <[email protected]> wrote:
> That seems to be the case yes:
>
> [ 106.661152] PM: ... nb fw_pm_notify+0x0/0x150 done
> [ 106.665939] PM: calling nb bsp_pm_callback+0x0/0x50
> [ 106.670814] PM: ... nb bsp_pm_callback+0x0/0x50 done
> [ 106.675775] pm_restore_console() before move
>
> Then nothing, during the third resume.
>
> http://gaast.net/~wilmer/.lkml/bad3.17-patched-console-restore.txt has
> the full log.
>
> (Some of your other debug lines in your patch don't seem to be logging
> anything during my repro BTW.)

Please try attached two debug patches to check the pci registers
between the suspend/resume.


Attachments:
debug_extra_dump_pci.patch (1.76 kB)
debug_suspend_resume_z.patch (1.01 kB)

2014-10-27 10:50:11

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello Yinghai,

Thanks again for your time!

I've applied your two patches, and as a wild guess also added pci=dump
to my kernel cmdline though I guess that just gave me a boot-time dump -
which mostly didn't make it into my dmesg.

I accidentally booted with no_console_suspend on the first run, which
still caused no output at all on the failed resume. I'm including the
output of that anyway, but also I have a run with that flag removed, and
annoyingly the crash appears to happen before the dump during the crash
finishes - while dumping info for this device, it seems:

04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev
10) (prog-if 01 [Subtractive decode])

(More info in my lspci.txt)

Wondering what device that is exactly, I stumbled upon
http://sourceforge.net/p/linux1394/mailman/message/29755048/ where
someone describes it as a "cheap and crappy PCI bridge". More and more I
wonder if I should just buy a new motherboard - sadly this one wasn't
even that cheap. :-( Though I don't know if the output stopping while
dumping output for this device means that it is the culprit, is printk()
to the serial console in any way blocking/buffered?

Anyway, dumps are in:

http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps-no_console_suspend.txt
http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps.txt


Cheers,

Wilmer van der Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-27 18:23:51

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Mon, Oct 27, 2014 at 3:50 AM, Wilmer van der Gaast <[email protected]> wrote:

> http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps.txt

[ 252.028142] PCI: 0000:04:00.0
0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0010: ff ff ff ff ff ff ff ff


04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
(rev 10) (prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 4 bytes
Bus: primary=04, secondary=05, subordinate=05, sec-latency=32
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fbc00000-fbcfffff
Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr+ DiscTmrStat- DiscTmrSERREn-
Capabilities: [90] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=55mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] Subsystem: Gigabyte Technology Co., Ltd Device 5000

under

00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
(prog-if 01 [Subtractive decode])

So that ITE will not work after suspend/resume?
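
For context: a PCI config read that nobody answers returns all ones, so a dump consisting only of ff bytes usually means the device is not responding on the bus. A minimal Python sketch of that check, assuming dump rows in the "offset: bytes" format shown above; the function names are hypothetical:

```python
import re

# One dump row: a hex offset, a colon, then space-separated hex bytes.
ROW = re.compile(r"^\s*[0-9a-fA-F]+:((?:\s+[0-9a-fA-F]{2})+)\s*$")

def parse_hexdump(text):
    """Collect the bytes of all dump rows; other lines are ignored."""
    data = bytearray()
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            data.extend(int(byte, 16) for byte in match.group(1).split())
    return bytes(data)

def looks_absent(config):
    """True if the dump is non-empty and every byte reads back as 0xff."""
    return len(config) > 0 and all(byte == 0xFF for byte in config)

# The dump of 0000:04:00.0 from the failing resume, as quoted above.
dump = """\
[ 252.028142] PCI: 0000:04:00.0
0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0010: ff ff ff ff ff ff ff ff
"""

print(looks_absent(parse_hexdump(dump)))
```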

Please apply 4 attached patches and try to remove the device like

echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
echo 1 > /sys/bus/pci/devices/0000\:00\:1c.3/pcie_link_disable

before suspend/resume test.

Thanks

Yinghai


Attachments:
move_pcie_link_disable_1.patch (2.57 kB)
move_pcie_link_disable_2.patch (1.90 kB)
pci_express_link.patch (2.51 kB)
pci_express_link_disable.patch (1.88 kB)

2014-10-27 21:21:59

by Pavel Machek

Subject: Re: Machine crashes right *after* ~successful resume

On Mon 2014-10-27 10:50:04, Wilmer van der Gaast wrote:
> Hello Yinghai,
>
> Thanks again for your time!
>
> I've applied your two patches, and as a wild guess also added pci=dump to my
> kernel cmdline though I guess that just gave me a boot-time dump - which
> mostly didn't make it into my dmesg.
>
> I accidentally booted with no_console_suspend on the first run, which still
> caused no output at all on the failed resume. I'm including the output of
> that anyway, but also I have a run with that flag removed, and annoyingly
> the crash appears to happen before the dump during the crash finishes -
> while dumping info for this device, it seems:
>
> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 10)
> (prog-if 01 [Subtractive decode])
>
> (More info in my lspci.txt)
>
> Wondering what device that is exactly, I stumbled upon
> http://sourceforge.net/p/linux1394/mailman/message/29755048/ where someone
> describes it as a "cheap and crappy PCI bridge". More and more I wonder if I
> should just buy a new motherboard - sadly this one wasn't even that
> cheap.

It is probably not just you that is affected, and we already know what
change broke it. So we really should fix it.

> :-( Though I don't know if the output stopping while dumping output for this
> device means that it is the culprit, is printk() to the serial console in
> any way blocking/buffered?

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-10-27 22:22:27

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello,

On 27-10-14 18:23, Yinghai Lu wrote:
>
> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
>
> So that ITE will not work after suspend/resume?
>
Even after the first one already, you mean?

Honestly, I don't really know what its purpose is, and it doesn't have
any child nodes in the PCI tree from what I can tell. Possibly because I
don't have any PCI cards in the machine, just a PCIe video card -
assuming this is a PCI bridge taking care of legacy PCI plugin cards?

> Please apply 4 attached patches and try to remove the device like
>
> echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
> echo 1 > /sys/bus/pci/devices/0000\:00\:1c.3/pcie_link_disable
>
> before suspend/resume test.
>
That worked! Resumed properly now.

Full log in http://gaast.net/~wilmer/.lkml/good3.17.txt . Including the
PCI dump at boot time, where that device doesn't dump just ff's.


Wilmer van der Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-27 23:41:57

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Mon, Oct 27, 2014 at 3:22 PM, Wilmer van der Gaast <[email protected]> wrote:
> Hello,
>
> On 27-10-14 18:23, Yinghai Lu wrote:
>>
>>
>> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
>>
>> So that ITE will not work after suspend/resume?
>>
> Even after the first one already, you mean?

Yes.

>
> Honestly, I don't really know what its purpose is, and it doesn't have any
> child nodes in the PCI tree from what I can tell. Possibly because I don't
> have any PCI cards in the machine, just a PCIe video card - assuming this is
> a PCI bridge taking care of legacy PCI plugin cards?
>
>> Please apply 4 attached patches and try to remove the device like
>>
>> echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
>> echo 1 > /sys/bus/pci/devices/0000\:00\:1c.3/pcie_link_disable
>>
>> before suspend/resume test.
>>
> That worked! Resumed properly now.
>
> Full log in http://gaast.net/~wilmer/.lkml/good3.17.txt . Including the PCI
> dump at boot time, where that device doesn't dump just ff's.

Can you apply only the patch that reverts the early bridge enable, plus the
two pci dump patches, to see if the 04:00.0 readout is 0xff?

Thanks

Yinghai

2014-10-28 00:03:19

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

On 27-10-14 23:41, Yinghai Lu wrote:
>
> Can you apply only the patch that reverts the early bridge enable, plus the
> two pci dump patches, to see if the 04:00.0 readout is 0xff?
>
I was curious about that already, did that with a 3.16.6 that I think
just had your revert applied (and using lspci -xxxx to get the dump
which I assumed would be the same): No changes to 04:00 at all.

Confirmed that this is the case with 3.17 + those patches as well, it's
showing this at all times:

[ 130.000122] PCI: 0000:04:00.0
0000: 83 12 92 88 07 00 10 00 10 01 04 06 01 00 01 00
0010: 00 00 00 00 00 00 00 00 04 05 05 20 d1 d1 20 22
0020: c0 fb c0 fb f1 ff 01 00 00 00 00 00 00 00 00 00
0030: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 02
0040: 0c 31 00 00 08 06 00 00 00 00 00 00 ff 00 00 00
0050: 72 ab b9 6d 00 00 00 00 20 c9 8e 00 00 00 00 00
0060: 00 00 00 00 aa 0d 00 10 00 44 00 00 00 00 00 80
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 01 a0 42 fe 00 00 00 00 00 00 00 00 00 00 00 00
00a0: 0d 00 00 00 58 14 00 50 00 00 00 00 00 00 00 00
00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00f0: 00 00 00 00 00 1f 00 00 00 00 00 00 00 00 00 00
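
As a sanity check that this is a healthy readout, the first two rows can be decoded using the standard layout of a type 1 (PCI-to-PCI bridge) config header. A minimal Python sketch; the function name and the choice of fields are for illustration only:

```python
import struct

# First two rows of the 0000:04:00.0 dump quoted above.
HEADER = bytes.fromhex(
    "83 12 92 88 07 00 10 00 10 01 04 06 01 00 01 00 "
    "00 00 00 00 00 00 00 00 04 05 05 20 d1 d1 20 22"
)

def decode_bridge_header(cfg):
    """Decode a few little-endian fields of a type 1 config header."""
    vendor, device, command, status = struct.unpack_from("<HHHH", cfg, 0x00)
    revision, prog_if, subclass, base_class = struct.unpack_from("<BBBB", cfg, 0x08)
    primary, secondary, subordinate, sec_latency = struct.unpack_from("<BBBB", cfg, 0x18)
    return {
        "vendor": vendor,
        "device": device,
        "command": command,
        "status": status,
        "revision": revision,
        "prog_if": prog_if,
        "subclass": subclass,
        "base_class": base_class,
        "header_type": cfg[0x0E] & 0x7F,  # bit 7 is the multi-function flag
        "primary": primary,
        "secondary": secondary,
        "subordinate": subordinate,
        "sec_latency": sec_latency,
    }

info = decode_bridge_header(HEADER)
for name, value in info.items():
    print(f"{name:12} 0x{value:04x}")
```

The decoded values agree with the lspci output earlier in the thread: device 8892, rev 10, prog-if 01, command 0x0007 (I/O+ Mem+ BusMaster+), and bus numbers primary=04, secondary=05, subordinate=05 with sec-latency=32.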


Wilmer v/d Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-28 01:12:29

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Mon, Oct 27, 2014 at 5:03 PM, Wilmer van der Gaast <[email protected]> wrote:
> I was curious about that already, did that with a 3.16.6 that I think just
> had your revert applied (and using lspci -xxxx to get the dump which I
> assumed would be the same): No changes to 04:00 at all.
>
> Confirmed that this is the case with 3.17 + those patches as well, it's
> showing this at all times:

can you post
lspci -vvxxxx -s 00:1c.3
lspci -vvxxxx -s 04:00.0
before reverting enable bridge early patch
and after reverting on 3.17+?

Thanks

Yinghai

2014-10-28 04:03:15

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Mon, Oct 27, 2014 at 6:12 PM, Yinghai Lu <[email protected]> wrote:
> On Mon, Oct 27, 2014 at 5:03 PM, Wilmer van der Gaast <[email protected]> wrote:
>> I was curious about that already, did that with a 3.16.6 that I think just
>> had your revert applied (and using lspci -xxxx to get the dump which I
>> assumed would be the same): No changes to 04:00 at all.
>>
>> Confirmed that this is the case with 3.17 + those patches as well, it's
>> showing this at all times:
>
> can you post
> lspci -vvxxxx -s 00:1c.3
> lspci -vvxxxx -s 04:00.0
> before reverting enable bridge early patch
> and after reverting on 3.17+?

Please check if attached patch could fix the problem on your setup.

Thanks

Yinghai


Attachments:
pci_set_bridge_d0.patch (793.00 B)

2014-10-28 10:23:11

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

On 28-10-14 04:03, Yinghai Lu wrote:
>
> Please check if attached patch could fix the problem on your setup.
>
Sadly it looks like it did not. :-( Applied your patch on a vanilla 3.17
tree, still seeing the same crash.

I'll get more debugging output and the output you asked for in your
previous e-mail tonight, need to go to work now.


Cheers,

Wilmer v/d Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-28 23:34:24

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello,

On 28-10-14 01:12, Yinghai Lu wrote:
> lspci -vvxxxx -s 00:1c.3
> lspci -vvxxxx -s 04:00.0
> before reverting enable bridge early patch

http://gaast.net/~wilmer/.lkml/lspcixx-nopatch.txt (So that's 3.17 +
your revert patch)

> and after reverting on 3.17+?
>
http://gaast.net/~wilmer/.lkml/lspcixx-patched.txt

plain 3.17.

I've run the commands twice, once before and once after a single
suspend+resume cycle. Small difference and only before that cycle:

ruby:~/crashit# diff -u lspcixx-*
--- lspcixx-nopatch.txt 2014-10-28 23:26:08.679690828 +0000
+++ lspcixx-patched.txt 2014-10-28 23:10:05.391896757 +0000
@@ -92,10 +92,10 @@
2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
-320: 5b 60 09 00 00 20 00 0a 08 10 b8 04 07 00 d5 0c
+320: 5b 60 09 00 00 20 00 0a 90 10 b8 04 8f 00 5d 0d
330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
-340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 d5 00
-350: fe 07 dd 00 01 00 08 00 00 00 00 00 00 00 00 00
+340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 5d 00
+350: fe 07 65 00 01 00 08 00 00 00 00 00 00 00 00 00
360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

(Diff is in the Intel device, not the ITE one.)
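
To see exactly which config-space bytes changed, rather than which 16-byte rows, the two dumps can be compared byte by byte. A minimal Python sketch, assuming rows in the "offset: bytes" format of the diff above; the helper names are hypothetical and the sample rows are the three changed rows from that diff:

```python
def parse_rows(text):
    """Map each row's base offset to its list of byte values."""
    rows = {}
    for line in text.splitlines():
        offset, sep, rest = line.partition(":")
        if sep:
            rows[int(offset, 16)] = [int(byte, 16) for byte in rest.split()]
    return rows

def changed_offsets(before, after):
    """Config-space offsets whose byte differs between the two dumps."""
    old, new = parse_rows(before), parse_rows(after)
    return [base + index
            for base in sorted(old.keys() & new.keys())
            for index, (a, b) in enumerate(zip(old[base], new[base]))
            if a != b]

nopatch = """\
320: 5b 60 09 00 00 20 00 0a 08 10 b8 04 07 00 d5 0c
340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 d5 00
350: fe 07 dd 00 01 00 08 00 00 00 00 00 00 00 00 00
"""
patched = """\
320: 5b 60 09 00 00 20 00 0a 90 10 b8 04 8f 00 5d 0d
340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 5d 00
350: fe 07 65 00 01 00 08 00 00 00 00 00 00 00 00 00
"""

print([hex(offset) for offset in changed_offsets(nopatch, patched)])
```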


Wilmer v/d Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-29 05:17:55

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Tue, Oct 28, 2014 at 4:34 PM, Wilmer van der Gaast <[email protected]> wrote:
>
> I've run the commands twice, once before and once after a single
> suspend+resume cycle. Small difference and only before that cycle:
>
> ruby:~/crashit# diff -u lspcixx-*
> --- lspcixx-nopatch.txt 2014-10-28 23:26:08.679690828 +0000
> +++ lspcixx-patched.txt 2014-10-28 23:10:05.391896757 +0000
> @@ -92,10 +92,10 @@
> 2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
> -320: 5b 60 09 00 00 20 00 0a 08 10 b8 04 07 00 d5 0c
> +320: 5b 60 09 00 00 20 00 0a 90 10 b8 04 8f 00 5d 0d
> 330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
> -340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 d5 00
> -350: fe 07 dd 00 01 00 08 00 00 00 00 00 00 00 00 00
> +340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 5d 00
> +350: fe 07 65 00 01 00 08 00 00 00 00 00 00 00 00 00
> 360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> (Diff is in the Intel device, not the ITE one.)
>

That is strange.

Anyway please try attached patch on top of 3.17.

Thanks

Yinghai


Attachments:
debug_suspend_resume_z_xx.patch (511.00 B)

2014-10-29 09:37:41

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello,

On 29-10-14 05:17, Yinghai Lu wrote:
>> (Diff is in the Intel device, not the ITE one.)
> That is strange.
>
I did wonder later, why was I not seeing the ff* dump anymore after the
resume..

> Anyway please try attached patch on top of 3.17.
>
Done, and that did work! Four suspend+resume cycles later and it's still
stable.


Wilmer v/d Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-30 00:53:16

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Wed, Oct 29, 2014 at 2:37 AM, Wilmer van der Gaast <[email protected]> wrote:
>
>> Anyway please try attached patch on top of 3.17.
>>
> Done, and that did work! Four suspend+resume cycles later and it's still
> stable.

Then can you test attached simplified one.


Attachments:
debug_suspend_resume_z_yy.patch (835.00 B)

2014-10-30 10:37:01

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello,

On 30-10-14 00:53, Yinghai Lu wrote:
>> Done, and that did work! Four suspend+resume cycles later and it's still
>> stable.
> Then can you test attached simplified one.
>
Sadly, with that patch (applied against a vanilla 3.17 tree like all the
others) the second resume fails already. :-(


Wilmer v/d Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-30 16:57:43

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Thu, Oct 30, 2014 at 3:36 AM, Wilmer van der Gaast <[email protected]> wrote:

> Sadly, with that patch (applied against a vanilla 3.17 tree like all the
> others) the second resume fails already. :-(

oh, no. Really want to know which bit causes the problem.

Please check debug patch...that will print out pci conf space before
...and after...


Attachments:
debug_extra_dump_pci.patch (1.76 kB)
debug_suspend_resume_z_zz.patch (740.00 B)

2014-10-30 21:55:19

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello,

On 30-10-14 16:57, Yinghai Lu wrote:
>> Sadly, with that patch (applied against a vanilla 3.17 tree like all the
>> others) the second resume fails already. :-(
>
> oh, no. Really want to know which bit causes the problem.
>
Good question. And I think you will find my new finding even more
confusing: With your two patches from this e-mail, I could
suspend+resume 3x with no problems. With just your two debugging
patches applied.

Lovely heisenbug here. I'll add that for every test so far I've removed
the kernel source tree, re-untarred it and applied the patches from your
e-mails on that, so the tests should be consistent. As is the bug
normally, before we started testing patches the crashes were already
always *very* reliably happening exactly after the third resume.

Just to be sure this morning was not a fluke, I've retested your patch
from this morning, and still a crash on the second resume.

> Please check debug patch...that will print out pci conf space before
> ...and after...
>
http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt


Wilmer v/d Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-30 23:02:34

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Thu, Oct 30, 2014 at 2:54 PM, Wilmer van der Gaast <[email protected]> wrote:

> http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt

no difference except on 00:1c.3

--- before.txt 2014-10-30 15:20:35.782886485 -0700
+++ after.txt 2014-10-30 15:21:37.034882515 -0700
@@ -49,10 +49,10 @@
02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
-0320: 5b 60 09 00 00 20 00 0a ec 19 b8 04 eb 09 b9 06
+0320: 5b 60 09 00 00 20 00 0a 33 1a b8 04 32 0a 00 07
0330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
-0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 45 02 b9 00
-0350: 4b 02 c1 00 01 00 08 00 00 00 00 00 00 00 00 00
+0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 46 02 00 00
+0350: 4c 02 08 00 01 00 08 00 00 00 00 00 00 00 00 00
0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Please try attached patch on top of 3.17 without other patches.

If it works, please dump the ACPI tables, including the DSDT.
We need to check whether there is extra work in _PRT.

Thanks

Yinghai


Attachments:
debug_suspend_resume_xxx.patch (720.00 B)

2014-10-30 23:24:52

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume



On 30-10-14 23:02, Yinghai Lu wrote:
>> http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt
>
> no difference except on 00:1c.3
>
> --- before.txt 2014-10-30 15:20:35.782886485 -0700
> +++ after.txt 2014-10-30 15:21:37.034882515 -0700
> @@ -49,10 +49,10 @@
> 02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
> -0320: 5b 60 09 00 00 20 00 0a ec 19 b8 04 eb 09 b9 06
> +0320: 5b 60 09 00 00 20 00 0a 33 1a b8 04 32 0a 00 07
> 0330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
> -0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 45 02 b9 00
> -0350: 4b 02 c1 00 01 00 08 00 00 00 00 00 00 00 00 00
> +0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 46 02 00 00
> +0350: 4c 02 08 00 01 00 08 00 00 00 00 00 00 00 00 00
> 0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
Those diffs are in exactly the same offsets like the dumps I was diffing
a few days ago it seems.

> Please try attached patch on top of 3.17 without other patches.
>
Same problem like this morning: Failure after the second resume already. :-(

> If it works, please dump the ACPI tables, including the DSDT.
> We need to check whether there is extra work in _PRT.
>
Original files and iasl interpretations in:
http://gaast.net/~wilmer/.lkml/tables/


Thanks,

Wilmer v/d Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-31 00:43:43

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Thu, Oct 30, 2014 at 4:24 PM, Wilmer van der Gaast <[email protected]> wrote:
>
>
> Same problem like this morning: Failure after the second resume already. :-(
>
I cannot find any magic line in pci_enable_bridge that could cause
the difference.

So either use the attached pci_enable_bridge_ite.patch or just revert
commit 928bea9?

Bjorn, please check which one you want to go with.

Thanks

Yinghai


Attachments:
pci_enable_bridge_ite.patch (594.00 B)
revert_928bea9_from_3.17.patch (7.02 kB)

2014-10-31 02:13:15

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Thu, Oct 30, 2014 at 5:43 PM, Yinghai Lu <[email protected]> wrote:
> On Thu, Oct 30, 2014 at 4:24 PM, Wilmer van der Gaast <[email protected]> wrote:
>>
>>
>> Same problem like this morning: Failure after the second resume already. :-(
>>
> I cannot find any magic line in pci_enable_bridge that could cause
> the difference.
>
> So either use the attached pci_enable_bridge_ite.patch or just revert
> commit 928bea9?

Last try:

Please check attached patch that will keep state consistent.

Thanks

Yinghai


Attachments:
pci_enable_bridge_ite_x.patch (1.06 kB)

2014-10-31 09:39:41

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello Yinghai,

On 31-10-14 02:13, Yinghai Lu wrote:
> Last try:
>
> Please check attached patch that will keep state consistent.

Good news: This last patch worked! For good measure, I ran my test twice
with a reboot in between. Worked consistently.

And similarly, to ensure that your debugging-at-boottime-only patch
wasn't just working by accident yesterday, I tested it twice more with
the same effect.


Thanks,

Wilmer van der Gaast.

--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer http://www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +

2014-10-31 16:11:51

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Fri, Oct 31, 2014 at 2:39 AM, Wilmer van der Gaast <[email protected]> wrote:
> Hello Yinghai,
>
> On 31-10-14 02:13, Yinghai Lu wrote:
>>
>> Last try:
>>
>> Please check the attached patch, which will keep the state consistent.
>
>
> Good news: This last patch worked! For good measure, I ran my test twice
> with a reboot in between. Worked consistently.
>
> And similarly, to ensure that your debugging-at-boottime-only patch wasn't
> just working by accident yesterday, I tested it twice more with the same
> effect.

Good. Please check whether the attached one, applied on top of 3.17 only, also works.

Thanks

Yinghai


Attachments:
debug_suspend_resume_xxx1.patch (643.00 B)

2014-10-31 21:13:57

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

On 31-10-14 16:11, Yinghai Lu wrote:
>
> Good. Please check whether the attached one, applied on top of 3.17 only, also works.
>
No luck, sadly. :-( Unsuccessful third resume.

I forgot to set up the serial console, would that still be useful?


Wilmer v/d Gaast.


2014-10-31 21:22:13

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Fri, Oct 31, 2014 at 2:13 PM, Wilmer van der Gaast wrote:
> On 31-10-14 16:11, Yinghai Lu wrote:
>>
>>
>> Good. Please check whether the attached one, applied on top of 3.17 only, also works.
>>
> No luck, sadly. :-( Unsuccessful third resume.
>
> I forgot to set up the serial console, would that still be useful?

Never mind, let me go through the suspend/resume code path again.

2014-10-31 23:18:11

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Fri, Oct 31, 2014 at 2:22 PM, Yinghai Lu wrote:
> On Fri, Oct 31, 2014 at 2:13 PM, Wilmer van der Gaast wrote:
>> On 31-10-14 16:11, Yinghai Lu wrote:
>>>
>>>
>>> Good. Please check whether the attached one, applied on top of 3.17 only, also works.
>>>
>> No luck, sadly. :-( Unsuccessful third resume.

Please try the two attached patches separately on top of 3.17.


Attachments:
pci_enable_bridge_ite.patch (0.99 kB)
pci_pm_reenable_device_enhance.patch (859.00 B)

2014-11-01 00:01:06

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

Hello,

Patch #1 worked after a simple s/&&/)/. And patch #2 seems to fix the
problem as well!


Wilmer v/d Gaast.


2014-11-01 02:10:32

by Yinghai Lu

Subject: Re: Machine crashes right *after* ~successful resume

On Fri, Oct 31, 2014 at 5:00 PM, Wilmer van der Gaast wrote:
> Hello,
>
> Patch #1 worked after a simple s/&&/)/. And patch #2 seems to fix the
> problem as well!

Updated the first patch (#1).


Attachments:
pci_enable_bridge_ite_v2.patch (0.99 kB)

2014-11-02 23:16:55

by Wilmer van der Gaast

Subject: Re: Machine crashes right *after* ~successful resume

On 01-11-14 02:10, Yinghai Lu wrote:
>> Patch #1 worked after a simple s/&&/)/. And patch #2 seems to fix the
>> problem as well!
> Updated the first patch (#1).
>
Works as well!


Wilmer v/d Gaast.
