Date: Wed, 13 Dec 2017 22:06:40 +0100 (CET)
From: Thomas Gleixner
To: Linus Torvalds
Cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko, "Rafael J. Wysocki", Andy Lutomirski, Linux Kernel Mailing List, the arch/x86 maintainers, Daniel Vetter, linux-pci@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> >
> > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner wrote:
> > > >
> > > > Definitely. That was fragile forever, but what puzzles me is that I
> > > > can't figure out what now causes that spurious interrupt to surface
> > > > out of the blue.
> > >
> > > Perhaps just timing?
> >
> > That's what I'm trying to figure out right now, because that is the only
> > sensible explanation left. The whole machinery of suspend is exactly the
> > same with and without the vector changes. I instrumented all functions
> > involved and the picture is the same. I do not even see any fundamental
> > timing differences where one would say: That's it.
> >
> > What puzzles me even more is that in the range of commits I'm fiddling
> > with there is no other change than the vector management stuff, and the
> > point where it breaks makes no sense at all. The point Maarten bisected it
> > to works nicely here, so that might just point to a very subtle timing
> > issue.
>
> After doing more debugging on this, it turns out that this looks like a
> legacy interrupt coming in. The vector number is always 55, which is legacy
> IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> masked and vector 55 is completely unused.
>
> More questions than answers. Still investigating.

And it does not explain Maarten's report, which gets a spurious vector 33 on
CPU4 after the non-boot CPUs have been brought online again. And that's the
vector which was assigned before the affinity was moved by unplugging CPU4.

Hrmpf. Even more mystery to solve.

Thanks,

	tglx