From: "Rafael J. Wysocki"
Date: Thu, 14 Dec 2017 00:26:32 +0100
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3
To: Thomas Gleixner
Cc: Linus Torvalds, Bjorn Helgaas, Maarten Lankhorst, Michal Hocko,
    Andy Lutomirski, Linux Kernel Mailing List,
    "the arch/x86 maintainers", Daniel Vetter, Bjorn Helgaas,
    "Rafael J. Wysocki", Linux PCI, Linux PM
In-Reply-To: <2011671.y8uyto1vn5@aspire.rjw.lan>
References: <168050887.sZlTFXWCmO@aspire.rjw.lan> <2011671.y8uyto1vn5@aspire.rjw.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 13, 2017 at 11:39 PM, Rafael J. Wysocki wrote:
> On Wednesday, December 13, 2017 7:19:17 PM CET Thomas Gleixner wrote:
>> On Wed, 13 Dec 2017, Linus Torvalds wrote:
>>
>> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner wrote:
>> > >
>> > > Definitely. That was fragile forever but puzzles me is that I can't figure
>> > > out what now causes that spurious interrupt to surface out of the blue.
>> >
>> > Perhaps just timing?
>>
>> That's what I'm trying to figure out right now, because that is the only
>> sensible explanation left. The whole machinery of suspend is exactly the
>> same with and without the vector changes. I instrumented all functions
>> involved and the picture is the same. I even do not see any fundamental
>> timing differences where one would say: That's it.
>>
>> What puzzles me even more is that in the range of commits I'm fiddling with
>> there is no other change than the vector management stuff and the point
>> where it breaks makes no sense at all. The point Maarten bisected it to
>> works nicely here, so that might just point to a very subtle timing issue.
>>
>> > How hard would it be to change the ordering to just redirect irqs first?
>>
>> The whole interrupt redirection happens when the non boot CPUs are brought
>> down, which is the very last step before the actual suspend happens.
>>
>> We could probably do that earlier, but that's something Rafael needs to
>> answer ultimately.
>
> Well, that's both flattering and concerning. ;-)
>
> Anyway, yes, we can do that earlier AFAICS. Action handlers are not going to
> run after we've called suspend_device_irqs() which happens before the final
> stage of PCI devices suspend (suspend_noirq) and it doesn't matter which CPU
> gets the interrupt from that point on (it is either wakeup or unwanted then).

There is a catch that we don't and likely should not do that for
suspend-to-idle, but since we have pm_suspend_target_state now, that
case can be distinguished from the "full suspend" one readily.

Thanks,
Rafael
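
For illustration only, the ordering discussed above could be modelled
roughly as follows. This is a user-space sketch, not kernel code and not
a proposed patch. Every model_* and MODEL_* name is hypothetical; the
state enum is a local stand-in for the values pm_suspend_target_state
can take. Placing the redirection directly after suspend_device_irqs(),
and skipping the non-boot CPU step for suspend-to-idle, are assumptions
of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the kernel's suspend target states (hypothetical names). */
enum model_suspend_state {
    MODEL_PM_SUSPEND_ON,
    MODEL_PM_SUSPEND_TO_IDLE,
    MODEL_PM_SUSPEND_STANDBY,
    MODEL_PM_SUSPEND_MEM,
};

/* Steps of the modelled suspend path (hypothetical names). */
enum model_step {
    MODEL_STEP_SUSPEND_DEVICE_IRQS,  /* action handlers stop running after this */
    MODEL_STEP_REDIRECT_IRQS,        /* move interrupts away from non-boot CPUs */
    MODEL_STEP_SUSPEND_NOIRQ,        /* final stage of PCI device suspend */
    MODEL_STEP_DISABLE_NONBOOT_CPUS, /* where the redirection happens today */
};

#define MODEL_MAX_STEPS 4

/* Redirect early only for "full suspend", never for suspend-to-idle. */
bool model_redirect_irqs_early(enum model_suspend_state target)
{
    return target == MODEL_PM_SUSPEND_STANDBY ||
           target == MODEL_PM_SUSPEND_MEM;
}

/* Fill 'steps' with the modelled ordering for 'target'; return the step count. */
size_t model_plan_suspend(enum model_suspend_state target,
                          enum model_step *steps, size_t cap)
{
    size_t n = 0;

    assert(cap >= MODEL_MAX_STEPS);

    steps[n++] = MODEL_STEP_SUSPEND_DEVICE_IRQS;
    if (model_redirect_irqs_early(target))
        steps[n++] = MODEL_STEP_REDIRECT_IRQS;
    steps[n++] = MODEL_STEP_SUSPEND_NOIRQ;
    /* Assumption: suspend-to-idle leaves the non-boot CPUs online. */
    if (target != MODEL_PM_SUSPEND_TO_IDLE)
        steps[n++] = MODEL_STEP_DISABLE_NONBOOT_CPUS;

    return n;
}
```

With MODEL_PM_SUSPEND_MEM as the target the plan has four steps, with
the redirection second. With MODEL_PM_SUSPEND_TO_IDLE it has two steps
and no redirection.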