Date: Wed, 13 Dec 2017 10:23:37 -0600
From: Bjorn Helgaas
To: Thomas Gleixner
Cc: Maarten Lankhorst, Michal Hocko, Linus Torvalds, "Rafael J. Wysocki",
    Andy Lutomirski, Linux Kernel Mailing List, the arch/x86 maintainers,
    Daniel Vetter, Bjorn Helgaas, "Rafael J. Wysocki",
    linux-pci@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3
Message-ID: <20171213162336.GG53955@bhelgaas-glaptop.roam.corp.google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

[+cc linux-pci, linux-pm]

On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> So I was finally able to figure out what the hell is going on:
>
>   Suspend:
>
>     - The device suspend code puts the graphics card into a power
>       state != PCI_D0.
>
>     - Offline non boot CPUs
>
>     - Break interrupt affinity. Allocate new vector on CPU 0, compose and
>       write MSI message which ends up in:
>
>       __pci_write_msi_msg(entry, msg)
>       {
>           if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
>               /* Don't touch the hardware now */
>           } else {
>               ....
>           }
>           entry->msg = *msg;
>       }
>
>     So because the device is not in PCI_D0 the message is not written. It's
>     written in the device resume path.

I'm not a PM guru, but this ordering seems fragile.  If we offline
CPUs before re-targeting interrupts directed at those CPUs, aren't we
always going to be at risk of sending interrupts to an offline CPU?
Even if the device is now asleep and therefore should not generate an
interrupt, it seems like there's a window when the device returns to
PCI_D0 where it could generate an interrupt before we have a chance to
update the MSI message.

>   Resume:
>
>     [  139.670446] ACPI: Low-level resume complete
>     [  139.670541] PM: Restoring platform NVS memory
>     [  139.672462] do_IRQ: 0.55 No irq handler for vector
>     [  139.672475] Enabling non-boot CPUs ...
>
> So the spurious interrupt happens early and way before the device resume
> code writes the new MSI message.
>
> I checked the behaviour on 4.14. The MSI write is delayed there in the same
> way, but there is no spurious interrupt. There is no interrupt coming in at
> all _BEFORE_ the device is put out of PCI_D0.
>
> And this has certainly nothing to do with the vector management changes,
> but I can't figure yet what makes that spurious interrupt to be sent.
>
> Any ideas welcome.
>
> Thanks,
>
> 	tglx
>
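
To make the window I'm worried about concrete, here is a rough userspace
model of the deferred MSI write quoted above.  It is only a sketch, not
kernel code: struct device_model, struct msi_msg_model, write_msi_msg(),
resume_device() and hw_msg_is_stale() are all made-up names, and the
CPU/vector numbers used with them are arbitrary.  The only thing taken
from the thread is the shape of the logic: skip the hardware write unless
the device is in D0, always cache the message, and replay the cached
message on resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's power states (not the real enum). */
enum power_state { STATE_D0, STATE_D3 };

/* Hypothetical, heavily simplified MSI message: just a target CPU and vector. */
struct msi_msg_model {
	unsigned int cpu;
	unsigned int vector;
};

struct device_model {
	enum power_state current_state;
	struct msi_msg_model hw_msg;     /* what the device would actually signal */
	struct msi_msg_model cached_msg; /* plays the role of entry->msg */
};

/*
 * Models the quoted __pci_write_msi_msg(): the hardware is only touched
 * when the device is in D0, but the message is always cached.
 */
static void write_msi_msg(struct device_model *dev,
			  const struct msi_msg_model *msg)
{
	if (dev->current_state == STATE_D0)
		dev->hw_msg = *msg;
	dev->cached_msg = *msg;
}

/* True while the hardware still holds a message that was superseded. */
static bool hw_msg_is_stale(const struct device_model *dev)
{
	return dev->hw_msg.cpu != dev->cached_msg.cpu ||
	       dev->hw_msg.vector != dev->cached_msg.vector;
}

/*
 * Models the resume path: the device returns to D0 first and the cached
 * message is replayed afterwards.  Between those two statements the
 * hardware is awake but still points at the old CPU/vector.
 */
static void resume_device(struct device_model *dev)
{
	dev->current_state = STATE_D0;
	/* window: hw_msg_is_stale(dev) may still be true here */
	dev->hw_msg = dev->cached_msg;
}
```

In this model, re-targeting the interrupt while the device is in
STATE_D3 leaves hw_msg pointing at the old CPU until resume_device()
runs, which is the stale state the ordering question above is about.
It says nothing about why the spurious interrupt shows up on 4.15-rc2
and not on 4.14.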