Date: Mon, 17 Jun 2019 17:35:10 +0300
From: Mika Westerberg
To: "Rafael J. Wysocki"
Cc: Lukas Wunner, Bjorn Helgaas, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, "Rafael J. Wysocki", Keith Busch, Alex Williamson, Alexandru Gagniuc
Subject: Re: [PATCH] PCI/PME: Fix race on PME polling
Message-ID: <20190617143510.GT2640@lahna.fi.intel.com>
In-Reply-To: <1957149.eOSnrBRbHu@kreacher>
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

On Mon, Jun 17, 2019 at 12:37:06PM +0200, Rafael J. Wysocki wrote:
> On Sunday, June 9, 2019 1:29:33 PM CEST Lukas Wunner wrote:
> > Since commit df17e62e5bff ("PCI: Add support for polling PME state on
> > suspended legacy PCI devices"), the work item pci_pme_list_scan() polls
> > the PME status flag of devices and wakes them up if the bit is set.
> >
> > The function performs a check whether a device's upstream bridge is in
> > D0 for otherwise the device is inaccessible, rendering PME polling
> > impossible. However the check is racy because it is performed before
> > polling the device. If the upstream bridge runtime suspends to D3hot
> > after pci_pme_list_scan() checks its power state and before it invokes
> > pci_pme_wakeup(), the latter will read the PMCSR as "all ones" and
> > mistake it for a set PME status flag. I am seeing this race play out as
> > a Thunderbolt controller going to D3cold and occasionally immediately
> > going to D0 again because PM polling was performed at just the wrong
> > time.
> >
> > Avoid by checking for an "all ones" PMCSR in pci_check_pme_status().
> >
> > Fixes: 58ff463396ad ("PCI PM: Add function for checking PME status of devices")
> > Tested-by: Mika Westerberg
> > Signed-off-by: Lukas Wunner
> > Cc: stable@vger.kernel.org # v2.6.34+
> > Cc: Rafael J. Wysocki
> > ---
> >  drivers/pci/pci.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 8abc843b1615..eed5db9f152f 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -1989,6 +1989,8 @@ bool pci_check_pme_status(struct pci_dev *dev)
> >  	pci_read_config_word(dev, pmcsr_pos, &pmcsr);
> >  	if (!(pmcsr & PCI_PM_CTRL_PME_STATUS))
> >  		return false;
> > +	if (pmcsr == 0xffff)
> > +		return false;
> >  
> >  	/* Clear PME status. */
> >  	pmcsr |= PCI_PM_CTRL_PME_STATUS;
> >
>
> Added to my 5.3 queue, thanks!

Today, while doing some PM testing, I noticed that this patch actually
reveals an issue in our native PME handling.

The problem is in pcie_pme_handle_request(), where we first convert req_id
to a struct pci_dev and then call pci_check_pme_status() for it. When a
device triggers a wake, the link is first brought up and then the PME is
sent to the root complex with a req_id matching the originating device.
However, if there are PCIe ports in between, they may still be in D3. In
that case the PMCSR read in pci_check_pme_status() returns 0xffff for the
device below them, and with this patch the function now returns false. As
a result there are lots of "Spurious native interrupt" messages in dmesg,
but the actual PME is never handled.

It has worked so far only because pci_check_pme_status() also returned
true for 0xffff, so we went ahead and runtime resumed the originating
device anyway.

I think the correct way to handle this is actually to drop the call to
pci_check_pme_status() in pcie_pme_handle_request(), because the whole
idea of the req_id in the PME message is to allow the root complex and SW
to identify the device without the need to poll for the PME status bit.