Received: by 10.192.165.148 with SMTP id m20csp1084775imm; Fri, 27 Apr 2018 12:23:32 -0700 (PDT) X-Google-Smtp-Source: AB8JxZohYQa4L4ziNBlHO1Uem8JEEXB2kCb7vBpV2RKL7cNkKPYt2MK1TMYikanvhWyB8rkbnFKQ X-Received: by 10.167.133.71 with SMTP id y7mr3278644pfn.210.1524857012747; Fri, 27 Apr 2018 12:23:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1524857012; cv=none; d=google.com; s=arc-20160816; b=W5e/PmH1I3KExyzmIk2vYRFqlei26V4hUbyoNIwoeHgS8CHlcblK2Ly99zNBZVG+si pyOuDmiN+bJHbsIOMZ6nOem8kkge7+kwyAZV5Qd8HI4kJScN8tARgYzgrsivFtQLMSoe bgiUOOpofoxbc/7tta/jVLEBVw2b1+QAdQPha6vbUZ6DzPgnJYaoKKsgTZbxT3Mvn/Jl PbWcyEyVjPZMt2F6T/wWvzv0m6tt2bf6J7d7edZ1haX5i0l+dWdRZFnVM+KGc8wjI/ZQ ovZphmCN6XARsSU07qm3cXLwOnA2KXnHPAv2PNrMGcgeGa7Cf8xb/Ds5npupa+nVtX/Z XXcQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dmarc-filter:arc-authentication-results; bh=6ffq90amK8/OesAXKuUJqBmB5nG6VjE4vUNOV5O23VA=; b=v8P/ZZuavKlwAs7eUhmbxlCjP2oelfx3XfRNgcopWarQcVtqu/7oCjFXcdVLwXQmjk VxNzEXsjWCe8H8YDVatIAs7MUeSfe34j6zctkROH3UkZ5FMOM3jEs/b7ndWmbXoHAHVU VtQv04erlc4XG+qkzF/z3bPeETVZl/ay5TqNDkmDpU/qrsQCxIiSQrtpFgL64UBT4D9D yNoIQycbDIvhrMOCX72nKQDe2LTLDp9YI2NWxy6NAJCq3Cdu6aIMBxFKeCZ8a9IMLdRN T7Mgn3WUgl4yE/lr8hMOooIo/KyWHYAkD7ATcfpaZGS4Hy4/py27qADQs5pnCyT1HQFJ jDnw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t126si1829657pfd.55.2018.04.27.12.23.17; Fri, 27 Apr 2018 12:23:32 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758502AbeD0TWL (ORCPT + 99 others); Fri, 27 Apr 2018 15:22:11 -0400 Received: from mail.kernel.org ([198.145.29.99]:41402 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758069AbeD0TWJ (ORCPT ); Fri, 27 Apr 2018 15:22:09 -0400 Received: from localhost (173-17-104-225.client.mchsi.com [173.17.104.225]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id D9F58218C9; Fri, 27 Apr 2018 19:22:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D9F58218C9 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=helgaas@kernel.org Date: Fri, 27 Apr 2018 14:22:07 -0500 From: Bjorn Helgaas To: Paul Menzel Cc: Bjorn Helgaas , linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Lukas Wunner , Sinan Kaya Subject: Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago) Message-ID: <20180427192207.GG8199@bhelgaas-glaptop.roam.corp.google.com> References: <8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [+cc Lukas, Sinan] Hi Paul, Thanks for the report! On Thu, Apr 26, 2018 at 12:17:53PM +0200, Paul Menzel wrote: > Dear Linux folks, > > > On the Lenovo X60t, during resume from ACPI suspend and during shutdown, the > message below is shown in the logs. > > pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued > 65284 msec ago) This is an Intel root port: 00:1c.0 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 1 (rev 02) (prog-if 00 [Normal decode]) and probably has the CF118 erratum (see http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3461a068661c for details). I bet if you changed "msecs" in pcie_wait_cmd() to 30000 you'd see a 30 second delay during shutdown because we write a command to tell the port not to generate any more hotplug interrupts, and we wait for that command to complete, but the port never tells us it has completed. Lukas reported a similar issue in https://lkml.kernel.org/r/20180112104929.GA10599@wunner.de, which we sort of worked around by assuming that Thunderbolt controllers never support that "command complete" interrupt (see http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=493fb50e958c) Sinan mooted the idea of using a "no-wait" path of sending the "don't generate hotplug interrupts" command. I think we should work on this idea a little more. If we're shutting down the whole system, I can't believe there's much value in *anything* we do in the pciehp_remove() path. Maybe we should just get rid of pciehp_remove() (and probably pcie_port_remove_service() and the other service driver remove methods) completely. That dates from when the service drivers could be modules that could be potentially unloaded, but unloading them hasn't been possible for years. As far as the resume path, my guess is that in pciehp_resume(), we write a command to enable interrupts, then it looks like we get a PCI_EXP_SLTSTA_DLLSC "Link Up" interrupt, and apparently we issue another command. Not sure exactly what's going on here. Could you try the following patch? The idea is to (1) do nothing on shutdown, so you should see no message and no delay, and (2) collect more information about the resume path. diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c index 332b723ff9e6..99751cc52968 100644 --- a/drivers/pci/hotplug/pciehp_core.c +++ b/drivers/pci/hotplug/pciehp_core.c @@ -260,14 +260,6 @@ static int pciehp_probe(struct pcie_device *dev) return -ENODEV; } -static void pciehp_remove(struct pcie_device *dev) -{ - struct controller *ctrl = get_service_data(dev); - - cleanup_slot(ctrl); - pciehp_release_ctrl(ctrl); -} - #ifdef CONFIG_PM static int pciehp_suspend(struct pcie_device *dev) { @@ -305,7 +297,6 @@ static struct pcie_port_service_driver hpdriver_portdrv = { .service = PCIE_PORT_SERVICE_HP, .probe = pciehp_probe, - .remove = pciehp_remove, #ifdef CONFIG_PM .suspend = pciehp_suspend, diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 18a42f8f5dc5..c3a9c47ed061 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -113,7 +113,7 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout) return 0; /* timeout */ } -static void pcie_wait_cmd(struct controller *ctrl) +static void pcie_wait_cmd(struct controller *ctrl, u16 cmd, u16 mask) { unsigned int msecs = pciehp_poll_mode ? 2500 : 1000; unsigned long duration = msecs_to_jiffies(msecs); @@ -155,10 +155,13 @@ static void pcie_wait_cmd(struct controller *ctrl) * don't change those bits, e.g., commands that merely enable * interrupts. */ - if (!rc) - ctrl_info(ctrl, "Timeout on hotplug command %#06x (issued %u msec ago)\n", + if (!rc) { + ctrl_info(ctrl, "Timeout on hotplug command %#06x (issued %u msec ago), new command %#06x/mask %#06x\n", ctrl->slot_ctrl, - jiffies_to_msecs(jiffies - ctrl->cmd_started)); + jiffies_to_msecs(jiffies - ctrl->cmd_started), + cmd, mask); + dump_stack(); + } } static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd, @@ -172,7 +175,7 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd, /* * Always wait for any previous command that might still be in progress */ - pcie_wait_cmd(ctrl); + pcie_wait_cmd(ctrl, cmd, mask); pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl); if (slot_ctrl == (u16) ~0) { @@ -193,7 +196,7 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd, * indicating completion of the above issued command. */ if (wait) - pcie_wait_cmd(ctrl); + pcie_wait_cmd(ctrl, cmd, mask); out: mutex_unlock(&ctrl->ctrl_lock);