Date: Thu, 24 May 2018 13:35:02 -0500
From: Bjorn Helgaas
To: Sinan Kaya
Cc: linux-pci@vger.kernel.org, timur@codeaurora.org, ryan@finnie.org,
    linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    stable@vger.kernel.org, Bjorn Helgaas, "Rafael J. Wysocki",
    Greg Kroah-Hartman, Thomas Gleixner, Kate Stewart, Frederick Lawler,
    Dongdong Liu, Mika Westerberg, open list, Don Brace,
    esc.storagedev@microsemi.com, linux-scsi@vger.kernel.org
Subject: Re: [PATCH V2] PCI/portdrv: do not disable device on reboot/shutdown
Message-ID: <20180524183502.GB85822@bhelgaas-glaptop.roam.corp.google.com>
References: <1527043490-17268-1-git-send-email-okaya@codeaurora.org>
    <20180523213249.GD150632@bhelgaas-glaptop.roam.corp.google.com>
    <61f70fd6-52fd-da07-ce73-303f95132131@codeaurora.org>
In-Reply-To: <61f70fd6-52fd-da07-ce73-303f95132131@codeaurora.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 23, 2018 at 06:57:18PM -0400, Sinan Kaya wrote:
> On 5/23/2018 5:32 PM, Bjorn Helgaas wrote:
> >
> > The crash seems to indicate that the hpsa device attempted a DMA after
> > we cleared the Root Port's PCI_COMMAND_MASTER, which means
> > hpsa_shutdown() didn't stop DMA from the device (it looks like *most*
> > shutdown methods don't disable device DMA, so it's in good company).
>
> All drivers are expected to shutdown DMA and interrupts in their shutdown()
> routines. They can skip removing threads, data structures etc. but DMA and
> interrupt disabling are required. This is the difference between shutdown()
> and remove() callbacks.
>
> If you see that this is not being done in HPSA, then that is where the
> bugfix should be.
>
> Counter argument is that if shutdown() is not implemented, at least remove()
> should be called. Expecting all drivers to implement shutdown() callbacks
> is just bad by design in my opinion.
>
> Code should have fallen back to remove() if shutdown() doesn't exist.
> I can propose a patch for this but this is yet another story to chase.

That sounds like a reasonable idea, and it is definitely another can
of worms.

I looked briefly at some of the .shutdown() cases:

  - device_shutdown() doesn't fall back to remove().

  - It looks like most bus_types don't implement .shutdown() at all
    (I didn't look at them all).

  - Of the bus_types that do implement .shutdown(), most do not fall
    back to .remove().  ps3_system_bus_type is an example of one that
    *does* fall back to a driver's .remove() if there is no
    .shutdown().

Implement shutdown (no fallback unless indicated):

  ecard_bus_type
  gio_bus_type
  ps3_system_bus_type    # does fallback to remove
  ibmebus_bus_type
  isa_bus_type
  platform_bus_type      # not direct implementation
  fmc_bus_type           # fmc_shutdown() looks spurious
  mipi_dsi_bus_type
  hv_bus

> >> This has been found to cause crashes on HP DL360 Gen9 machines during
> >> reboot. Besides, kexec is already clearing the bus master bit in
> >> pci_device_shutdown() after all PCI drivers are removed.
> >
> > The original path was:
> >
> >   pci_device_shutdown(hpsa)
> >     drv->shutdown
> >       hpsa_shutdown                 # hpsa_pci_driver.shutdown
> >         ...
> >   pci_device_shutdown(RP)           # root port
> >     drv->shutdown
> >       pcie_portdrv_remove           # pcie_portdriver.shutdown
> >         pcie_port_device_remove
> >           pci_disable_device
> >             do_pci_disable_device
> >               # clear RP PCI_COMMAND_MASTER
> >     if (kexec)
> >       pci_clear_master(RP)
> >         # clear RP PCI_COMMAND_MASTER
> >
> > If I understand correctly, the new path after this patch is:
> >
> >   pci_device_shutdown(hpsa)
> >     drv->shutdown
> >       hpsa_shutdown                 # hpsa_pci_driver.shutdown
> >         ...
> >   pci_device_shutdown(RP)           # root port
> >     drv->shutdown
> >       pcie_portdrv_shutdown         # pcie_portdriver.shutdown
> >         __pcie_portdrv_remove(RP, false)
> >           pcie_port_device_remove(RP, false)
> >             # do NOT clear RP PCI_COMMAND_MASTER
>
> yup
>
> >     if (kexec)
> >       pci_clear_master(RP)
> >         # clear RP PCI_COMMAND_MASTER
> >
> > I guess this patch avoids the panic during reboot because we're not in
> > the kexec path, so we never clear PCI_COMMAND_MASTER for the Root
> > Port, so the hpsa device can DMA happily until the lights go out.
> >
> > But DMA continuing for some random amount of time before the reboot or
> > shutdown happens makes me a little queasy.  That doesn't sound safe.
> > The more I think about this, the more confused I get.  What am I
> > missing?
>
> see above.
> >
> >> Just remove the extra clear in shutdown path by separating the remove and
> >> shutdown APIs in the PORTDRV.
> >>
> >>  static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
> >> @@ -218,7 +228,7 @@ static struct pci_driver pcie_portdriver = {
> >>
> >>  	.probe		= pcie_portdrv_probe,
> >>  	.remove		= pcie_portdrv_remove,
> >> -	.shutdown	= pcie_portdrv_remove,
> >> +	.shutdown	= pcie_portdrv_shutdown,
> >
> > What are the circumstances when we call .remove() vs .shutdown()?
> >
> > I guess the main (maybe only) way to call .remove() is to hot-remove
> > the port?  And .shutdown() is basically used in the reboot and kexec
> > paths?
>
> Correct. shutdown() is only called during reboot/shutdown calls. If you echo
> 1 into the remove file, remove() gets called. Handy for hotplug use cases.
> It needs to be the exact opposite of the probe. It needs to clean up resources
> etc. and have the HW in a state where it can be reinitialized via probe again.
> >
> >>  	.err_handler	= &pcie_portdrv_err_handler,
> >>
> >> --
> >> 2.7.4
> >>
> >
> >
> --
> Sinan Kaya
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
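
To make the fallback idea from the thread concrete (use a driver's
remove() when it provides no shutdown()), here is a minimal user-space
sketch in C.  It is not kernel code and not a proposed patch: struct
toy_device, struct toy_driver, toy_shutdown(), toy_remove() and
toy_device_shutdown() are hypothetical names invented for illustration,
and the two integer flags only stand in for "device may still DMA" and
"driver resources are allocated".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct toy_device {
	int dma_enabled;	/* 1 while the device may still issue DMA */
	int resources;		/* 1 while driver data structures are allocated */
};

struct toy_driver {
	void (*remove)(struct toy_device *dev);		/* full teardown, inverse of probe */
	void (*shutdown)(struct toy_device *dev);	/* quiesce only; may be NULL */
};

/* Quiesce: stop DMA but leave data structures alone. */
static void toy_shutdown(struct toy_device *dev)
{
	dev->dma_enabled = 0;
}

/* Full teardown: stop DMA and release everything so probe could run again. */
static void toy_remove(struct toy_device *dev)
{
	dev->dma_enabled = 0;
	dev->resources = 0;
}

/*
 * Dispatch with the fallback discussed above: prefer shutdown(),
 * otherwise use remove().  Returns 1 if a callback ran, 0 if the
 * driver provided neither.
 */
static int toy_device_shutdown(const struct toy_driver *drv,
			       struct toy_device *dev)
{
	assert(drv != NULL && dev != NULL);

	if (drv->shutdown) {
		drv->shutdown(dev);
		return 1;
	}
	if (drv->remove) {
		drv->remove(dev);
		return 1;
	}
	return 0;
}
```

With a driver that sets only .remove, toy_device_shutdown() still stops
DMA, at the cost of also tearing down resources that a dedicated
shutdown() could have skipped.  A driver with neither callback leaves
the device untouched, which is the case the thread worries about.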