Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp1413111imm; Wed, 23 May 2018 15:57:55 -0700 (PDT) X-Google-Smtp-Source: AB8JxZq1bBe92sEAS6550z3ZG4Zg+bYBcd/yuT3aUMBe1xInvKSpI5Lcz0OLbNHR9zy2t8Nyj/Ik X-Received: by 2002:a62:399c:: with SMTP id u28-v6mr4669673pfj.95.1527116275488; Wed, 23 May 2018 15:57:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1527116275; cv=none; d=google.com; s=arc-20160816; b=OrlZ0ZR0SYD7XfhyAQk4kGmY4rd70NtBcA+4ZI84i6MxfCHjDbP3LC13RYp4Mo5sia +ovNYSfbUUqXfRte9CCXuaBDB0FJhoznZ01olkcW8CvJm55sSIoljR00wjY4cjKzEGAx 1a/B4rmq4hpu2eTOAROUaBSv0WpBybMlN2cl+Txvl/Fy+m6Aw0Mc7taJOROCLAZGmUiS EX8x/FJWcfsGgVOuwoOXdo1neSn7MXijpcdsfRUX7KBQP8WzpodDoyvrxBnUwwMxOxSJ 0xV/PCzzIG0YZE2bEe66xUeaMfsrfmfSftV8KR+NBlzWVMnIYxJ5LND6hnpVqaNXlPMJ sd8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dmarc-filter :dkim-signature:dkim-signature:arc-authentication-results; bh=H6OQ1h51K420yHqAE9pPDhnrst9JZs25/VTTXsDxrqI=; b=mCrplATfW+1ZxRgnVJLY1cOj2jqQofFRAK71I2ZXEo01dQY7XQ3uP3+mZZ0sXvWL07 obudZ2rXBh8qzjSEcrJjCQMAyesBp82OJL678KDX0IAvhSHqmAiWlRN+XxbDWtj9GC+M gXiisoNb8jE8zYPUKE/KWL8+yf9n50Zmmuc6h9RJsSf9XRsYXX+FTP4I8Hw5H6bXJZoe Wc/Ass9/mODcIC4gi/5VTmxWfiyxx46i+jyymZ+IWhl2x6eG3+L9JqWSJCyKniA5dyzM 382RmcPXDgs1X7aan5CIrabeWrZy9C6YBTT+20wkzzKKoB34rlU1XdCdq3zTnYqMwWOX r/+A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=kDu+x03+; dkim=pass header.i=@codeaurora.org header.s=default header.b=A9CQ606S; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l16-v6si19023373pff.270.2018.05.23.15.57.40; Wed, 23 May 2018 15:57:55 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=kDu+x03+; dkim=pass header.i=@codeaurora.org header.s=default header.b=A9CQ606S; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935030AbeEWW50 (ORCPT + 99 others); Wed, 23 May 2018 18:57:26 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:55854 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934544AbeEWW5X (ORCPT ); Wed, 23 May 2018 18:57:23 -0400 Received: by smtp.codeaurora.org (Postfix, from userid 1000) id 2B00F60AFB; Wed, 23 May 2018 22:57:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1527116243; bh=mCQbdprNZX++8kHrMILC8mIcsO+CZ6e4qh/MhFOY6Go=; h=Subject:To:Cc:References:From:Date:In-Reply-To:From; b=kDu+x03+ypOJKqG4c3YckIAzqTjffutJMcgUQnybaBulsYEQB88j7p5KMAXebENSt lJxiRIfTdB7/RPlWJ2fnyO+y/9l9LTaarSOfw+YPnqNjtCF+DPDAFX8IwYI7uVE7FN yJv3oRQRJ+XiEx+EdvWzCLDpR4vgMLXDsphBAHdE= X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on pdx-caf-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.8 required=2.0 tests=ALL_TRUSTED,BAYES_00, DKIM_SIGNED,T_DKIM_INVALID autolearn=no autolearn_force=no version=3.4.0 Received: from [192.168.0.106] (cpe-174-109-247-98.nc.res.rr.com [174.109.247.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: okaya@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id 87D966047C; Wed, 23 May 2018 22:57:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1527116241; bh=mCQbdprNZX++8kHrMILC8mIcsO+CZ6e4qh/MhFOY6Go=; h=Subject:To:Cc:References:From:Date:In-Reply-To:From; b=A9CQ606SqkYU0r87CYgC6Zf+W0NIDqz8b6Ck4gufLGJtNJvl9v5erX6+Hv1kHYXmj H+xAuK5x6Vhxpu8IuHXFdGKZvjOF4UByf9zzkOnhGTt7Ys3XN0beXM/63gKJWlu4gV OwgRyrIevp7lI0xGH/BNjj7B8otXiJL04KjMXYpc= DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 87D966047C Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=okaya@codeaurora.org Subject: Re: [PATCH V2] PCI/portdrv: do not disable device on reboot/shutdown To: Bjorn Helgaas Cc: linux-pci@vger.kernel.org, timur@codeaurora.org, ryan@finnie.org, linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, stable@vger.kernel.org, Bjorn Helgaas , "Rafael J. Wysocki" , Greg Kroah-Hartman , Thomas Gleixner , Kate Stewart , Frederick Lawler , Dongdong Liu , Mika Westerberg , open list , Don Brace , esc.storagedev@microsemi.com, linux-scsi@vger.kernel.org References: <1527043490-17268-1-git-send-email-okaya@codeaurora.org> <20180523213249.GD150632@bhelgaas-glaptop.roam.corp.google.com> From: Sinan Kaya Message-ID: <61f70fd6-52fd-da07-ce73-303f95132131@codeaurora.org> Date: Wed, 23 May 2018 18:57:18 -0400 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <20180523213249.GD150632@bhelgaas-glaptop.roam.corp.google.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 5/23/2018 5:32 PM, Bjorn Helgaas wrote: > > The crash seems to indicate that the hpsa device attempted a DMA after > we cleared the Root Port's PCI_COMMAND_MASTER, which means > hpsa_shutdown() didn't stop DMA from the device (it looks like *most* > shutdown methods don't disable device DMA, so it's in good company). All drivers are expected to shutdown DMA and interrupts in their shutdown() routines. They can skip removing threads, data structures etc. but DMA and interrupt disabling are required. This is the difference between shutdown() and remove() callbacks. If you see that this is not being done in HPSA, then that is where the bugfix should be. Counter argument is that if shutdown() is not implemented, at least remove() should be called. Expecting all drivers to implement shutdown() callbacks is just bad by design in my opinion. Code should have fallen back to remove() if shutdown() doesn't exist. I can propose a patch for this but this is yet another story to chase. > >> This has been found to cause crashes on HP DL360 Gen9 machines during >> reboot. Besides, kexec is already clearing the bus master bit in >> pci_device_shutdown() after all PCI drivers are removed. > > The original path was: > > pci_device_shutdown(hpsa) > drv->shutdown > hpsa_shutdown # hpsa_pci_driver.shutdown > ... > pci_device_shutdown(RP) # root port > drv->shutdown > pcie_portdrv_remove # pcie_portdriver.shutdown > pcie_port_device_remove > pci_disable_device > do_pci_disable_device > # clear RP PCI_COMMAND_MASTER > if (kexec) > pci_clear_master(RP) > # clear RP PCI_COMMAND_MASTER > > If I understand correctly, the new path after this patch is: > > pci_device_shutdown(hpsa) > drv->shutdown > hpsa_shutdown # hpsa_pci_driver.shutdown > ... > pci_device_shutdown(RP) # root port > drv->shutdown > pcie_portdrv_shutdown # pcie_portdriver.shutdown > __pcie_portdrv_remove(RP, false) > pcie_port_device_remove(RP, false) > # do NOT clear RP PCI_COMMAND_MASTER yup > if (kexec) > pci_clear_master(RP) > # clear RP PCI_COMMAND_MASTER > > I guess this patch avoids the panic during reboot because we're not in > the kexec path, so we never clear PCI_COMMAND_MASTER for the Root > Port, so the hpsa device can DMA happily until the lights go out. > > But DMA continuing for some random amount of time before the reboot or > shutdown happens makes me a little queasy. That doesn't sound safe. > The more I think about this, the more confused I get. What am I > missing? see above. > >> Just remove the extra clear in shutdown path by seperating the remove and >> shutdown APIs in the PORTDRV. >> >> static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, >> @@ -218,7 +228,7 @@ static struct pci_driver pcie_portdriver = { >> >> .probe = pcie_portdrv_probe, >> .remove = pcie_portdrv_remove, >> - .shutdown = pcie_portdrv_remove, >> + .shutdown = pcie_portdrv_shutdown, > > What are the circumstances when we call .remove() vs .shutdown()? > > I guess the main (maybe only) way to call .remove() is to hot-remove > the port? And .shutdown() is basically used in the reboot and kexec > paths? Correct. shutdown() is only called during reboot/shutdown calls. If you echo 1 into the remove file, remove() gets called. Handy for hotplug use cases. It needs to be the exact opposite of the probe. It needs to clean up resources etc. and have the HW in a state where it can be reinitialized via probe again. > >> .err_handler = &pcie_portdrv_err_handler, >> >> -- >> 2.7.4 >> > -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.