Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp1275006imm; Tue, 15 May 2018 17:06:53 -0700 (PDT) X-Google-Smtp-Source: AB8JxZqA0W3vHWCsRnq5sQ4pI5HUMCJxsMNPDCJKMe514UhBow27X3F4ZEc+8elusBWB89tAPRi4 X-Received: by 2002:a63:934f:: with SMTP id w15-v6mr14285544pgm.333.1526429213491; Tue, 15 May 2018 17:06:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526429213; cv=none; d=google.com; s=arc-20160816; b=IzNpQuvuzmgHZH8DSUOMQBmpanfmWSDsaR05q4ZkeXcN4nHa/k0X9TC68iqOLUOPIJ hqLbSYwk09PmmtlL2/uf9QRna3zlbgNBfUq3dhtAma+xgtdVmywiXAvGoKSrJ7ax+TGw ecjG2QqhR8NZ4b78JaDRoCqsgnuq5V7wVxX/p+rZWOlbVtfPmzVBE87miYFaUdGqq+xf EDAbCne9LujE1IyEd7D74SNUfOYfDEubdJ3Bm8/EmhG8onpZRLjeUsxWvDm8uuEs7ytf DUthY0A5d55cuUf2kov25pnr7epDwtB36V9Bwc63/oEcrhMCSJIIT1FveDguSRJ31AEb JjOw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature:arc-authentication-results; bh=nkXBQNGZy2UEgzaXDecwAwnDv32aqAitFTYtFKRqUTo=; b=xr8GuzxqVPj8/yxruID3ZiHBiBLgx5HcEOylnZIcFmJRaOQMyZjfLUHeJVQXbphjGQ 2UDDT8uJfR7ZyT8zMrk+qBb7ZH58GtG9/tVJw5ASmaBm0lpa0YfYZ8AA7VbfvPwtqZCy AW6wXrc7WUUJTofDquHZLEatEWnCwBYMM1/gpLbxKOmrnhvgJ5q0H+vM/bFjXHOYUW3a sRfyCqR+JRFKxhbUCquHVpc9ij1vgU+0DjuccL4IkhHD8LD9jmewX8c5lUyj2/iccN9d FlhZvsNTJnpSfCAIZ1c2kAaPv4TbVDPwbKm+teXy8j9u4MswRpqYYrEKdAVieKTNpHVE dGTg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=BOgFBCYF; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f15-v6si1143829pln.359.2018.05.15.17.06.38; Tue, 15 May 2018 17:06:53 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=BOgFBCYF; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752042AbeEPAG2 (ORCPT + 99 others); Tue, 15 May 2018 20:06:28 -0400 Received: from mail.kernel.org ([198.145.29.99]:42394 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751402AbeEPAG0 (ORCPT ); Tue, 15 May 2018 20:06:26 -0400 Received: from localhost (unknown [69.71.5.252]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id BCEF3204EE; Wed, 16 May 2018 00:06:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1526429186; bh=wMBcmdNvm+P1evXeRTQxZdg5jU6y7GbJPb+mdotj10o=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=BOgFBCYFIgE1o9wYtiaOY42WDR3DLbVlG82okHzZb6yZPetpuabxHndrz4ZZnF6GW /BvnrBI/eCHoQhFxcyfniZdZVms1MFRejjEf56TV3BBHhKW8o7BqjZEzwzOWpIpqdV l273TofX0lQnij/gqp0YpqpFSpraiOqXRhrPonhU= Date: Tue, 15 May 2018 19:06:24 -0500 From: Bjorn Helgaas To: Oza Pawandeep Cc: Bjorn Helgaas , Philippe Ombredanne , Thomas Gleixner , Greg Kroah-Hartman , Kate Stewart , linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Dongdong Liu , Keith Busch , Wei Zhang , Sinan Kaya , Timur Tabi Subject: Re: [PATCH v16 5/9] PCI/AER: Factor out error reporting from AER Message-ID: <20180516000624.GD11156@bhelgaas-glaptop.roam.corp.google.com> References: <1526035408-31328-1-git-send-email-poza@codeaurora.org> <1526035408-31328-6-git-send-email-poza@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1526035408-31328-6-git-send-email-poza@codeaurora.org> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 11, 2018 at 06:43:24AM -0400, Oza Pawandeep wrote: > This patch factors out error reporting callbacks, which are currently > tightly coupled with AER. > > DPC should be able to register callbacks and attempt recovery when DPC > trigger event occurs. > > Signed-off-by: Oza Pawandeep > +static int report_error_detected(struct pci_dev *dev, void *data) > +{ > + pci_ers_result_t vote; > + const struct pci_error_handlers *err_handler; > + struct aer_broadcast_data *result_data; > + > + result_data = (struct aer_broadcast_data *) data; > + > + device_lock(&dev->dev); > + dev->error_state = result_data->state; > + > + if (!dev->driver || > + !dev->driver->err_handler || > + !dev->driver->err_handler->error_detected) { > + if (result_data->state == pci_channel_io_frozen && > + dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) { > + /* > + * In case of fatal recovery, if one of down- > + * stream device has no driver. We might be > + * unable to recover because a later insmod > + * of a driver for this device is unaware of > + * its hw state. > + */ > + pci_printk(KERN_DEBUG, dev, "device has %s\n", > + dev->driver ? > + "no AER-aware driver" : "no driver"); > + } > + > + /* > + * If there's any device in the subtree that does not > + * have an error_detected callback, returning > + * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of > + * the subsequent mmio_enabled/slot_reset/resume > + * callbacks of "any" device in the subtree. All the > + * devices in the subtree are left in the error state > + * without recovery. > + */ > + > + if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) > + vote = PCI_ERS_RESULT_NO_AER_DRIVER; > + else > + vote = PCI_ERS_RESULT_NONE; > + } else { > + err_handler = dev->driver->err_handler; > + vote = err_handler->error_detected(dev, result_data->state); > +#if defined(CONFIG_PCIEAER) > + pci_uevent_ers(dev, PCI_ERS_RESULT_NONE); > +#endif > +static int report_slot_reset(struct pci_dev *dev, void *data) > +{ > + pci_ers_result_t vote; > + const struct pci_error_handlers *err_handler; > + struct aer_broadcast_data *result_data; > + > + result_data = (struct aer_broadcast_data *) data; > + > + device_lock(&dev->dev); > + if (!dev->driver || > + !dev->driver->err_handler || > + !dev->driver->err_handler->slot_reset) > + goto out; > + > + err_handler = dev->driver->err_handler; > + vote = err_handler->slot_reset(dev); > + result_data->result = merge_result(result_data->result, vote); > +out: > + device_unlock(&dev->dev); > + return 0; > +} > + > +static int report_resume(struct pci_dev *dev, void *data) > +{ > + const struct pci_error_handlers *err_handler; > + > + device_lock(&dev->dev); > + dev->error_state = pci_channel_io_normal; > + > + if (!dev->driver || > + !dev->driver->err_handler || > + !dev->driver->err_handler->resume) > + goto out; > + > + err_handler = dev->driver->err_handler; > + err_handler->resume(dev); > +#if defined(CONFIG_PCIEAER) > + pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED); > +#endif > +void pcie_do_fatal_recovery(struct pci_dev *dev) > +{ > + struct pci_dev *udev; > + struct pci_bus *parent; > + struct pci_dev *pdev, *temp; > + pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED; > + struct aer_broadcast_data result_data; > + > + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) > + udev = dev; > + else > + udev = dev->bus->self; > + > + parent = udev->subordinate; > + pci_lock_rescan_remove(); > + list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, > + bus_list) { > + pci_dev_get(pdev); > + pci_dev_set_disconnected(pdev, NULL); > + if (pci_has_subordinate(pdev)) > + pci_walk_bus(pdev->subordinate, > + pci_dev_set_disconnected, NULL); > + pci_stop_and_remove_bus_device(pdev); > + pci_dev_put(pdev); > + } > + > + result = reset_link(udev); > + > + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { > + /* > + * If the error is reported by a bridge, we think this error > + * is related to the downstream link of the bridge, so we > + * do error recovery on all subordinates of the bridge instead > + * of the bridge and clear the error status of the bridge. > + */ > + pci_walk_bus(dev->subordinate, report_resume, &result_data); > + pci_cleanup_aer_uncorrect_error_status(dev); > + } > + > + if (result == PCI_ERS_RESULT_RECOVERED) { > + if (pcie_wait_for_link(udev, true)) > + pci_rescan_bus(udev->bus); > + pci_info(dev, "Device recovery successful\n"); > + } else { > +#if defined(CONFIG_PCIEAER) > + pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); > +#endif I don't think this is the optimal resolution for this problem. It is true that we only call this function if either CONFIG_PCIEAER=y or CONFIG_PCIE_DPC=y and furthermore that CONFIG_PCIE_DPC depends on CONFIG_PCIEAER, so in either case, pci_uevent_ers() is present, since it is conditional on #if defined(CONFIG_PCIEAER) || defined(CONFIG_EEH) But the #ifdef here seems unnecessarily complicated. I think it would be better to change the #ifdef around the definition of pci_uevent_ers(). Then we wouldn't need the several #ifdefs in this file.