Subject: Re: [PATCH v3 1/1] PCI/ERR: Fix reset logic in pcie_do_recovery() call
To: Sinan Kaya, Bjorn Helgaas
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, ashok.raj@intel.com, Jay Vosburgh
From: "Kuppuswamy, Sathyanarayanan"
Message-ID: <3bc0fd23-8ddd-32c5-1dd9-4d5209ea68c3@linux.intel.com>
Date: Sun, 27 Sep 2020 19:43:46 -0700

Hi,

On 9/25/20 11:30 AM, Sinan Kaya wrote:
> On 9/25/2020 2:16 PM, Kuppuswamy, Sathyanarayanan wrote:
>>>
>>> If this is a too involved change, DPC driver should restore state
>>> when hotplug is not supported.
>> Yes. we can add a condition for hotplug capability check.
>>>
>>> DPC driver should be self-sufficient by itself.
>>>
>
> Sounds good.
>
>>>> Also for non-fatal errors, if reset is requested then we still need
>>>> some kind of bus reset call here
>>>
>>> DPC should handle both fatal and non-fatal cases
>> Currently DPC is only triggered for FATAL errors.
>> and cause a bus reset
>
> Thanks for the heads up.
> This seems to have changed since I looked at the DPC code.
>
>>> in hardware already before triggering an interrupt.
>> Error recovery is not triggered only DPC driver. AER also uses the
>> same error recovery code. If DPC is not supported, then we still need
>> reset logic.
>
> It sounds like we are cross-talking two issues.
>
> 1. no state restore on DPC after FATAL error.
> Let's fix this.
Agree.
A few more details about the above issue: there are two cases for FATAL
errors.

FATAL + hotplug - In this case, the link will be reset and the hotplug
handler will remove the driver state. This case works well with the
current code.

FATAL + no-hotplug - In this case, the link will still be reset, but the
driver state is currently not properly restored. So I attempted to
restore it using pci_reset_bus():

 		status = reset_link(dev);
-		if (status != PCI_ERS_RESULT_RECOVERED) {
+		if (status == PCI_ERS_RESULT_RECOVERED) {
+			status = PCI_ERS_RESULT_NEED_RESET;
 ...
 	if (status == PCI_ERS_RESULT_NEED_RESET) {
 		/*
-		 * TODO: Should call platform-specific
-		 * functions to reset slot before calling
-		 * drivers' slot_reset callbacks?
+		 * TODO: Optimize the call to pci_reset_bus()
+		 *
+		 * There are two components to pci_reset_bus().
+		 *
+		 * 1. Do platform specific slot/bus reset.
+		 * 2. Save/Restore all devices in the bus.
+		 *
+		 * For hotplug capable devices and fatal errors,
+		 * device is already in reset state due to link
+		 * reset. So repeating platform specific slot/bus
+		 * reset via pci_reset_bus() call is redundant. So
+		 * can optimize this logic and conditionally call
+		 * pci_reset_bus().
 		 */
+		pci_reset_bus(dev);

>
> 2. no bus reset on NON_FATAL error through AER driver path.
> This already tells me that you need to split your change into
> multiple patches.
>
> Let's talk about this too. bus reset should be triggered via
> AER driver before informing the recovery.
But according to the PCI error recovery documentation, any call
to ->error_detected() or ->mmio_enabled() can request
PCI_ERS_RESULT_NEED_RESET. So we need to add code that does the actual
reset before calling the ->slot_reset() callback. Calling
pci_reset_bus() there fixes this issue:

 	if (status == PCI_ERS_RESULT_NEED_RESET) {
+		pci_reset_bus(dev);

>
> 	if (status == PCI_ERS_RESULT_NEED_RESET) {
> 		/*
> 		 * TODO: Should call platform-specific
> 		 * functions to reset slot before calling
> 		 * drivers' slot_reset callbacks?
> 		 */
> 		status = PCI_ERS_RESULT_RECOVERED;
> 		pci_dbg(dev, "broadcast slot_reset message\n");
> 		pci_walk_bus(bus, report_slot_reset, &status);
> 	}
>

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
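The following small user-space model in C may make the proposed control flow easier to follow. It is only a sketch under stated assumptions, not kernel code. The ERS_* values are hypothetical stand-ins for a subset of pci_ers_result_t. merge_vote() is a simplified take on merge_result() in drivers/pci/pcie/err.c. model_do_recovery() and struct recovery_trace are illustrative names. The model does not cover ->mmio_enabled() or ->resume(), and it assumes every ->slot_reset() callback answers RECOVERED.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for a subset of pci_ers_result_t. */
enum ers_result {
	ERS_NONE,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

/* Which recovery steps the model ran (hypothetical, for illustration). */
struct recovery_trace {
	bool link_reset;      /* reset_link() was attempted (fatal only) */
	bool bus_reset;       /* stand-in for pci_reset_bus(dev) */
	bool slot_reset_sent; /* ->slot_reset() was broadcast */
	enum ers_result final;
};

/*
 * Simplified take on merge_result() in drivers/pci/pcie/err.c:
 * a NEED_RESET vote overrides CAN_RECOVER, RECOVERED and DISCONNECT.
 */
static enum ers_result merge_vote(enum ers_result orig, enum ers_result vote)
{
	if (vote == ERS_NONE)
		return orig;

	switch (orig) {
	case ERS_CAN_RECOVER:
	case ERS_RECOVERED:
		return vote;
	case ERS_DISCONNECT:
		return vote == ERS_NEED_RESET ? ERS_NEED_RESET : orig;
	default:
		return orig;
	}
}

/*
 * Model of the proposed pcie_do_recovery() flow (hypothetical helper).
 * votes[]: the drivers' combined ->error_detected()/->mmio_enabled()
 *          answers (the ->mmio_enabled() step itself is not modeled).
 * link_ok: whether reset_link() succeeded for a fatal error.
 */
static struct recovery_trace model_do_recovery(bool fatal, bool link_ok,
					       const enum ers_result *votes,
					       int nvotes)
{
	struct recovery_trace t = { 0 };
	enum ers_result status = ERS_CAN_RECOVER;

	assert(nvotes >= 0);
	for (int i = 0; i < nvotes; i++)
		status = merge_vote(status, votes[i]);

	if (fatal) {
		t.link_reset = true;
		if (!link_ok) {
			t.final = ERS_DISCONNECT; /* recovery fails */
			return t;
		}
		/*
		 * Proposed change: a recovered link still needs the
		 * save/restore half of pci_reset_bus(), so request a reset
		 * instead of treating the link reset as full recovery.
		 */
		status = ERS_NEED_RESET;
	}

	if (status == ERS_NEED_RESET) {
		t.bus_reset = true;       /* pci_reset_bus(dev) */
		t.slot_reset_sent = true; /* report_slot_reset() walk */
		status = ERS_RECOVERED;
	}

	t.final = status;
	return t;
}
```

In this model, a fatal error with a successful link reset now reaches the bus reset and the ->slot_reset() broadcast. In the unpatched flow, status stays RECOVERED after reset_link(), so neither step runs. A non-fatal error reaches the bus reset only when some driver votes NEED_RESET.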