Subject: Re: [PATCH v12 0/6] Address error and recovery for AER and DPC
From: Sinan Kaya
Date: Mon, 12 Mar 2018 23:47:12 -0400
To: Keith Busch, Bjorn Helgaas
Cc: Oza Pawandeep, Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner, Greg Kroah-Hartman, Kate Stewart, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Dongdong Liu, Wei Zhang, Timur Tabi, Alex Williamson
In-Reply-To: <20180312232626.GI18494@localhost.localdomain>

On 3/12/2018 7:26 PM, Keith Busch wrote:
> On Mon, Mar 12, 2018 at 02:47:30PM -0500, Bjorn Helgaas wrote:
>> [+cc Alex]
>>
>> On Mon, Mar 12, 2018 at 08:25:51AM -0600, Keith Busch wrote:
>>> On Sun, Mar 11, 2018 at 11:03:58PM -0400, Sinan Kaya wrote:
>>>> On 3/11/2018 6:03 PM, Bjorn Helgaas wrote:
>>>>> On Wed, Feb 28, 2018 at 10:34:11PM +0530, Oza Pawandeep wrote:
>>>>
>>>>> That difference has been there since the beginning of DPC, so it has
>>>>> nothing to do with *this* series EXCEPT for the fact that it really
>>>>> complicates the logic you're adding to reset_link() and
>>>>> broadcast_error_message().
>>>>>
>>>>> We ought to be able to simplify that somehow because the only real
>>>>> difference between AER and DPC should be that DPC automatically
>>>>> disables the link and AER does it in software.
>>>>
>>>> I agree this should be possible. Code execution path should be almost
>>>> identical to fatal error case.
>>>>
>>>> Is there any reason why you went to stop driver path, Keith?
>>>
>>> The fact is the link is truly down during a DPC event. When the link
>>> is enabled again, you don't know at that point if the device(s) on the
>>> other side have changed.
>>
>> When DPC is triggered, the port takes the link down. When we handle
>> an uncorrectable (nonfatal or fatal) AER event, software takes the
>> link down.
>>
>> In both cases, devices on the other side are at least reset. Whenever
>> the link goes down, it's conceivable the device could be replaced with
>> a different one before the link comes back up. Is this why you remove
>> and re-enumerate? (See tangent [1] below.)
>
> Yes. Truthfully, DPC events due to changing topologies was the motivating
> use case when we initially developed this. We were also going for
> simplicity (at least initially), and remove + re-enumerate seemed
> safe without concerning this driver with other capability registers, or
> coordinating with/depending on other drivers. For example, a successful
> reset may depend on any particular driver calling pci_restore_state from
> a good saved state.

The spec recommends using "Hotplug Surprise" to differentiate the two
cases we are discussing. The use case Keith is after is hotplug support.
The case Oza and I are more interested in is error handling on platforms
with no hotplug support.

According to the spec, if "Hotplug Surprise" is set in Slot Capabilities,
the hotplug driver handles link up and the DPC driver doesn't interfere
with its operation. The hotplug driver observes the link up interrupt as
it does today, and when it sees the link up event it does the enumeration.

If the "Hotplug Surprise" bit is not set, it is the DPC driver's job to
bring up the link. I believe this path should follow the AER driver path,
since there is a very well defined error reporting and recovery framework
in the code. The link comes back up automatically when the DPC driver
handles its interrupt, very similar to what a secondary bus reset does
for AER. I don't believe hotplug is a possibility under this condition,
since it is not supported to begin with.

Should we plumb the "Hotplug Surprise" condition into the code to satisfy
both cases, and leave the error handling path as it is in this series?

>
>> The point is that from the device's hardware perspective, these
>> scenarios are the same (it sent an ERR_NONFATAL or ERR_FATAL message
>> and it sees the link go down). I think we should make them the same
>> on the software side, too: the driver should see the same callbacks,
>> in the same order, whether we're doing AER or DPC.
>>
>> If removing and re-enumerating is the right thing for DPC, I think
>> that means it's also the right thing for AER.
>>

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
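
The "Hotplug Surprise" dispatch proposed in the reply above can be sketched
roughly as follows. This is a standalone illustration under assumptions, not
code from the patch series or the kernel: the enum, dpc_recovery_owner() and
example() are hypothetical names, and the Slot Capabilities value is passed in
as a plain integer instead of being read from config space. Only the bit value
0x20 is taken from a real definition (PCI_EXP_SLTCAP_HPS in Linux's pci_regs.h).

```c
#include <assert.h>
#include <stdint.h>

/* Hot-Plug Surprise bit in the PCIe Slot Capabilities register.
 * Same value as PCI_EXP_SLTCAP_HPS in Linux's pci_regs.h. */
#define SLTCAP_HPS 0x00000020u

/* Hypothetical: which driver brings the link back after a DPC event. */
enum dpc_owner {
    RECOVER_VIA_HOTPLUG, /* hotplug driver sees link up and re-enumerates */
    RECOVER_VIA_DPC      /* DPC driver follows the AER-style recovery path */
};

/* Hypothetical helper: decide the recovery owner from Slot Capabilities. */
enum dpc_owner dpc_recovery_owner(uint32_t slot_cap)
{
    if (slot_cap & SLTCAP_HPS)
        return RECOVER_VIA_HOTPLUG;
    return RECOVER_VIA_DPC;
}

/* Usage example: the two cases discussed in the thread. */
void example(void)
{
    /* Slot advertises Hotplug Surprise: leave link up to the hotplug driver. */
    assert(dpc_recovery_owner(SLTCAP_HPS) == RECOVER_VIA_HOTPLUG);
    /* No Hotplug Surprise: the DPC driver handles recovery itself. */
    assert(dpc_recovery_owner(0) == RECOVER_VIA_DPC);
}
```

In a real driver the capability value would presumably come from the port's
Slot Capabilities register, and the two branches would hand off to the hotplug
driver or to the shared AER/DPC recovery code respectively.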