Subject: Re: [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system
From: Sinan Kaya
Date: Thu, 12 Apr 2018 12:27:20 -0400
To: Keith Busch
Cc: Bjorn Helgaas, Oza Pawandeep, Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner, Greg Kroah-Hartman, Kate Stewart, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Dongdong Liu, Wei Zhang, Timur Tabi, Alex Williamson
In-Reply-To: <20180412150231.GD4810@localhost.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/12/2018 11:02 AM, Keith Busch wrote:
> On Thu, Apr 12, 2018 at 08:39:54AM -0600, Keith Busch wrote:
>> On Thu, Apr 12, 2018 at 10:34:37AM -0400, Sinan Kaya wrote:
>>> On 4/12/2018 10:06 AM, Bjorn Helgaas wrote:
>>>>
>>>> I think the scenario you are describing is two systems that are
>>>> identical except that in the first, the endpoint is below a hotplug
>>>> bridge, while in the second, it's below a non-hotplug bridge. There's
>>>> no physical hotplug (no drive removed or inserted), and DPC is
>>>> triggered in both systems.
>>>>
>>>> I suggest that DPC should be handled identically in both systems:
>>>>
>>>>   - The PCI core should have the same view of the endpoint: it should
>>>>     be removed and re-added in both cases (or in neither case).
>>>>
>>>>   - The endpoint itself should not be able to tell the difference: it
>>>>     should see a link down event, followed by a link retrain, followed
>>>>     by the same sequence of config accesses, etc.
>>>>
>>>>   - The endpoint driver should not be able to tell the difference,
>>>>     i.e., we should be calling the same pci_error_handlers callbacks
>>>>     in both cases.
>>>>
>>>> It's true that in the non-hotplug system, pciehp probably won't start
>>>> re-enumeration, so we might need an alternate path to trigger that.
>>>>
>>>> But that's not what we're doing in this patch. In this patch we're
>>>> adding a much bigger difference: for hotplug bridges, we stop and
>>>> remove the hierarchy below the bridge; for non-hotplug bridges, we do
>>>> the AER-style flow of calling pci_error_handlers callbacks.
>>>
>>> Our approach on V12 was to go to AER style recovery for all DPC events
>>> regardless of hotplug support or not.
>>>
>>> Keith was not comfortable with this approach. That's why, we special cased
>>> hotplug.
>>>
>>> If we drop 6/6 on this patch on v13, we achieve this. We still have to
>>> take care of Keith's inputs on individual patches.
>>>
>>> we have been struggling with the direction for a while.
>>>
>>> Keith, what do you think?
>>
>> My only concern was for existing production environments that use DPC
>> for handling surprise removal, and I don't wish to break the existing
>> uses.
>
> Also, I thought the plan was to keep hotplug and non-hotplug the same,
> except for the very end: if not a hotplug bridge, initiate the rescan
> automatically after releasing from containment, otherwise let pciehp
> handle it when the link reactivates.
>

Hmm... The AER driver doesn't use a stop-and-rescan approach for fatal errors.

The AER driver makes an error callback, followed by a secondary bus reset,
and finally calls the resume callback on the endpoint, but only if link
recovery is successful. Otherwise, the AER driver bails out with a
"recovery unsuccessful" message.

Why do we need an additional rescan in the DPC driver if the link is up
and the driver resumes operation?

If hotplug is supported and somebody removed the device, the link won't
come up, so the AER error recovery sequence will fail after a timeout.

When the drive is inserted, the hotplug driver observes a link-up
interrupt and does a rescan, and the drive is functional once again.

This should satisfy both use cases, right?

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
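The recovery sequence proposed in the reply above can be sketched as a small userspace model. It shows an error callback, then a secondary bus reset, then a resume only if the link comes back. In the hotplug case, a removed drive makes recovery fail, and a later link-up event triggers the rescan. This is not kernel code. The names model_port, model_err_handlers, model_recover, model_link_up_event and counting_handlers are hypothetical. The link-up wait is reduced to a poll counter, and only the error_detected/resume callbacks mentioned in the mail are modeled. The kernel's struct pci_error_handlers has additional hooks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a driver's error callbacks
 * (not the kernel's struct pci_error_handlers). */
struct model_err_handlers {
	void (*error_detected)(void *drv);
	void (*resume)(void *drv);
};

/* Hypothetical model of a downstream port and the endpoint below it. */
struct model_port {
	bool device_present;   /* false once the drive has been pulled */
	bool hotplug_capable;  /* bridge supports hotplug (pciehp) */
	int link_poll_budget;  /* link polls before declaring a timeout */
	int resets_issued;     /* secondary bus resets performed */
	int rescans;           /* rescans triggered by link-up events */
	const struct model_err_handlers *handlers;
	void *drv;
};

enum recovery_result { RECOVERED, RECOVERY_FAILED };

static void secondary_bus_reset(struct model_port *port)
{
	port->resets_issued++;
}

/* A removed device never brings the link back, so this times out. */
static bool wait_for_link_up(const struct model_port *port)
{
	for (int i = 0; i < port->link_poll_budget; i++)
		if (port->device_present)
			return true;
	return false;
}

/* AER-style sequence: error callback, secondary bus reset, then resume
 * only if the link recovered; otherwise report failure and stop. */
enum recovery_result model_recover(struct model_port *port)
{
	assert(port != NULL && port->handlers != NULL);

	port->handlers->error_detected(port->drv);
	secondary_bus_reset(port);

	if (!wait_for_link_up(port)) {
		fprintf(stderr, "recovery unsuccessful\n");
		return RECOVERY_FAILED;   /* no rescan from this path */
	}
	port->handlers->resume(port->drv);
	return RECOVERED;
}

/* Hotplug side: a later link-up event (drive inserted) triggers a
 * rescan, but only on a hotplug-capable bridge. */
void model_link_up_event(struct model_port *port)
{
	port->device_present = true;
	if (port->hotplug_capable)
		port->rescans++;
}

/* Example driver state and callbacks that only count invocations. */
struct model_drv { int detected; int resumed; };
static void count_detected(void *drv) { ((struct model_drv *)drv)->detected++; }
static void count_resumed(void *drv) { ((struct model_drv *)drv)->resumed++; }
const struct model_err_handlers counting_handlers = {
	.error_detected = count_detected,
	.resume = count_resumed,
};
```

In this model, a non-hotplug port whose link returns gets error_detected, one reset, and resume, with no rescan. On a hotplug port whose drive was pulled, model_recover() fails after the poll budget runs out, and the rescan happens only when model_link_up_event() fires on reinsertion. That split is the "both use cases" argument made in the mail.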