Date: Tue, 3 Jul 2018 16:12:55 +0200
From: Lukas Wunner
To: okaya@codeaurora.org
Cc: linux-pci@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Bjorn Helgaas, Oza Pawandeep, Keith Busch, open list
Subject: Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
Message-ID: <20180703141255.GB18639@wunner.de>

On Tue, Jul 03, 2018 at 07:30:28AM -0400, okaya@codeaurora.org wrote:
> On 2018-07-03 04:34, Lukas Wunner wrote:
> >On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
> >>If a bridge supports hotplug and observes a PCIe fatal error, the
> >>following events happen:
> >>
> >>1. AER driver removes the devices from the PCI tree on fatal error.
> >>2. AER driver brings down the link by issuing a secondary bus reset and
> >>   waits for the link to come up.
> >>3. Hotplug driver observes a link down interrupt.
> >>4. Hotplug driver tries to remove the devices, waiting for the rescan
> >>   lock, but the devices are already removed by the AER driver, and the
> >>   AER driver is waiting for the link to come back up.
> >>5. AER driver tries to re-enumerate devices after polling for the link
> >>   state to go up.
> >>6. Hotplug driver obtains the lock and tries to remove the devices
> >>   again.
> >>
> >>If a bridge is a hotplug capable bridge, mask hotplug interrupts before
> >>the reset and unmask them afterwards.
> >
> >Would it work for you if you just amended the AER driver to skip
> >removal and re-enumeration of devices if the port is a hotplug bridge?
> >Just check for is_hotplug_bridge in struct pci_dev.
>
> The reason why we want to remove devices before the secondary bus reset
> is to quiesce PCIe bus traffic before issuing the reset.
>
> Skipping this step might cause transactions to be lost in the middle of
> the reset, as there will be active traffic flowing and drivers will
> suddenly start reading all ones (0xff).

Interesting, I think that merits a code comment.
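
To make the ordering concrete, here is a minimal userspace model of the race in steps 1-6 and of the masking fix. It is a sketch, not the kernel code: struct bridge, link_down_irq(), hotplug_work() and aer_recover() are hypothetical stand-ins, and it assumes a link-down event raised while the hotplug interrupt is masked is discarded (i.e. the latched status is cleared before unmasking).

```c
#include <stdbool.h>

/* Hypothetical model of a downstream port; not the kernel's struct pci_dev. */
struct bridge {
    bool is_hotplug_bridge;  /* same meaning as the flag in struct pci_dev */
    bool hp_irq_masked;      /* hotplug interrupt currently masked */
    bool hp_work_pending;    /* link-down event queued for the hotplug driver */
    int  nr_children;        /* devices currently enumerated below the port */
};

/* Link-down interrupt: only a hotplug bridge with the IRQ unmasked reacts.
 * Assumption: an event raised while masked is dropped, not replayed. */
static void link_down_irq(struct bridge *br)
{
    if (br->is_hotplug_bridge && !br->hp_irq_masked)
        br->hp_work_pending = true;
}

/* Deferred hotplug work; in the real flow it waits for the rescan lock. */
static void hotplug_work(struct bridge *br)
{
    if (br->hp_work_pending) {
        br->hp_work_pending = false;
        br->nr_children = 0;     /* step 6: removes the re-enumerated devices */
    }
}

/* Model of AER fatal-error recovery, optionally masking hotplug IRQs. */
static void aer_recover(struct bridge *br, bool mask_hotplug, int nr_devices)
{
    bool mask = mask_hotplug && br->is_hotplug_bridge;

    /* Remove devices first to quiesce bus traffic, so that no transaction
     * is in flight (and no driver reads all ones) during the reset. */
    br->nr_children = 0;             /* step 1 */
    if (mask)
        br->hp_irq_masked = true;
    link_down_irq(br);               /* steps 2-3: secondary bus reset */
    if (mask)
        br->hp_irq_masked = false;   /* link is back up */
    br->nr_children = nr_devices;    /* step 5: re-enumerate */

    /* The model runs the deferred hotplug work only after AER is done,
     * which is when the real hotplug driver would get the rescan lock. */
    hotplug_work(br);
}
```

With mask_hotplug false, the queued hotplug work runs after re-enumeration and removes the freshly enumerated devices, as in step 6; with it true, they survive. A non-hotplug bridge is unaffected either way.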

FWIW, macOS has a "PCI pause" callback to quiesce a device:
https://opensource.apple.com/source/IOPCIFamily/IOPCIFamily-239.1.2/pause.rtf

They're using it to reconfigure a device's BAR and bus number at runtime
(sic!), e.g. if MMIO windows need to be moved around on Thunderbolt hotplug
when there's insufficient space:

  "During pause reconfiguration, the following may be changed:
   - device BAR registers
   - the devices bus number
   - registry properties reflecting these values ("ranges",
     "assigned-addresses", "reg")
   - device MSI block values for address and value, but not the number
     of MSIs allocated"

Conceptually, "PCI pause" is similar to putting the device in a suspend
state. I'm wondering if suspending the devices below the bridge would
make more sense than removing them in the AER driver.

Lukas
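
The pause-instead-of-remove idea can be sketched as a userspace model. Every name here (struct quiesce_ops, struct dev, reset_with_pause()) is hypothetical; it is neither a Linux nor a macOS API. It only shows the intended ordering: every device below the bridge is paused before the reset and resumed afterwards, and drivers stay bound throughout instead of being removed and re-enumerated.

```c
#include <stdbool.h>
#include <stddef.h>

struct dev;

/* Hypothetical per-driver callbacks, loosely modelled on "PCI pause". */
struct quiesce_ops {
    void (*pause)(struct dev *d);   /* stop issuing new transactions */
    void (*resume)(struct dev *d);  /* restore state and restart I/O */
};

struct dev {
    const struct quiesce_ops *ops;
    bool paused;
    bool bound;                     /* driver still attached; never cleared here */
};

static void demo_pause(struct dev *d)  { d->paused = true; }
static void demo_resume(struct dev *d) { d->paused = false; }

static const struct quiesce_ops demo_ops = { demo_pause, demo_resume };

/* Pause all devices, (notionally) reset, then resume them.
 * Returns true if every device was paused when the reset was issued. */
static bool reset_with_pause(struct dev *devs, size_t n)
{
    bool all_quiet = true;

    for (size_t i = 0; i < n; i++)
        devs[i].ops->pause(&devs[i]);

    /* The secondary bus reset would be issued here; nothing is removed. */
    for (size_t i = 0; i < n; i++)
        all_quiet = all_quiet && devs[i].paused;

    for (size_t i = 0; i < n; i++)
        devs[i].ops->resume(&devs[i]);

    return all_quiet;
}
```

The trade-off compared with removal is that each driver must implement the pause/resume pair correctly; a driver without one would still need the remove and re-enumerate path.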