Subject: Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
To: Bjorn Helgaas
Cc: Lukas Wunner, Oza Pawandeep, linux-pci@vger.kernel.org, open list,
 Keith Busch, linux-arm-msm@vger.kernel.org, Bjorn Helgaas,
 linux-arm-kernel@lists.infradead.org
References: <12fc8de5-ff03-cb00-52cb-25a43c71d03a@codeaurora.org> <20180708171418.GA11476@wunner.de>
 <20180709160008.GA1490@wunner.de> <20180720200123.GS128988@bhelgaas-glaptop.roam.corp.google.com>
From: Sinan Kaya
Message-ID: <2febe688-f973-5ff5-f61d-0451ad7d36ae@kernel.org>
Date: Fri, 20 Jul 2018 19:58:20 -0700
In-Reply-To: <20180720200123.GS128988@bhelgaas-glaptop.roam.corp.google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/20/2018 1:01 PM, Bjorn Helgaas wrote:
> On Tue, Jul 10, 2018 at 02:30:11PM -0400, Sinan Kaya wrote:
>> On Mon, Jul 9, 2018 at 12:00 PM, Lukas Wunner wrote:
>>> On Mon, Jul 09, 2018 at 08:48:44AM -0600, Sinan Kaya wrote:
>>>> On 7/8/18, Lukas Wunner wrote:
>>>>> On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
>>>>>> My solution doesn't help if link down interrupt is observed
>>>>>> before the AER or DPC services.
>>>>>
>>>>> If pciehp gets an interrupt quicker than dpc/aer, it will (at
>>>>> least with my patches) remove all devices, check if the
>>>>> presence bit is set, and if so, try to bring the slot up
>>>>> again.
>>>>
>>>> Hotplug driver should only observe a link down interrupt. Link
>>>> would come up in response to a secondary bus reset initiated by
>>>> the AER driver.
>>>
>>> PCIe hotplug doesn't have separate Link Down and Link Up
>>> interrupts, there is only a Link State *Changed* event.
>>>
>>>> Can you point me to the code that would bring up the link in hp
>>>> code?
>>>
>>> I was referring to the situation with my recently posted pciehp
>>> patches applied, in particular patch [21/32] ("PCI: pciehp: Become
>>> resilient to missed events"):
>>> https://patchwork.ozlabs.org/patch/930389/
>>>
>>> When I get a presence or link changed event, I turn the slot off.
>>> That includes removing all devices in the slot.
>>> Because even if the slot is still occupied or link is up, there
>>> was definitely a change and the safe behavior is to assume that
>>> the card in the slot is now a different one than before.
>>
>> We do have a bit of mess unfortunately. Error handling and hotplug
>> drivers do not play nicely with each other.
>>
>> When hotplug driver observes a link down, we are not checking if the
>> link down happened because user really wanted to remove a card or if
>> it was because it was originated by an error handling service such
>> as AER/DPC.
>>
>> I'm thinking that we could potentially check if a hotplug event is
>> pending at the entrance of fatal error handling. If it is pending,
>> we could poll until the status bit clears. That should flush the
>> link down event.
>>
>> Even then, link down indication of hotplug seem to turn off slot
>> power and LED.
>>
>> If AER/DPC service runs after the hotplug driver, link won't come
>> back up as the power to the slot is turned off.
>>
>> I'd like to hear about Bjorn's opinion before we throw something
>> else into this problem.
>
> You guys know way more about this than I do.
>
> I think the separation of AER/DPC/pciehp into separate drivers is
> somewhat artificial because there are many interdependencies. The
> driver model doesn't apply very well because there's only one
> underlying piece of hardware, which forces us to use the portdrv as
> sort of a multiplexer. The fact that portdrv claims these bridges
> also means normal drivers (e.g., for performance counters) can't use
> the usual model.
>
> All that is to say that if integrating these services more tightly
> would help solve this problem, I'd be open to that.

I was looking at how to destroy the portdrv for a while. It looks like
a much bigger task, to be honest. There are multiple levels of
abstraction in the code, as you highlighted.

My patch solves the problem if the AER interrupt happens before the
hotplug interrupt.
We are masking the data link layer active interrupt, so AER/DPC can
perform their link operations without racing with the hotplug driver.

We need to figure out how to return gracefully inside the hotplug
driver if a link down happened and there is an error pending.

My first question is why the hotplug driver reacts to the link event
if there was no actual device insertion/removal.

Would it help to keep track of presence changed interrupts since the
last link event? As an example, if the counter is 0 and the device is
present, the hotplug driver bails out silently.

>
> Bjorn
>
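
To make the counter idea concrete, here is a minimal user-space sketch
in C. It is not pciehp code: struct sketch_slot, enum sketch_action,
sketch_handle_event() and the SKETCH_* names are all hypothetical, and
only the bit positions are taken from the PCIe Slot Status register.
It assumes the handler is called once per interrupt with the Slot
Status value that was read, and that a presence-only event keeps being
handled the way it is today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions follow the PCIe Slot Status register; names are made up. */
#define SKETCH_SLTSTA_PDC   0x0008u /* Presence Detect Changed */
#define SKETCH_SLTSTA_PDS   0x0040u /* Presence Detect State */
#define SKETCH_SLTSTA_DLLSC 0x0100u /* Data Link Layer State Changed */

/* Hypothetical per-slot bookkeeping, not an existing pciehp structure. */
struct sketch_slot {
	unsigned int presence_changes; /* PDC events since the last link event */
};

enum sketch_action {
	SKETCH_IGNORE, /* bail out silently */
	SKETCH_HANDLE, /* treat as a real insertion/removal */
};

/*
 * Decide what to do with one hotplug interrupt.
 *
 * A link event is ignored only if no presence change was seen since the
 * previous link event and a card is still present. That is the case one
 * would expect when AER/DPC toggled the link without the card moving.
 */
static enum sketch_action sketch_handle_event(struct sketch_slot *slot,
					      uint16_t slot_status)
{
	bool present = (slot_status & SKETCH_SLTSTA_PDS) != 0;
	bool presence_changed = (slot_status & SKETCH_SLTSTA_PDC) != 0;
	enum sketch_action action;

	assert(slot != NULL);

	if (presence_changed)
		slot->presence_changes++;

	/* No link event: only a presence change is worth handling. */
	if (!(slot_status & SKETCH_SLTSTA_DLLSC))
		return presence_changed ? SKETCH_HANDLE : SKETCH_IGNORE;

	if (slot->presence_changes == 0 && present)
		action = SKETCH_IGNORE;
	else
		action = SKETCH_HANDLE;

	/* The counter is relative to the most recent link event. */
	slot->presence_changes = 0;
	return action;
}
```

With a zeroed struct sketch_slot, a call with DLLSC and PDS set returns
SKETCH_IGNORE, while the same link event after an earlier PDC event, or
with PDS clear, returns SKETCH_HANDLE. Whether Presence Detect State is
reliable enough across a secondary bus reset on real hardware is an
open question that the sketch does not answer.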