Received: by 2002:ac0:a5a6:0:0:0:0:0 with SMTP id m35-v6csp5210056imm; Tue, 18 Sep 2018 06:08:44 -0700 (PDT) X-Google-Smtp-Source: ANB0VdZAXjoqHsi33hzEYodZWrjY8gtts9lxSXHMtQrfx4PVY1YKWohaj4zBl7PapMbAxqttx5HZ X-Received: by 2002:a62:20d8:: with SMTP id m85-v6mr30776503pfj.74.1537276124630; Tue, 18 Sep 2018 06:08:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537276124; cv=none; d=google.com; s=arc-20160816; b=mv0G+mNSlR8qf9BDPHJQs8o3hmHZ95AcAu+FfET4jw3zgl3YJ1e+kNG04CpD8jWAAx ImJbSTvMVi9gpD0iTG58oZ3BLUXXeg1/Qh8PFM7F+VTzzas9oOMLKz20+65/DlQsgAbQ 3SM+Zh109OKtbzudIC1mmzM7/qvNtqlJuAzN8JXwrtNwgZqT/8CsuTpt58RyG5DKfzlc clnvCoE8la7npd+liGq/Cc8jDQvYPPkagt2xpaOukCxmz4/dDvjkTI2YgVN5nO7K1IYf 9rbBCo48cDCxCqcLOSESuaJkWzL8JDYb7tx0lwJXHX2NoZ2QUuath90P+praoXb9IzPf z/QQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :content-language:accept-language:references:message-id:date :thread-index:thread-topic:subject:to:from:cc:dkim-signature; bh=9X3DNRg5lguN3om4Q0tYmZ74pvRU/AFStVCmgqiubMo=; b=LzQkesGME+CFnoZTRiV3fcxc4+8og5vSo/UiX2sNU3hz/iBb+10GB9eyT2WbaykYE+ 9tbMqhCTBIs3WrQoZxgjXB7JaVSVbpM4FLSPJbd5mAiTaWTeb8uvcMsAo6M6+sCPzZlh U2SzW08ICyEju2G65MiNGujY2hThlwRMxVQ7w105rMreoBGkBJRK0y2tCT3YdPZMpJrd FNdpdfwfhEb6VLuHmRR9M9YL2Y849mfYET0vCN8kZl43w+Pq6OvX3rpVwHPjEIaRDFIu YOu+Iao9jezRwffVYhMiigkNeBb36EIQ+kjeonz4R4Ilcso7tJoitDD622R+2Q93ykjC IWdQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@dellteam.com header.s=smtpout header.b=sM9sjopT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k18-v6si17450941pff.91.2018.09.18.06.08.15; Tue, 18 Sep 2018 06:08:44 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@dellteam.com header.s=smtpout header.b=sM9sjopT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729583AbeIRSkh (ORCPT + 99 others); Tue, 18 Sep 2018 14:40:37 -0400 Received: from esa3.dell-outbound.iphmx.com ([68.232.153.94]:20884 "EHLO esa3.dell-outbound.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727207AbeIRSkg (ORCPT ); Tue, 18 Sep 2018 14:40:36 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=dellteam.com; i=@dellteam.com; q=dns/txt; s=smtpout; t=1537276065; x=1568812065; h=cc:from:to:subject:date:message-id:references: content-transfer-encoding:mime-version; bh=Hm5GoymEOjt/oLg++PSaotiBRYzeh4BHKEvvGGISNcU=; b=sM9sjopTIPtlNii9ESv8JZnmYwutVRZr9prrr819NTnAxmmQgUyb5sli 7xN7cfKoawyMyIaMFKZK/7jqWVX73xVNuCwudn7ElnQ0guMj7m6t38GxB Kx0ehuOT0UjvgbEaGivbIMbc+xhCP4puigKNTBdBxcP9cLiLP8RbhMzgY M=; X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: =?us-ascii?q?A2EcAADt96BbhyWd50NbGwEBAQEDAQE?= =?us-ascii?q?BCQEBAYFQg1oSKAqLfV+NXIhejW4UgWYLhGyEIzQYAQMBAQIBAQIBAQIQAQE?= =?us-ascii?q?BCgsJCCkjDEIQAYFiIoJiAQEBAwESFRM/BQsCAQgYHhAhNgIEARIIEweCf4F?= =?us-ascii?q?qAw0ImSWJVwEBAYFoM4c3DYJPjQSEJIJWghAYhVgCiBmFT4VYiEIjLAkFiUG?= =?us-ascii?q?DRIMPH48UglGKB4drAgQCBAUCFIFCgg5wgzyCJQ4Jh06GSW8BhF6HSoEeAQE?= X-IPAS-Result: =?us-ascii?q?A2EcAADt96BbhyWd50NbGwEBAQEDAQEBCQEBAYFQg1oSK?= =?us-ascii?q?AqLfV+NXIhejW4UgWYLhGyEIzQYAQMBAQIBAQIBAQIQAQEBCgsJCCkjDEIQA?= =?us-ascii?q?YFiIoJiAQEBAwESFRM/BQsCAQgYHhAhNgIEARIIEweCf4FqAw0ImSWJVwEBA?= =?us-ascii?q?YFoM4c3DYJPjQSEJIJWghAYhVgCiBmFT4VYiEIjLAkFiUGDRIMPH48UglGKB?= =?us-ascii?q?4drAgQCBAUCFIFCgg5wgzyCJQ4Jh06GSW8BhF6HSoEeAQE?= Received: from mx0b-00154901.pphosted.com (HELO mx0a-00154901.pphosted.com) ([67.231.157.37]) by esa3.dell-outbound.iphmx.com with ESMTP/TLS/AES256-SHA256; 18 Sep 2018 08:07:44 -0500 Received: from pps.filterd (m0089484.ppops.net [127.0.0.1]) by mx0b-00154901.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w8ID33Th086173; Tue, 18 Sep 2018 09:08:03 -0400 Received: from esa4.dell-outbound2.iphmx.com (esa4.dell-outbound2.iphmx.com [68.232.154.98]) by mx0b-00154901.pphosted.com with ESMTP id 2mjtxsjav6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 18 Sep 2018 09:08:03 -0400 Cc: , , , , , , , , Received: from ausxipps301.us.dell.com ([143.166.148.223]) by esa4.dell-outbound2.iphmx.com with ESMTP/TLS/DHE-RSA-AES256-SHA256; 18 Sep 2018 19:08:01 +0600 X-LoopCount0: from 10.166.134.89 X-IronPort-AV: E=Sophos;i="5.53,389,1531803600"; d="scan'208";a="231717807" From: To: , Subject: Re: [PATCH] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected Thread-Topic: [PATCH] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected Thread-Index: AQHUKEtdWZAK09POFEiUwzvn0dT1gA== Date: Tue, 18 Sep 2018 13:07:58 +0000 Message-ID: <5c7aa51ff9294b1aab51a3bda2416094@ausx13mps321.AMER.DELL.COM> References: <20180730212146.31909-1-mr.nuke.me@gmail.com> <20180912212831.GM118330@bhelgaas-glaptop.roam.corp.google.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.177.90.70] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2018-09-18_05:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=825 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1807170000 definitions=main-1809180132 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 9/12/2018 4:28 PM, Bjorn Helgaas wrote:=0A= > On Mon, Jul 30, 2018 at 04:21:44PM -0500, Alexandru Gagniuc wrote:=0A= >> When a PCI device is gone, we don't want to send IO to it if we can=0A= >> avoid it. We expose functionality via the irq_chip structure. As=0A= >> users of that structure may not know about the underlying PCI device,=0A= >> it's our responsibility to guard against removed devices.=0A= > =0A= > I'm pretty ambivalent about pci_dev_is_disconnected() in general, but=0A= > I think I'll take this, given a couple minor changelog clarifications:=0A= > =0A= >> irq_write_msi_msg is already guarded. pci_msi_(un)mask_irq are not.=0A= >> Guard them for completeness.=0A= > =0A= > By the irq_write_msi_msg() guard, I guess you mean this path:=0A= > =0A= > pci_msi_domain_write_msg # irq_chip.irq_write_msi_msg=0A= > __pci_write_msi_msg=0A= > if (dev->current_state !=3D PCI_D0 || pci_dev_is_disconnected(dev)= )=0A= > /* don't touch */=0A= =0A= Yes!=0A= =0A= > pci_msi_(un)mask_irq() may be irq_chip.irq_mask, .irq_unmask, etc=0A= > pointers. So these are parallel because they're all irq_chip function=0A= > pointers, but the changelog isn't (yet) parallel because it uses the=0A= > irq_chip pointer name for .irq_write_msi_msg but not for mask/unmask=0A= =0A= Good catch! I'll get this corrected.=0A= =0A= =0A= >> For example, surprise removal of a PCIe device triggers teardown. This= =0A= >> touches the irq_chips ops some point to disable the interrupts. I/O=0A= >> generated here can crash the system on machines with buggy firmware.=0A= >> Not triggering the IO in the first place eliminates the problem.=0A= > =0A= > It doesn't eliminate the problem completely because .irq_mask() and=0A= > .irq_unmask() may be called for reasons other than surprise removal,=0A= > and if a surprise removal happens after the pci_dev_is_disconnected()=0A= > check but before the readl(), we will still generate I/O to a device=0A= > that's gone. I'd be OK if you said it "reduces" the problem.=0A= =0A= That sounds reasonable.=0A= =0A= > One reason I'm ambivalent about pci_dev_is_disconnected() is that in=0A= > cases like this, it turns a reproducible problem into a very=0A= > hard-to-reproduce problem, which reduces the likelihood that the buggy=0A= > firmware will be fixed.=0A= =0A= If it manages to turn this into 99.999% territory, I'll be much happier. = =0A= I'd love to give you an academically correct solution, but I just don't =0A= see how, given how firmware-first philosophy is written.=0A= =0A= > Do you have information about known platforms with this buggy firmware=0A= > and the signature of the crash? If you do, it's always nice to be=0A= > able to connect a patch with the user-visible problem it fixes.=0A= =0A= From what I've heard, it won't be fixed. The number of changes needed =0A= would require re-qualifying the firmware. I'm told that's very hard to =0A= do on platforms that are shipping. I can reword this to say =0A= "firmware-first" instead of "buggy" since they are mostly synonymous.=0A= =0A= Alex=0A= =0A= >> Signed-off-by: Alexandru Gagniuc =0A= >> ---=0A= >>=0A= >> There's another patch by Lukas Wunner that is needed (not yet published)= =0A= >> in order to fully block IO on SURPRISE!!! removal. The existing code onl= y=0A= >> sets the PCI_DEV_DISCONNECTED bit in an unreasonably narrow set of=0A= >> circumstances. Lukas' patch fixes that.=0A= >>=0A= >> However, this change is otherwise fully independent, and enjoy!=0A= >>=0A= >> drivers/pci/msi.c | 3 +++=0A= >> 1 file changed, 3 insertions(+)=0A= >>=0A= >> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c=0A= >> index 4d88afdfc843..5f47b5cb0401 100644=0A= >> --- a/drivers/pci/msi.c=0A= >> +++ b/drivers/pci/msi.c=0A= >> @@ -227,6 +227,9 @@ static void msi_set_mask_bit(struct irq_data *data, = u32 flag)=0A= >> {=0A= >> struct msi_desc *desc =3D irq_data_get_msi_desc(data);=0A= >> =0A= >> + if (pci_dev_is_disconnected(msi_desc_to_pci_dev(desc)))=0A= >> + return;=0A= >> +=0A= >> if (desc->msi_attrib.is_msix) {=0A= >> msix_mask_irq(desc, flag);=0A= >> readl(desc->mask_base); /* Flush write to device */=0A= >> -- =0A= >> 2.17.1=0A= >>=0A= > =0A= =0A=