Date: Mon, 2 Jul 2018 08:16:45 -0500
From: Bjorn Helgaas
To: Alex G
Cc: bhelgaas@google.com, keith.busch@intel.com, alex_gagniuc@dellteam.com,
    austin_bolen@dell.com, shyam_iyer@dell.com, Frederick Lawler,
    Greg Kroah-Hartman, Oza Pawandeep, linux-pci@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
    Borislav Petkov
Subject: Re: [PATCH v2] PCI/AER: Fix aerdrv loading with "pcie_ports=native" parameter
Message-ID: <20180702131645.GA15983@bhelgaas-glaptop.roam.corp.google.com>

On Sat, Jun 30, 2018 at 11:39:00PM -0500, Alex G wrote:
> On 06/30/2018 04:31 PM, Bjorn Helgaas wrote:
> > [+cc Borislav, linux-acpi, since this involves APEI/HEST]
>
> Borislav is not the relevant maintainer here, since we're not contingent on
> APEI handling. I think Keith has a lot more experience with this part of the
> kernel.

Thanks for adding Keith.

> > On Tue, Jun 19, 2018 at 02:58:20PM -0500, Alexandru Gagniuc wrote:
> > > According to the documentation, "pcie_ports=native", linux should use
> > > native AER and DPC services. While that is true for the _OSC method
> > > parsing, this is not the only place that is checked. Should the HEST
> > > table list PCIe ports as firmware-first, linux will not use native
> > > services.
> >
> > Nothing in ACPI-land looks at pcie_ports_native. How should ACPI
> > things work in the "pcie_ports=native" case? I guess we still have to
> > expect to receive error records from the firmware, because it may
> > certainly send us non-PCI errors (machine checks, etc) and maybe even
> > some PCI errors (even if the Linux AER driver claims AER interrupts,
> > we don't know what notification mechanisms the firmware may be using).
>
> I think ACPI land shouldn't care about this. We care about it from the PCIe
> stand point at the interface with ACPI. FW might see a delta in the sense
> that we request control of some features via _OSC, which we otherwise would
> not do without pcie_ports=native.
>
> > I guess best-case, we'll get ACPI error records for all non-PCI
> > things, and the Linux AER driver will see all the AER errors.
>
> It might affect FW's ability to catch errors, but that's dependent on the
> root port implementation.
>
> > Worst-case, I don't really know what to expect. Duplicate reporting
> > of AER errors via firmware and Linux AER driver? Some kind of
> > confusion about who acknowledges and clears them?
>
> Once user enters pcie_ports=native, all bets are off: you broke the contract
> you have with the FW -- whether or not you have this patch.
>
> > Out of curiosity, what is your use case for "pcie_ports=native"?
> > Presumably there's something that works better when using it, and
> > things work even *better* with this patch?
>
> Correctness. It bothers me that actual behavior does not match the
> documentation:
>
>     native  Use native PCIe services associated with PCIe ports
>             unconditionally.
>
>
> > I know people do use it, because I often see it mentioned in forums
> > and bug reports, but I really don't expect it to work very well
> > because we're ignoring the usage model the firmware is designed
> > around. My unproven suspicion is that most uses are in the black
> > magic category of "there's a bug here, and we don't know how to fix
> > it, but pcie_ports=native makes it work better".
>
> There exist cases that firmware didn't consider. I would not call them
> "firmware bugs", but there are cases where the user understands the platform
> better than firmware.
> Example: on certain PCIe switches, a hardware PCIe error may bring the
> switch downstream ports into a state where they stop notifying hotplug
> events. Depending on the platform, firmware may or may not fix this
> condition, but "pcie_ports=native" enables DPC. DPC contains the error
> without the switch downstream port entering the weird error state in the
> first place.
>
> All bets are off at this point.

If a user needs "pcie_ports=native", I claim that's a user experience
problem, and the underlying cause is a hardware, firmware, or OS
defect.

I have no doubt the situation you describe is real, but this doesn't
make any progress toward resolving the user experience problem. In
fact, it propagates the folklore that using "pcie_ports=native" is an
appropriate final solution. It's fine as a temporary workaround while
we figure out a better solution, but we need some mechanism for
analyzing the problem and eventually removing the need to use
"pcie_ports=native".

I have a minor comment on the patch, but I think it makes sense.

This might be a good time to resurrect Prarit's
"taint-on-pci-parameters" patch. If somebody uses
"pcie_ports=native", I think it makes sense to taint the kernel both
because (1) we broke the contract with the firmware and we don't
really know what to expect, and (2) it's an opportunity to encourage
the user to raise a bug report.

Bjorn
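
For illustration only, the policy discussed in this thread can be modeled
in a few lines of userspace C. This is a rough sketch, not the patch and
not kernel code: struct port_policy, use_native_aer() and should_taint()
are hypothetical names invented here, and the model deliberately ignores
the _OSC negotiation that also decides who owns AER when the parameter is
absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the policy under discussion.
 * None of these names are kernel APIs.
 */
struct port_policy {
	bool pcie_ports_native;   /* user booted with "pcie_ports=native" */
	bool hest_firmware_first; /* HEST lists the port as firmware-first */
};

/*
 * Should the OS use its native AER/DPC services for this port?
 * Assumption: the user override wins over the HEST firmware-first
 * setting, matching the documented "unconditionally" wording.
 */
bool use_native_aer(const struct port_policy *p)
{
	assert(p != NULL);

	if (p->pcie_ports_native)
		return true;

	/* Simplification: without the override, only HEST is consulted. */
	return !p->hest_firmware_first;
}

/*
 * Should the kernel be marked tainted? In this sketch, any use of the
 * override counts, because the contract with the firmware is broken.
 */
bool should_taint(const struct port_policy *p)
{
	assert(p != NULL);

	return p->pcie_ports_native;
}
```

With both fields set, use_native_aer() and should_taint() both return
true: the override is honored, and the taint records that the user chose
to ignore the firmware's usage model.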