Date: Thu, 9 Aug 2018 14:18:32 -0500
From: Bjorn Helgaas
To: "Alex G."
Cc: Alex_Gagniuc@Dellteam.com, bhelgaas@google.com, keith.busch@intel.com,
 Austin.Bolen@dell.com, Shyam.Iyer@dell.com, fred@fredlawl.com,
 poza@codeaurora.org, linux-pci@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH] PCI/AER: Do not clear AER bits if we don't own AER
Message-ID: <20180809191832.GC113140@bhelgaas-glaptop.roam.corp.google.com>
References: <20180717153135.25925-1-mr.nuke.me@gmail.com>
 <20180809141551.GH49411@bhelgaas-glaptop.roam.corp.google.com>
 <2cae6a5ac8324be18b8dcf3d7dfcc288@ausx13mps321.AMER.DELL.COM>
 <20180809182905.GA113140@bhelgaas-glaptop.roam.corp.google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 09, 2018 at 02:00:23PM -0500, Alex G. wrote:
> On 08/09/2018 01:29 PM, Bjorn Helgaas wrote:
> > On Thu, Aug 09, 2018 at 04:46:32PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> > > On 08/09/2018 09:16 AM, Bjorn Helgaas wrote:
> (snip)
> > > >   enable_ecrc_checking()
> > > >   disable_ecrc_checking()
> > >
> > > I don't immediately see how this would affect FFS, but the bits are part
> > > of the AER capability structure. According to the FFS model, those would
> > > be owned by FW, and we'd have to avoid touching them.
> >
> > Per ACPI v6.2, sec 18.3.2.4, the HEST may contain entries for Root
> > Ports that contain the FIRMWARE_FIRST flag as well as values the OS is
> > supposed to write to several AER capability registers. It looks like
> > we currently ignore everything except the FIRMWARE_FIRST and GLOBAL
> > flags (ACPI_HEST_FIRMWARE_FIRST and ACPI_HEST_GLOBAL in Linux).
> >
> > That seems like a pretty major screwup and more than I want to fix
> > right now.
>
> The logic is not very clear, but I think it goes like this:
> For GLOBAL and FFS, disable native AER everywhere.
> When !GLOBAL and FFS, then only disable native AER for the root port
> described by the HEST entry.

I agree the code is convoluted, but that sounds right to me. What I
meant is that we ignore the values the HEST entry tells us we're
supposed to write to Device Control and the AER Uncorrectable Error
Mask, Uncorrectable Error Severity, Correctable Error Mask, and AER
Capabilities and Control.

> > > For practical considerations this is not an issue today. The ACPI error
> > > handling code currently crashes when it encounters any fatal error, so
> > > we wouldn't hit this in the FFS case.
> >
> > I wasn't aware the firmware-first path was *that* broken. Are there
> > problem reports for this? Is this a regression?
>
> It's been like this since, I believe, 3.10, and probably much earlier. All
> reports that I have seen of linux crashing on surprise hot-plug have been
> caused by the panic() call in the apei code. Dell BIOSes do an extreme
> amount of work to determine when it's safe to _not_ report errors to the OS,
> since all known OSes crash on this path.

Oh, is this the __ghes_panic() path? If so, I'm going to turn away
and plead ignorance unless the PCI core is doing something wrong that
eventually results in that panic.
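
For reference, the GLOBAL/FFS reading agreed on above could be sketched
roughly as below. This is standalone illustrative C, not the kernel's
code: the struct layouts and the names hest_entry, pci_addr, and
firmware_owns_aer() are hypothetical, and the two macros are local
stand-ins for ACPI_HEST_FIRMWARE_FIRST and ACPI_HEST_GLOBAL. The register
fields are carried only to show which HEST values go unused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for ACPI_HEST_FIRMWARE_FIRST / ACPI_HEST_GLOBAL. */
#define HEST_FIRMWARE_FIRST (1u << 0)
#define HEST_GLOBAL         (1u << 1)

/* Hypothetical, simplified view of a HEST Root Port AER entry. */
struct hest_entry {
	uint32_t flags;
	uint8_t  bus, dev, fn;   /* only meaningful when !GLOBAL */

	/*
	 * Values the OS is supposed to write to the port. This sketch,
	 * like the behavior described in the thread, never reads them.
	 */
	uint16_t device_control;
	uint32_t uncorrectable_mask;
	uint32_t uncorrectable_severity;
	uint32_t correctable_mask;
	uint32_t advanced_capabilities;
};

/* Hypothetical bus/device/function address of a root port. */
struct pci_addr {
	uint8_t bus, dev, fn;
};

/*
 * Return true if, according to this one HEST entry, firmware owns AER
 * for the given root port, so native AER should be disabled for it.
 */
bool firmware_owns_aer(const struct hest_entry *e, const struct pci_addr *p)
{
	assert(e && p);

	/* No FFS flag: the entry does not hand the port to firmware. */
	if (!(e->flags & HEST_FIRMWARE_FIRST))
		return false;

	/* GLOBAL and FFS: disable native AER everywhere. */
	if (e->flags & HEST_GLOBAL)
		return true;

	/* !GLOBAL and FFS: only the port this entry describes. */
	return e->bus == p->bus && e->dev == p->dev && e->fn == p->fn;
}
```

A caller would presumably walk all HEST entries and treat a port as
firmware-owned if any entry claims it; honoring the register values in
the entry is the part left undone.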