Subject: Re: [PATCH] PCI/AER: Do not clear AER bits if we don't own AER
To: Bjorn Helgaas
Cc: Alex_Gagniuc@Dellteam.com, bhelgaas@google.com, keith.busch@intel.com,
 Austin.Bolen@dell.com, Shyam.Iyer@dell.com, fred@fredlawl.com,
 poza@codeaurora.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20180717153135.25925-1-mr.nuke.me@gmail.com>
 <20180809141551.GH49411@bhelgaas-glaptop.roam.corp.google.com>
 <2cae6a5ac8324be18b8dcf3d7dfcc288@ausx13mps321.AMER.DELL.COM>
 <20180809182905.GA113140@bhelgaas-glaptop.roam.corp.google.com>
 <20180809191832.GC113140@bhelgaas-glaptop.roam.corp.google.com>
From: "Alex G."
Date: Thu, 9 Aug 2018 14:42:25 -0500
In-Reply-To: <20180809191832.GC113140@bhelgaas-glaptop.roam.corp.google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/09/2018 02:18 PM, Bjorn Helgaas wrote:
> On Thu, Aug 09, 2018 at 02:00:23PM -0500, Alex G. wrote:
>> On 08/09/2018 01:29 PM, Bjorn Helgaas wrote:
>>> On Thu, Aug 09, 2018 at 04:46:32PM +0000, Alex_Gagniuc@Dellteam.com wrote:
>>>> On 08/09/2018 09:16 AM, Bjorn Helgaas wrote:
>> (snip)
>>>>>     enable_ecrc_checking()
>>>>>     disable_ecrc_checking()
>>>>
>>>> I don't immediately see how this would affect FFS, but the bits are part
>>>> of the AER capability structure. According to the FFS model, those would
>>>> be owned by FW, and we'd have to avoid touching them.
>>>
>>> Per ACPI v6.2, sec 18.3.2.4, the HEST may contain entries for Root
>>> Ports that contain the FIRMWARE_FIRST flag as well as values the OS is
>>> supposed to write to several AER capability registers. It looks like
>>> we currently ignore everything except the FIRMWARE_FIRST and GLOBAL
>>> flags (ACPI_HEST_FIRMWARE_FIRST and ACPI_HEST_GLOBAL in Linux).
>>>
>>> That seems like a pretty major screwup and more than I want to fix
>>> right now.
>>
>> The logic is not very clear, but I think it goes like this:
>> For GLOBAL and FFS, disable native AER everywhere.
>> When !GLOBAL and FFS, then only disable native AER for the root port
>> described by the HEST entry.
>
> I agree the code is convoluted, but that sounds right to me.
>
> What I meant is that we ignore the values the HEST entry tells us
> we're supposed to write to Device Control and the AER Uncorrectable
> Error Mask, Uncorrectable Error Severity, Correctable Error Mask, and
> AER Capabilities and Control.

Wait, what? _HPX has the same information. This is madness!
Since root ports are not hot-swappable, the BIOS normally programs those
registers. Even if Linux doesn't apply said masks, the programming the
BIOS did should be sufficient to have *cough* correct *cough* behavior.

>>>> For practical considerations this is not an issue today. The ACPI error
>>>> handling code currently crashes when it encounters any fatal error, so
>>>> we wouldn't hit this in the FFS case.
>>>
>>> I wasn't aware the firmware-first path was *that* broken. Are there
>>> problem reports for this? Is this a regression?
>>
>> It's been like this since, I believe, 3.10, and probably much earlier. All
>> reports that I have seen of linux crashing on surprise hot-plug have been
>> caused by the panic() call in the apei code. Dell BIOSes do an extreme
>> amount of work to determine when it's safe to _not_ report errors to the OS,
>> since all known OSes crash on this path.
>
> Oh, is this the __ghes_panic() path? If so, I'm going to turn away
> and plead ignorance unless the PCI core is doing something wrong that
> eventually results in that panic.

I agree, and I'll quote you on that!

Alex
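
For reference, the GLOBAL/FFS rule discussed in the thread ("For GLOBAL and FFS, disable native AER everywhere; when !GLOBAL and FFS, only for the root port described by the HEST entry") can be written down as a small truth-table model. This is only a sketch under stated assumptions, not the kernel's implementation: the `HestEntry` type, the `firmware_owns_aer` helper, the flag bit values, and the (bus, device, function) matching are all hypothetical names and simplifications chosen for illustration.

```python
from dataclasses import dataclass

# Flag bit positions are assumptions made for this sketch.
HEST_FIRMWARE_FIRST = 1 << 0
HEST_GLOBAL = 1 << 1


@dataclass(frozen=True)
class HestEntry:
    """Hypothetical, simplified stand-in for a HEST root-port entry."""
    flags: int
    bus: int = 0
    device: int = 0
    function: int = 0


def firmware_owns_aer(entries, port):
    """Return True if native AER should be disabled for `port`.

    `port` is a (bus, device, function) tuple identifying a root port.
    """
    for e in entries:
        if not e.flags & HEST_FIRMWARE_FIRST:
            continue  # entry does not request firmware-first handling
        if e.flags & HEST_GLOBAL:
            return True  # GLOBAL and FFS: applies to every root port
        if (e.bus, e.device, e.function) == port:
            return True  # !GLOBAL and FFS: only the described root port
    return False
```

With a single non-GLOBAL firmware-first entry for 00:01.0, the helper reports firmware ownership for that port only; adding the GLOBAL flag makes it report ownership for any port.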