From: "Rafael J. Wysocki"
Date: Tue, 22 May 2018 20:10:47 +0200
Subject: Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to ghes_cper_severity()
To: "Luck, Tony"
Cc: Borislav Petkov, "Alex G.", "Rafael J. Wysocki", alex_gagniuc@dellteam.com,
    austin_bolen@dell.com, shyam_iyer@dell.com, "Rafael J. Wysocki", Len Brown,
    Tyler Baicar, Will Deacon, James Morse, Shiju Jose,
    "Jonathan (Zhixiong) Zhang", Dongjiu Geng, ACPI Devel Maling List,
    Linux Kernel Mailing List

On Tue, May 22, 2018 at 7:57 PM, Luck, Tony wrote:
> On Tue, May 22, 2018 at 04:54:26PM +0200, Borislav Petkov wrote:
>> I especially don't want to have the case where a PCIe error is *really*
>> fatal and then we noodle in some handlers debating about the severity
>> because it got marked as recoverable intermittently and end up causing
>> data corruption on the storage device. Here's a real no-no for ya.
>
> All that we have is a message from the BIOS that this is a "fatal"
> error. When did we start trusting the BIOS to give us accurate
> information?

Some time ago, actually.

This is about changing the existing behavior which has been to treat
"fatal" errors reported by the BIOS as good enough reasons for a panic
for quite a while AFAICS.

> PCIe fatal means that the link or the device is broken.

And that may really mean that the component in question is on fire.
We just don't know.

> But that seems a poor reason to take down a large server that may have
> dozens of devices (some of them set up specifically to handle
> errors ... e.g. mirrored disks on separate controllers, or NIC
> devices that have been "bonded" together).
>
> So, as long as the action for a "fatal" error is to mark a link
> down and offline the device, that seems a pretty reasonable course
> of action.
>
> The argument gets a lot more marginal if you simply reset the
> link and re-enable the device to "fix" it. That might be enough,
> but I don't think the OS has enough data to make the call.

Again, that's about changing the existing behavior or the existing
policy even.

What exactly has changed to make us consider this now?

Thanks,
Rafael
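For readers outside the thread, the two positions can be contrasted as a small policy model. The existing behavior described above maps a firmware-reported CPER severity to an OS severity and panics on "fatal"; the alternative floated for PCIe offlines the device instead. This is a minimal userspace sketch, loosely modeled on the severity mapping in drivers/acpi/apei/ghes.c (the function this patch renames). The enum values are assumed to mirror Linux's cper.h and ghes.h, and the names map_cper_severity(), existing_policy(), proposed_policy() and enum action are hypothetical, not kernel code.

```c
/* Firmware-reported CPER severities (values assumed to mirror cper.h). */
enum cper_sev {
	CPER_SEV_RECOVERABLE   = 0,
	CPER_SEV_FATAL         = 1,
	CPER_SEV_CORRECTED     = 2,
	CPER_SEV_INFORMATIONAL = 3,
};

/* OS-side severities, ordered so that ">= GHES_SEV_PANIC" selects a panic. */
enum ghes_sev {
	GHES_SEV_NO,
	GHES_SEV_CORRECTED,
	GHES_SEV_RECOVERABLE,
	GHES_SEV_PANIC,
};

/* Hypothetical set of actions the OS could take. */
enum action {
	ACT_IGNORE,
	ACT_LOG,
	ACT_RECOVER,
	ACT_OFFLINE_DEVICE,
	ACT_PANIC,
};

/* Translate the firmware's verdict into an OS severity. */
static enum ghes_sev map_cper_severity(enum cper_sev s)
{
	switch (s) {
	case CPER_SEV_INFORMATIONAL: return GHES_SEV_NO;
	case CPER_SEV_CORRECTED:     return GHES_SEV_CORRECTED;
	case CPER_SEV_RECOVERABLE:   return GHES_SEV_RECOVERABLE;
	case CPER_SEV_FATAL:         return GHES_SEV_PANIC;
	}
	return GHES_SEV_PANIC; /* unknown value: be conservative */
}

/* Existing policy as described in the thread: a "fatal" report means panic,
 * regardless of where the error came from. */
static enum action existing_policy(enum cper_sev s, int is_pcie)
{
	enum ghes_sev sev = map_cper_severity(s);

	(void)is_pcie; /* the error source does not influence the decision */
	if (sev >= GHES_SEV_PANIC)
		return ACT_PANIC;
	if (sev == GHES_SEV_RECOVERABLE)
		return ACT_RECOVER;
	if (sev == GHES_SEV_CORRECTED)
		return ACT_LOG;
	return ACT_IGNORE;
}

/* Hypothetical alternative: a "fatal" PCIe error takes the link down and
 * offlines the device instead of panicking; everything else is unchanged. */
static enum action proposed_policy(enum cper_sev s, int is_pcie)
{
	enum action a = existing_policy(s, is_pcie);

	if (a == ACT_PANIC && is_pcie)
		return ACT_OFFLINE_DEVICE;
	return a;
}
```

Under these assumptions, existing_policy(CPER_SEV_FATAL, 1) returns ACT_PANIC while proposed_policy(CPER_SEV_FATAL, 1) returns ACT_OFFLINE_DEVICE. The only difference is one branch that trusts the firmware's "fatal" label less for PCIe, which is exactly the policy change being questioned.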