Subject: Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES
To: Borislav Petkov
Cc: alex_gagniuc@dellteam.com, austin_bolen@dell.com, shyam_iyer@dell.com,
    "Rafael J. Wysocki", Len Brown, Tony Luck, Mauro Carvalho Chehab,
    Robert Moore, Erik Schmauss, Tyler Baicar, Will Deacon, James Morse,
    Shiju Jose, "Jonathan (Zhixiong) Zhang", Dongjiu Geng,
    linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-edac@vger.kernel.org, devel@acpica.org
References: <20180430212836.7807-1-mr.nuke.me@gmail.com>
    <20180430213358.8319-1-mr.nuke.me@gmail.com>
    <20180430213358.8319-3-mr.nuke.me@gmail.com>
    <20180511154039.GD12705@pd.tnic>
From: "Alex G."
Message-ID: <8e3c0cc6-9c5c-85ce-650c-8f498f5907da@gmail.com>
Date: Fri, 11 May 2018 10:54:09 -0500
In-Reply-To: <20180511154039.GD12705@pd.tnic>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/11/2018 10:40 AM, Borislav Petkov wrote:
> On Mon, Apr 30, 2018 at 04:33:52PM -0500, Alexandru Gagniuc wrote:
>> The policy was to panic() when GHES said that an error is "Fatal".
>> This logic is wrong for several reasons, as it doesn't take into
>> account what caused the error.
>>
>> PCIe fatal errors indicate that the link to a device is either
>> unstable or unusable. They don't indicate that the machine is on fire,
>> and they are not severe enough that we need to panic(). Instead of
>> relying on crackmonkey firmware, evaluate the error severity based on
>             ^^^^^^^^^^^^
>
> Please keep the smartass formulations for the ML only and do not let
> them leak into commit messages.

You're right. The monkeys are not crack. Instead, what a lot of
manufacturers do is maintain large monkey farms with electronic
typewriters. Given a sufficiently large farm, they take those results
which compile. Of those results, they pick and ship the one that takes
longest to boot, without the customers complaining.

That being clarified, should I replace "crackmonkey" with "broken" in
the commit message?

(snip)

>> +/* PCIe errors should not cause a panic. */
>> +static int ghes_sec_pcie_severity(struct acpi_hest_generic_data *gdata)
>> +{
>> +	struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
>> +
>> +	if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
>> +	    pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO &&
>> +	    IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER))
>
> How is PCIe error severity dependent on whether the AER error reporting
> driver is enabled (and possibly not even loaded) on the system?

Borislav, I sense some confusion. AER is not a "reporting" driver. It
handles the errors. You can't leave these errors unhandled. They
propagate to the root complex and can cause fatal MCEs when not handled.
The window to handle the error is pretty large, so it's not a concern
when you're handling it.

Alex

>> +		return CPER_SEV_RECOVERABLE;
>> +
>> +	return ghes_cper_severity(gdata->error_severity);
>> +}
>> +/*
>