Subject: Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to ghes_cper_severity()
To: Borislav Petkov
Cc: "Rafael J. Wysocki", alex_gagniuc@dellteam.com, austin_bolen@dell.com, shyam_iyer@dell.com, "Rafael J. Wysocki", Len Brown, Tony Luck, Tyler Baicar, Will Deacon, James Morse, Shiju Jose, "Jonathan (Zhixiong) Zhang", Dongjiu Geng, ACPI Devel Maling List, Linux Kernel Mailing List
References: <20180521135003.32459-1-mr.nuke.me@gmail.com> <20180521135003.32459-2-mr.nuke.me@gmail.com> <53d0ba88-6929-a7cf-6c3e-4ca389f7249a@gmail.com> <20180522135015.GF5512@pd.tnic> <0b758a1c-90e3-6f76-4f83-1e22c8fc9cd6@gmail.com> <20180522145426.GG5512@pd.tnic>
From: "Alex G."
Message-ID: <9b3823fc-a660-a619-68b9-43b879f81b05@gmail.com>
Date: Tue, 22 May 2018 10:22:19 -0500
In-Reply-To: <20180522145426.GG5512@pd.tnic>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/22/2018 09:54 AM, Borislav Petkov wrote:
> On Tue, May 22, 2018 at 09:39:15AM -0500, Alex G. wrote:
>> No, the problem is with the current approach, not with mine. The problem
>> is trying to handle the error outside of the existing handler. That's a
>> no-no, IMO.
>
> Let me save you some time: until you come up with a proper solution for
> *all* PCIe errors so that the kernel can correctly decide what to do for
> each error based on its actual severity, consider this NAKed.

I do have a proper solution for _all_ PCIe errors. In fact, we have already discussed several valid approaches.

> I don't care about outside or inside of the handler

I do. I have a handler that can handle (no pun intended) errors. I want to use the same code path in the native and GHES cases. If I allow ghes.c to make different decisions than aer_do_recovery() would, I've failed.

> - this thing needs to be done properly

Exactly!

> and not just to serve your particular use case of
> abrupt removal of devices causing PCIe errors, and punish the rest.

I think you're confused about what I'm actually trying to do. Or maybe you're confused about how PCIe errors work. That's understandable. PCIe uses the term "fatal" for errors that may make the link unusable and may require a link reset, whereas in most other specs "fatal" means "on fire". I understand your confusion, and I hope I've cleared it up.

You're trying to make the case that surprise removal is my only concern and use case, because that's the example I gave. It makes your argument stronger, but it's wrong. You don't know our test setup or all the things I'm testing for, and whenever I try to tell you, you fall back to the "surprise removal" example. I don't know why you'd think Dell would pay me to work on this if I were to allow things like silent data corruption to creep in. This isn't a traditional company from Redmond, Washington.

> I especially don't want to have the case where a PCIe error is *really*
> fatal and then we noodle in some handlers debating about the severity
> because it got marked as recoverable intermittently and end up causing
> data corruption on the storage device.

Here's a real no-no for ya. I especially don't want a kernel maintainer who hasn't even read the recovery handler (let alone the spec around which the handler was written) to tell me how the recovery handler works and what it's supposed to do (see, I can be an ass).

PCIe errors really are fatal. They might require unloading the driver and removing the device. But somebody set the questionable policy that "fatal"=="panic", and that is completely inappropriate for a larger class of errors -- PCIe happens to be the easiest example to pick on. And even you realize that the argument that a panic() will somehow prevent data corruption is complete noodle sauce.

Alex