Date: Tue, 27 Feb 2018 23:10:39 +0100
From: Borislav Petkov
To: "Ghannam, Yazen"
Cc: "linux-efi@vger.kernel.org", "linux-kernel@vger.kernel.org",
	"ard.biesheuvel@linaro.org", "x86@kernel.org", Tony Luck
Subject: Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure
Message-ID: <20180227221039.GR26382@pd.tnic>
References: <20180226193904.20532-1-Yazen.Ghannam@amd.com>
	<20180226193904.20532-4-Yazen.Ghannam@amd.com>
	<20180227142531.GF26382@pd.tnic>
	<20180227170423.GK26382@pd.tnic>
	<20180227180231.GO26382@pd.tnic>
	<20180227190943.GQ26382@pd.tnic>

On Tue, Feb 27, 2018 at 09:32:18PM +0000, Ghannam, Yazen wrote:
> Not much more
> readable. It's still vague and confusing to a user and devoid
> of any real info so an expert can't help. And now the information is printed
> arbitrarily, so someone needs to read the source to figure out what it really
> means.

WTF? You need to read the source to figure out what the error is? So
"Corrected Processor Error" is confusing? I think you've clearly been
staring at the spec for waaay too long.

Also, read my mail again:

"Now, I admit that my version of the record is not enough to debug it but
it needs to contain only information which is clear and humanly readable
to debug. You can always dump the raw data underneath from the
tracepoint but make the beginning human readable."

> Maybe. But these records are generated by Platform Firmware. Why would
> FW report the error knowing the system is about to die?

WTF more!

Dude, are you kidding me? So the firmware should not report the error if
it knows the system is about to die?!?! Now you're just making up insane
counterarguments, just because.

> Your example still says "Hardware Error"

Oh my, that's the *prefix*.

> and odds are general users won't understand what the error type means
> or what a corrected error is. So it's not much better.

Yeah, and the next thing you'll say is that users won't understand what
"error" means, right? Geez.

> Exactly! The more info available (usually) the more quickly it can be
> diagnosed.

You still don't understand what I'm trying to explain to you. It is
*not* about diagnosing it - in order to do that you need to involve
people to diagnose it. It is about making the error record as human
readable as possible so that you don't *have* to involve people to
diagnose it in the first place and the user can say, ah ok, corrected
error, no need to do anything. Or "System Fatal error" - I better
replace that part.

> Hardware errors are generally rare and hard to reproduce.

Except when they're not, like DRAM ECC floods, which are pretty easy to
reproduce.

> So when one does occur we should capture the data and provide it.

Did I say you should not do that?!

I said: make it as human readable as possible and dump the gory hex crap
after it.

> Here are a couple of scenarios based on similar experiences I've had:

Now play that same scenario with the following record format:

[ human readable error record ]
[ full raw error dump ]

> I'll send a V3 set with the following changes:
> 1) Fix table numbering in commit messages.
> 2) Remove "Validation Bits" lines.
> 3) Only print error type GUID for unknown types.
>
> I think this set should focus on printing the x86 CPER based on the UEFI
> spec and the convention of the other CPER code. CPERs are generated
> by Platform Firmware. So errors are explicitly intended to be viewed by
> the user and all info should be displayed.

You should look up from the spec and realize that real life is much
different.

> I *have* been thinking that it would be nice to take the CPER and pipe it
> through the MCA decoding in arch/x86 and EDAC. This would be really
> nice for when the CPER includes the MCA registers in the Context info.
> So we'd get our usual MCA decoding instead of a binary blob of registers.

That would definitely be a step in the right direction.

> I was thinking that the MCA decoding would be in addition to this. But
> based on Boris's comments, maybe we can make it a default selection.
> For example, if MCA/EDAC decoding is available, use it. Otherwise, print
> the CPER fields in a generic way like we do for the other CPER types.

Yes.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 
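
As a rough illustration of the record layout discussed above (human
readable part first, optional richer decoding next, raw dump last), here
is a minimal userspace C sketch. It is not kernel code and not the CPER
implementation from the patch set: the severity enum, the message
strings, the decoder callback type and format_hw_error() are all
hypothetical names made up for this example. The callback only stands in
for an MCA/EDAC-style decoder that may or may not be available.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical severities; not the kernel's CPER definitions. */
enum hw_err_sev { HW_ERR_CORRECTED, HW_ERR_RECOVERABLE, HW_ERR_FATAL };

/*
 * Hypothetical decoder hook standing in for an MCA/EDAC-style decoder.
 * It returns > 0 if it wrote a decoded string to @out, 0 if it cannot
 * decode this record.
 */
typedef int (*hw_err_decode_fn)(const uint8_t *raw, size_t len,
				char *out, size_t outsz);

/* Made-up summary strings: one line a user can act on. */
static const char *sev_text(enum hw_err_sev sev)
{
	switch (sev) {
	case HW_ERR_CORRECTED:
		return "Corrected Processor Error, no action required";
	case HW_ERR_RECOVERABLE:
		return "Recoverable Processor Error, system keeps running";
	case HW_ERR_FATAL:
		return "Fatal Processor Error, the part may need replacing";
	}
	return "Unknown Processor Error";
}

/* Append formatted text at *pos. Returns 0 on success, -1 if it did not fit. */
static int rec_append(char *out, size_t outsz, size_t *pos, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(out + *pos, outsz - *pos, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= outsz - *pos)
		return -1;
	*pos += (size_t)n;
	return 0;
}

/*
 * Build the record: summary line, decoded line if a decoder is present
 * and recognises the data, then the raw bytes in hex.
 * Returns the length written, or -1 if @out is too small.
 */
int format_hw_error(enum hw_err_sev sev, hw_err_decode_fn decode,
		    const uint8_t *raw, size_t len, char *out, size_t outsz)
{
	char decoded[128];
	size_t pos = 0, i;

	assert(out != NULL && outsz > 0);
	assert(raw != NULL || len == 0);

	/* 1) Human readable part first. */
	if (rec_append(out, outsz, &pos, "[Hardware Error]: %s\n", sev_text(sev)))
		return -1;

	/* 2) Use the richer decoder when there is one and it succeeds. */
	if (decode && decode(raw, len, decoded, sizeof(decoded)) > 0) {
		if (rec_append(out, outsz, &pos, "[Hardware Error]: %s\n", decoded))
			return -1;
	}

	/* 3) Full raw dump last, for whoever has to debug it. */
	if (rec_append(out, outsz, &pos, "[Hardware Error]: raw:"))
		return -1;
	for (i = 0; i < len; i++) {
		if (rec_append(out, outsz, &pos, " %02x", raw[i]))
			return -1;
	}
	if (rec_append(out, outsz, &pos, "\n"))
		return -1;

	return (int)pos;
}
```

Calling format_hw_error() with a NULL decoder gives just the summary line
followed by the hex dump, which corresponds to the generic fallback
case; passing a decoder adds its line between the two. The ordering is
the point of the sketch: the first line is enough for a user to decide
whether anything needs doing, and the raw bytes stay available at the end.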