Subject: Re: [PATCH v3 1/8] trace: ras: add ARM processor error information trace event
To: "Baicar, Tyler"
References: <1490869877-118713-1-git-send-email-xiexiuqi@huawei.com> <1490869877-118713-11-git-send-email-xiexiuqi@huawei.com> <32ca4e7e-eb5e-a4ff-33d6-68d06e9242fb@codeaurora.org> <6c0d2652-71ba-aefc-d6cd-5cc9a0b0d729@huawei.com>
From: Xie XiuQi
Message-ID: <8aa30f6a-d18d-1cce-57dc-08efb52d822e@huawei.com>
Date: Mon, 17 Apr 2017 11:16:19 +0800
In-Reply-To: <6c0d2652-71ba-aefc-d6cd-5cc9a0b0d729@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Tyler,

On 2017/4/17 11:08, Xie XiuQi wrote:
> Hi Tyler,
>
> Thanks for your comments and testing.
>
> On 2017/4/15 4:36, Baicar, Tyler wrote:
>> On 3/30/2017 4:31 AM, Xie XiuQi wrote:
>>> Add a new trace event for ARM processor error information, so that
>>> the user will know what error occurred. With this information the
>>> user may take appropriate action.
>>>
>>> These trace events are consistent with the ARM processor error
>>> information table which is defined in UEFI 2.6 spec section N.2.4.4.1.
>>>
>>> ---
>>> v2: add trace enabled condition per Steven's suggestion.
>>>     fix a typo.
>>> ---
>>>
>>> Cc: Steven Rostedt
>>> Cc: Tyler Baicar
>>> Signed-off-by: Xie XiuQi
>>> ---
>> ...
>>> +#define ARM_PROC_ERR_TYPE \
>>> +    EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \
>>> +    EM ( CPER_ARM_INFO_TYPE_TLB, "TLB error" ) \
>>> +    EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" ) \
>>> +    EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
>>> +
>>> +#define ARM_PROC_ERR_FLAGS \
>>> +    EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" ) \
>>> +    EM ( CPER_ARM_INFO_FLAGS_LAST, "Last error captured" ) \
>>> +    EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" ) \
>>> +    EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
>>> +
>> Hello Xie XiuQi,
>>
>> This isn't compiling for me because of these definitions. Here you are using ARM_*, but below in the TP_printk you are using ARCH_*. The compiler complains the ARCH_* ones are undefined:
>>
>> ./include/trace/../../include/ras/ras_event.h:278:37: error: 'ARCH_PROC_ERR_TYPE' undeclared (first use in this function)
>>   __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
>> ./include/trace/../../include/ras/ras_event.h:280:38: error: 'ARCH_PROC_ERR_FLAGS' undeclared (first use in this function)
>>   __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),
>
> Sorry, it's a typo. It should be ARM_xxx.
>
>>
>>> +/*
>>> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
>>> + * via TRACE_DEFINE_ENUM().
>>> + */
>>> +#undef EM
>>> +#undef EMe
>>> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
>>> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
>>> +
>>> +ARM_PROC_ERR_TYPE
>>> +ARM_PROC_ERR_FLAGS
>> Are the above two lines supposed to be here?
>>> +
>>> +/*
>>> + * Now redefine the EM() and EMe() macros to map the enums to the strings
>>> + * that will be printed in the output.
>>> + */
>>> +#undef EM
>>> +#undef EMe
>>> +#define EM(a, b) { a, b },
>>> +#define EMe(a, b) { a, b }
>>> +
>>> +TRACE_EVENT(arm_proc_err,
>> I think it would be better to keep this similar to the naming of the current RAS trace events (right now we have mc_event, arm_event, aer_event, etc.). I would suggest using "arm_err_info_event" since this is handling the error information structures of the arm errors.
>>> +
>>> +    TP_PROTO(const struct cper_arm_err_info *err),
>>> +
>>> +    TP_ARGS(err),
>>> +
>>> +    TP_STRUCT__entry(
>>> +        __field(u8, type)
>>> +        __field(u16, multiple_error)
>>> +        __field(u8, flags)
>>> +        __field(u64, error_info)
>>> +        __field(u64, virt_fault_addr)
>>> +        __field(u64, physical_fault_addr)
>> Validation bits should also be a part of this structure; that way user space tools will know which of these fields are valid.
>
> Could we use the default values assigned in TP_fast_assign to indicate validity, since the validation bits are already checked there?
>
>>> +    ),
>>> +
>>> +    TP_fast_assign(
>>> +        __entry->type = err->type;
>>> +
>>> +        if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
>>> +            __entry->multiple_error = err->multiple_error;
>>> +        else
>>> +            __entry->multiple_error = ~0;
>>> +
>>> +        if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
>>> +            __entry->flags = err->flags;
>>> +        else
>>> +            __entry->flags = ~0;
>>> +
>>> +        if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
>>> +            __entry->error_info = err->error_info;
>>> +        else
>>> +            __entry->error_info = 0ULL;
>>> +
>>> +        if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
>>> +            __entry->virt_fault_addr = err->virt_fault_addr;
>>> +        else
>>> +            __entry->virt_fault_addr = 0ULL;
>>> +
>>> +        if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
>>> +            __entry->physical_fault_addr = err->physical_fault_addr;
>>> +        else
>>> +            __entry->physical_fault_addr = 0ULL;
>>> +    ),
>>> +
>>> +    TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
>> I think the "ARM Processor Error:" part of this should just be removed. Here's the output with this removed and the trace event renamed to arm_err_info_event. I think this looks much cleaner and matches the style used with the arm_event.
>>
>> -0 [020] .ns. 366.592434: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510f8000; running state: 1; PSCI state: 0
>> -0 [020] .ns. 366.592437: arm_err_info_event: type cache error; count: 0; flags: 0x3; error info: 0000000000c20058; virtual address: 0000000000000000; physical address: 0000000000000000
>

Since this section is the ARM Processor Error Section, how about using arm_proc_err_event?

> I agree. It looks much better.
>
>>
>> Thanks,
>> Tyler
>>
>

--
Thanks,
Xie XiuQi
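
For readers following the EM()/EMe() discussion in the quoted patch, here is a minimal userspace sketch of the same two-pass macro pattern. It is not the kernel implementation. The enum values, struct sym_entry, the counting pass (which only stands in for TRACE_DEFINE_ENUM()), the two lookup helpers and the "unknown" fallback are all assumptions made up for illustration. The point it tries to show is that the list name (ARM_PROC_ERR_TYPE) has to be spelled identically at every expansion site, which is where the ARM_*/ARCH_* mismatch broke the build.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in values for illustration only; the real CPER_ARM_INFO_TYPE_*
 * constants live in the kernel's CPER headers and may differ. */
enum {
    CPER_ARM_INFO_TYPE_CACHE,
    CPER_ARM_INFO_TYPE_TLB,
    CPER_ARM_INFO_TYPE_BUS,
    CPER_ARM_INFO_TYPE_UARCH
};

/* The list is written once. EM() marks every entry but the last,
 * EMe() marks the last one (so it can omit a trailing comma). */
#define ARM_PROC_ERR_TYPE \
    EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \
    EM ( CPER_ARM_INFO_TYPE_TLB,   "TLB error" ) \
    EM ( CPER_ARM_INFO_TYPE_BUS,   "bus error" ) \
    EMe( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )

/* Pass 1: expand each entry into one "statement". The patch uses
 * TRACE_DEFINE_ENUM(a); here; this sketch just counts the entries. */
#define EM(a, b)  + 1
#define EMe(a, b) + 1
enum { ARM_PROC_ERR_TYPE_COUNT = 0 ARM_PROC_ERR_TYPE };

/* Pass 2: redefine the macros so the same list becomes a
 * { value, string } table, as __print_symbolic() expects. */
#undef EM
#undef EMe
#define EM(a, b)  { a, b },
#define EMe(a, b) { a, b }

struct sym_entry {              /* hypothetical helper type */
    int value;
    const char *name;
};

static const struct sym_entry arm_proc_err_type[] = { ARM_PROC_ERR_TYPE };

#define ARM_PROC_ERR_TYPE_LEN \
    (sizeof(arm_proc_err_type) / sizeof(arm_proc_err_type[0]))

/* Map a value to its string; "unknown" is this sketch's own fallback. */
const char *arm_proc_err_type_name(int value)
{
    for (size_t i = 0; i < ARM_PROC_ERR_TYPE_LEN; i++)
        if (arm_proc_err_type[i].value == value)
            return arm_proc_err_type[i].name;
    return "unknown";
}

/* Reverse lookup; returns -1 when the name is not in the table. */
int arm_proc_err_type_from_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < ARM_PROC_ERR_TYPE_LEN; i++)
        if (strcmp(arm_proc_err_type[i].name, name) == 0)
            return arm_proc_err_type[i].value;
    return -1;
}
```

With these definitions, arm_proc_err_type_name(CPER_ARM_INFO_TYPE_TLB) returns "TLB error". Changing the table line to use ARCH_PROC_ERR_TYPE instead would fail to compile with an undeclared-identifier error, much like the one Tyler reported.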