Subject: Re: [PATCH] EDAC, ghes: use CPER module handles to locate DIMMs
To: James Morse
CC: Zhengqiang, Fan Wu, Linuxarm, Xiaofei Tan, wanghuiqiang, Shiju Jose
From: John Garry
Date: Thu, 30 Aug 2018 17:50:53 +0100
Message-ID: <6cc3c5e2-3827-a89c-e37b-09728a34f21f@huawei.com>
In-Reply-To: <5eab89c6-c063-cbc2-4d02-459faf87698a@arm.com>
References: <1535567632-18089-1-git-send-email-wufan@codeaurora.org> <5eab89c6-c063-cbc2-4d02-459faf87698a@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 30/08/2018 17:34, James Morse wrote:

Hi James,

Zhengqiang no longer works on this topic, so I have cc'ed some more guys
who should be able to help.

John

> Hi Zhengqiang,
>
> On 29/08/18 19:33, Fan Wu wrote:
>> The current ghes_edac driver does not update per-dimm error
>> counters when reporting memory errors, because there is no
>> platform-independent way to find DIMMs based on the error
>> information provided by firmware. This patch offers a solution
>> for platforms whose firmwares provide valid module handles
>> (SMBIOS type 17) in error records. In this case ghes_edac will
>> use the module handles to locate DIMMs and thus makes per-dimm
>> error reporting possible.
>
> Does your platform set CPER_MEM_VALID_MODULE_HANDLE in GHES Memory errors? If
> so, any chance you could test this patch on your platform? [0]
> (original patch: https://lore.kernel.org/patchwork/patch/978928/)
>
> Thanks,
>
> James
>
> [0] https://marc.info/?l=linux-edac&m=152603960002324
>
>
>> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
>> index 473aeec..db527f0 100644
>> --- a/drivers/edac/ghes_edac.c
>> +++ b/drivers/edac/ghes_edac.c
>> @@ -81,6 +81,26 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
>>  		(*num_dimm)++;
>>  }
>>
>> +static int ghes_edac_dimm_index(u16 handle)
>> +{
>> +	struct mem_ctl_info *mci;
>> +	int i;
>> +
>> +	if (!ghes_pvt)
>> +		return -1;
>> +
>> +	mci = ghes_pvt->mci;
>> +
>> +	if (!mci)
>> +		return -1;
>> +
>> +	for (i = 0; i < mci->tot_dimms; i++) {
>> +		if (mci->dimms[i]->smbios_handle == handle)
>> +			return i;
>> +	}
>> +	return -1;
>> +}
>> +
>>  static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
>>  {
>>  	struct ghes_edac_dimm_fill *dimm_fill = arg;
>> @@ -177,6 +197,8 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
>>  				entry->total_width, entry->data_width);
>>  		}
>>
>> +		dimm->smbios_handle = entry->handle;
>> +
>>  		dimm_fill->count++;
>>  	}
>>  }
>> @@ -327,12 +349,20 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>>  		p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
>>  	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
>>  		const char *bank = NULL, *device = NULL;
>> +		int index = -1;
>> +
>>  		dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
>> +		p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
>> +			     mem_err->mem_dev_handle);
>>  		if (bank != NULL && device != NULL)
>>  			p += sprintf(p, "DIMM location:%s %s ", bank, device);
>> -		else
>> -			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
>> -				     mem_err->mem_dev_handle);
>> +
>> +		index = ghes_edac_dimm_index(mem_err->mem_dev_handle);
>> +		if (index >= 0) {
>> +			e->top_layer = index;
>> +			e->enable_per_layer_report = true;
>> +		}
>> +
>>  	}
>>  	if (p > e->location)
>>  		*(p - 1) = '\0';
>> diff --git a/include/linux/edac.h b/include/linux/edac.h
>> index bffb978..a45ce1f 100644
>> --- a/include/linux/edac.h
>> +++ b/include/linux/edac.h
>> @@ -451,6 +451,8 @@ struct dimm_info {
>>  	u32 nr_pages;			/* number of pages on this dimm */
>>
>>  	unsigned csrow, cschannel;	/* Points to the old API data */
>> +
>> +	u16 smbios_handle;		/* Handle for SMBIOS type 17 */
>>  };
>>
>>  /**
>>
>
>
> .
>
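
For anyone wanting to try the lookup logic away from real hardware, the
handle-to-index search that the quoted patch adds can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not the
driver code: struct dimm_entry and dimm_index_by_handle() are hypothetical
names standing in for the kernel's struct dimm_info and the patch's
ghes_edac_dimm_index(), and the table is passed in explicitly instead of
being reached through ghes_pvt->mci.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's struct dimm_info. Only the
 * field the lookup needs is modelled here.
 */
struct dimm_entry {
	uint16_t smbios_handle;	/* SMBIOS type 17 handle saved at DMI scan time */
};

/*
 * Linear search for the DIMM whose saved handle matches the handle
 * reported in the error record. Returns the table index, or -1 when
 * the table is missing or no entry matches.
 */
static int dimm_index_by_handle(const struct dimm_entry *dimms, size_t n,
				uint16_t handle)
{
	size_t i;

	if (!dimms)
		return -1;

	for (i = 0; i < n; i++) {
		if (dimms[i].smbios_handle == handle)
			return (int)i;
	}
	return -1;
}
```

As in the patch, a negative return means the caller should leave the
per-DIMM report disabled, so a firmware handle that matches no known DIMM
is never charged to the wrong one.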