Received: by 2002:ac0:a5a6:0:0:0:0:0 with SMTP id m35-v6csp33490imm; Thu, 30 Aug 2018 07:42:42 -0700 (PDT) X-Google-Smtp-Source: ANB0VdYoqiBiDr5AjBqxcOisSuGwm3MIOmlH4kGQk6vl2KXXX1ZHa64Kvio47/Ywai7v9m0HjgFw X-Received: by 2002:a63:4306:: with SMTP id q6-v6mr8661101pga.181.1535640162095; Thu, 30 Aug 2018 07:42:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1535640162; cv=none; d=google.com; s=arc-20160816; b=kJH41xngBEl76NqpMrz7KiqSQej+z1a9048szL4/iE7JPK0/tNPbz/S8SHgub3hckO mV6VhrUOCpAjvqNxfwZeq/yx+PizjtGX5q/sG3rvCsobldy39564U2J0CnH2w7KMYyw3 JFqv0qzGMFzQ3TrSB88dHHifixnPZBte2ZdYfPWjAvDIQ+fi6FonrB1FDotM8p2dWNjV 9rocdLZOzTMDR6B8Z+qEtzlFYdmwEuYF7uiWkeL5UoGnKX5vfFGy6TBhz06QhReDx1Lv ar1TlkdE6m3pB+nY1GbNQK61F6MYJ4x1uAUTta/6Vhj46g1iVeadBniihgB2d/l1ANfv 82lQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language:thread-index :content-transfer-encoding:mime-version:message-id:date:subject :in-reply-to:references:cc:to:from:dmarc-filter:dkim-signature :dkim-signature:arc-authentication-results; bh=W+YExw6ie+JXE5t87hBVgBoln81Y5E85JReGLbAGGyQ=; b=PNnbsypd+QPeK2MXbMtaT1XnHo4+5Auefvw+52io7Lwy75D/C0dKDsdpo+op+/BuZL S16GxoxQUZ4OM+nHO9VnPpEdhaOAVZ6voXfv2o+PyP4dyKJxXh1W5CF0YGKx3o1d/xTI J9zNx5Lh2PWEV0mQJL2xBb84Ynn1JoRwMtI+AGngOJFHpjyuw2HtklTy0rqG89cEU2NK +xHxkAdPAquwtV5QoCz+QASSvWH1z8AVH/ZjQHCqOe9d75bq/gt3mkdt+Ci62eSOCpic uouIAV1ffBXswS4UIVCkLU51jygqpvijg4d50eRvFSbYbfBvBRn6G7BzmA9ytoHjyp5p rWNQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=TK9woC2w; dkim=pass header.i=@codeaurora.org header.s=default header.b=LzM1rES+; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n70-v6si6888641pfa.320.2018.08.30.07.42.20; Thu, 30 Aug 2018 07:42:42 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@codeaurora.org header.s=default header.b=TK9woC2w; dkim=pass header.i=@codeaurora.org header.s=default header.b=LzM1rES+; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729405AbeH3SnL (ORCPT + 99 others); Thu, 30 Aug 2018 14:43:11 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:51080 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728972AbeH3SnL (ORCPT ); Thu, 30 Aug 2018 14:43:11 -0400 Received: by smtp.codeaurora.org (Postfix, from userid 1000) id 6A2D960388; Thu, 30 Aug 2018 14:40:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1535640042; bh=gu5qe3RnPUWCsVYyYOgUi09FFa136wIbvQedHfU1rMY=; h=From:To:Cc:References:In-Reply-To:Subject:Date:From; b=TK9woC2w0gn9BNqua/XaNjzW508FPTOb2nflJDAjIGHTYc6m3iJv1TRLQW7oUukPv 88gSPLQj8KZqNH6wc3htaw8asPo+i5kVz8RvGo97TQ2sf/s15cB/qmAK7JyswdYN2I j3ycWN0/gkeVTKxlwjCgd1EuHcaflcsriXxxM87Y= X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on pdx-caf-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.8 required=2.0 tests=ALL_TRUSTED,BAYES_00, DKIM_SIGNED,T_DKIM_INVALID autolearn=no autolearn_force=no version=3.4.0 Received: from WUFANW10 (c-71-205-14-210.hsd1.co.comcast.net [71.205.14.210]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) (Authenticated sender: wufan@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id EF024606DB; Thu, 30 Aug 2018 14:40:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1535640041; bh=gu5qe3RnPUWCsVYyYOgUi09FFa136wIbvQedHfU1rMY=; h=From:To:Cc:References:In-Reply-To:Subject:Date:From; b=LzM1rES+j1qDGCcZHAotIseDRNPZ4jBC6F2Lyvti42xEWvF6E4eKIw1kHMoWcEmIM Go9RsoizKlL7mub9gA/ngn/gVNSyNmW8SiNKIHXKSsEBLg2Cwn5d5V17PpUN7r3Bte huqla3QX8VMGUXa8VelkDYdnTt4VJpHdHtF41qgI= DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org EF024606DB Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=wufan@codeaurora.org From: "wufan" To: "'James Morse'" Cc: , , , , , References: <1535567632-18089-1-git-send-email-wufan@codeaurora.org> In-Reply-To: Subject: RE: [PATCH] EDAC, ghes: use CPER module handles to locate DIMMs Date: Thu, 30 Aug 2018 08:40:40 -0600 Message-ID: <000f01d4406f$6ce8f160$46bad420$@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Outlook 16.0 Thread-Index: AQHcLsFbwflxxBzVeG17JegIRw3gmAF/6qNxpLx8bJA= Content-Language: en-us Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi James, > > The current ghes_edac driver does not update per-dimm error counters > > when reporting memory errors, because there is no = platform-independent > > way to find DIMMs based on the error information provided by = firmware. >=20 > I'd argue there is: its in the CPER records, we just didn't do = anything useful > with the information in the past! Agreed. Will update the wording.=20 =20 > > +static int ghes_edac_dimm_index(u16 handle) { > > + struct mem_ctl_info *mci; > > + int i; > > + > > + if (!ghes_pvt) > > + return -1; >=20 > ghes_edac_report_mem_error() already checked this, as its the only = caller > there is no need to check it again. Will remove. =20 >=20 > > + mci =3D ghes_pvt->mci; > > + > > + if (!mci) > > + return -1; >=20 > Can this happen? ghes_edac_report_mem_error() would have > dereferenced this already! >=20 > If you need the struct mem_ctl_info, you may as well pass it in as the = only > caller has it to hand. Will remove. >=20 > > + > > + for (i =3D 0; i < mci->tot_dimms; i++) { > > + if (mci->dimms[i]->smbios_handle =3D=3D handle) > > + return i; > > + } > > + return -1; > > +} > > + > > static void ghes_edac_dmidecode(const struct dmi_header *dh, void > > *arg) { > > struct ghes_edac_dimm_fill *dimm_fill =3D arg; @@ -177,6 +197,8 @@ > > static void ghes_edac_dmidecode(const struct dmi_header *dh, void = *arg) > > entry->total_width, entry->data_width); > > } > > > > + dimm->smbios_handle =3D entry->handle; >=20 > We aren't checking for duplicate handles, (e.g. they're all zero). I = think this is > fine as chances are firmware on those systems won't set > CPER_MEM_VALID_MODULE_HANDLE. If it does, the handle it gives us is > ambiguous, and we pick a dimm, instead of whine-ing about broken > firmware tables. >=20 > (I'm just drawing attention to it in case someone disagrees) SBMIOS tables are typically automatically generated so chances for = duplicate handles are small.=20 >=20 > > dimm_fill->count++; > > } > > } > > @@ -327,12 +349,20 @@ void ghes_edac_report_mem_error(int sev, > struct cper_sec_mem_err *mem_err) > > p +=3D sprintf(p, "bit_pos:%d ", mem_err->bit_pos); > > if (mem_err->validation_bits & > CPER_MEM_VALID_MODULE_HANDLE) { > > const char *bank =3D NULL, *device =3D NULL; > > + int index =3D -1; > > + > > dmi_memdev_name(mem_err->mem_dev_handle, &bank, > &device); >=20 > > + p +=3D sprintf(p, "DIMM DMI handle: 0x%.4x ", > > + mem_err->mem_dev_handle); > > if (bank !=3D NULL && device !=3D NULL) > > p +=3D sprintf(p, "DIMM location:%s %s ", bank, device); > > - else > > - p +=3D sprintf(p, "DIMM DMI handle: 0x%.4x ", > > - mem_err->mem_dev_handle); >=20 > Why do we now print the handle every time? The handle is pretty > meaningless, it can only be used to find the location-strings, if we = get those > we print them instead. For ghes_edac the bank/device is informational, and nothing would go = wrong if the bank/device numbers are the same as another entry. But the handle is now critical for DIMM lookup, thus pull it out. Thanks! Fan