Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758319AbcCCVuA (ORCPT ); Thu, 3 Mar 2016 16:50:00 -0500 Received: from mga02.intel.com ([134.134.136.20]:19083 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751878AbcCCVt7 (ORCPT ); Thu, 3 Mar 2016 16:49:59 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,533,1449561600"; d="scan'208";a="663421439" Date: Thu, 3 Mar 2016 13:49:45 -0800 From: "Luck, Tony" To: Josef Bacik Cc: bp@alien8.de, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] mce-apei: do not rely on ACPI_ERST_GET_RECORD_ID for record id Message-ID: <20160303214945.GA11233@intel.com> References: <1457031814-17435-1-git-send-email-jbacik@fb.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1457031814-17435-1-git-send-email-jbacik@fb.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2015 Lines: 62 > retry: > - rc = erst_get_record_id_next(&pos, record_id); > - if (rc) > - goto out; > + /* > + * Some hardware is broken and doesn't actually advance the record id I'd blame this on firmware rather than hardware. > + * returned by ACPI_ERST_GET_RECORD_ID when we read a record like the > + * spec says it is supposed to. So instead use record_id == 0 to just > + * grab the first record in the erst, and fall back only if we trip over > + * a record that isn't a MCE record. > + */ > + if (lookup_record) { > + rc = erst_get_record_id_next(&pos, record_id); > + if (rc) > + goto out; > + } else { > + *record_id = 0; > + } > /* no more record */ > if (*record_id == APEI_ERST_INVALID_RECORD_ID) > goto out; > rc = erst_read(*record_id, &rcd.hdr, sizeof(rcd)); > - /* someone else has cleared the record, try next one */ > - if (rc == -ENOENT) > - goto retry; > - else if (rc < 0) > + /* > + * someone else has cleared the record, try next one if we are looking > + * up records. If we aren't looking up the record id's then just bail > + * since this means we have an empty table. > + */ > + if (rc == -ENOENT) { > + if (lookup_record) > + goto retry; > + rc = 0; > + goto out; > + } else if (rc < 0) { > goto out; > - /* try to skip other type records in storage */ > - else if (rc != sizeof(rcd) || > - uuid_le_cmp(rcd.hdr.creator_id, CPER_CREATOR_MCE)) > + } else if (rc != sizeof(rcd) || > + uuid_le_cmp(rcd.hdr.creator_id, CPER_CREATOR_MCE)) { > + /* try to skip other type records in storage */ > + lookup_record = true; > goto retry; Are you still doomed by the buggy firmware if we take this "goto"? You be back at the top of the loop excpecting erst_get_record_id_next() to move on. Does this just not happen in practice (finding non MCE records in amognst the MCE ones)? > + } > + /* Use the record header as the source of truth for the record id. */ > + *record_id = rcd.hdr.record_id; > memcpy(m, &rcd.mce, sizeof(*m)); > rc = sizeof(*m); -Tony