From: Josef Bacik <jbacik@fb.com>
To: <tony.luck@intel.com>, <bp@alien8.de>, <tglx@linutronix.de>,
        <mingo@redhat.com>, <hpa@zytor.com>, <x86@kernel.org>,
        <linux-edac@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: [PATCH] mce-apei: do not rely on ACPI_ERST_GET_RECORD_ID for record id
Date: Thu, 3 Mar 2016 14:03:34 -0500
Message-ID: <1457031814-17435-1-git-send-email-jbacik@fb.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2944
Lines: 84

We have a bunch of boxes where ACPI_ERST_GET_RECORD_ID doesn't advance after
reading an error record, it only advances on reboot.  So we get a bunch of MCE's
from a bad CPU, swap out the CPU, and then /var/log/mcelog has a new entry on
every reboot, which makes our monitoring send the box back to be fixed when it
has already been fixed.

So instead of using ACPI_ERST_GET_RECORD_ID to walk through the records on the
box, simply pass 0 into erst_read() which will read the first entry every time,
and then store the record id from the cpe header so that we clear out the right
record.  With this patch these broken boxes now spit out all of records in the
ERST in one go instead of one per reboot.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 arch/x86/kernel/cpu/mcheck/mce-apei.c | 43 +++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
index 34c89a3..6de662f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
@@ -113,27 +113,50 @@ ssize_t apei_read_mce(struct mce *m, u64 *record_id)
 {
 	struct cper_mce_record rcd;
 	int rc, pos;
+	bool lookup_record = false;
 
 	rc = erst_get_record_id_begin(&pos);
 	if (rc)
 		return rc;
 retry:
-	rc = erst_get_record_id_next(&pos, record_id);
-	if (rc)
-		goto out;
+	/*
+	 * Some hardware is broken and doesn't actually advance the record id
+	 * returned by ACPI_ERST_GET_RECORD_ID when we read a record like the
+	 * spec says it is supposed to.  So instead use record_id == 0 to just
+	 * grab the first record in the erst, and fall back only if we trip over
+	 * a record that isn't a MCE record.
+	 */
+	if (lookup_record) {
+		rc = erst_get_record_id_next(&pos, record_id);
+		if (rc)
+			goto out;
+	} else {
+		*record_id = 0;
+	}
 	/* no more record */
 	if (*record_id == APEI_ERST_INVALID_RECORD_ID)
 		goto out;
 	rc = erst_read(*record_id, &rcd.hdr, sizeof(rcd));
-	/* someone else has cleared the record, try next one */
-	if (rc == -ENOENT)
-		goto retry;
-	else if (rc < 0)
+	/*
+	 * someone else has cleared the record, try next one if we are looking
+	 * up records.  If we aren't looking up the record id's then just bail
+	 * since this means we have an empty table.
+	 */
+	if (rc == -ENOENT) {
+		if (lookup_record)
+			goto retry;
+		rc = 0;
+		goto out;
+	} else if (rc < 0) {
 		goto out;
-	/* try to skip other type records in storage */
-	else if (rc != sizeof(rcd) ||
-		 uuid_le_cmp(rcd.hdr.creator_id, CPER_CREATOR_MCE))
+	} else if (rc != sizeof(rcd) ||
+		 uuid_le_cmp(rcd.hdr.creator_id, CPER_CREATOR_MCE)) {
+		/* try to skip other type records in storage */
+		lookup_record = true;
 		goto retry;
+	}
+	/* Use the record header as the source of truth for the record id. */
+	*record_id = rcd.hdr.record_id;
 	memcpy(m, &rcd.mce, sizeof(*m));
 	rc = sizeof(*m);
 out:
-- 
2.5.0