From: Tony Luck
To: Borislav Petkov
Cc: Tony Luck, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 0/5] New way to track mce notifier chain actions
Date: Wed, 12 Feb 2020 12:46:47 -0800
Message-Id: <20200212204652.1489-1-tony.luck@intel.com>

This is just a skeleton of how it might look. Several issues arose
while looking at this ... not all directly related to the problem at
hand.

Parts 1 & 2 are just cleanup. CEC should follow the same rules as
everyone else who wants to be on the mce notifier chain. No real
reason for it to have direct hooks into mce/core.c.

Part 3 adds a field to struct mce, and defines the BIT fields for
each class of notifier. All EDAC drivers share the same BIT since
only one of them should be active.

Part 4 is where things are interesting and need a great deal more
thought. A bunch of things on the chain return NOTIFY_STOP which
prevents anything else on the chain from being run. For the moment
I ignored that semantic and added code everywhere to set the BIT
even though nobody else will see it. This is because I think at
least some of them should NOT be NOTIFY_STOP.

Part 5 is currently written to always call __print_mce() for
debugging. The "if (1 || ...)" obviously doesn't want the "1"
(though I'd like to add some /sys knob to flip a switch to force
printing for systems where something weird is happening and logs
are being lost).

Tony Luck (5):
  x86/mce: Rename "first" function as "early"
  x86/mce: Convert corrected error collector to use mce notifier
  x86/mce: Add new "handled" field to "struct mce"
  x86/mce: Fix all mce notifiers to update the mce->handled bitmask
  x86/mce: Change default mce logger to check mce->handled

 arch/x86/include/asm/mce.h           | 15 ++++----
 arch/x86/include/uapi/asm/mce.h      |  9 +++++
 arch/x86/kernel/cpu/mce/core.c       | 53 +++++++---------------------
 arch/x86/kernel/cpu/mce/dev-mcelog.c |  1 +
 drivers/acpi/acpi_extlog.c           |  1 +
 drivers/acpi/nfit/mce.c              |  1 +
 drivers/edac/i7core_edac.c           |  1 +
 drivers/edac/mce_amd.c               |  5 ++-
 drivers/edac/pnd2_edac.c             |  1 +
 drivers/edac/sb_edac.c               |  1 +
 drivers/edac/skx_common.c            |  1 +
 drivers/ras/cec.c                    | 29 +++++++++++++++
 12 files changed, 69 insertions(+), 49 deletions(-)

-- 
2.21.1
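
For readers who want the idea behind Parts 3 to 5 in one place, here
is a rough userspace sketch in C. It is not the code from the series.
Every name prefixed sk_ or SK_ is hypothetical, the two return codes
are simplified stand-ins for the kernel's NOTIFY_DONE and NOTIFY_STOP,
and the rule "the default logger prints only when no bit is set" is an
assumption about what checking mce->handled might mean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit names, one per class of notifier (not the series' names). */
#define SK_HANDLED_CEC   (1u << 0)
#define SK_HANDLED_EDAC  (1u << 1)	/* one bit shared by all EDAC drivers */

/* Simplified stand-ins for the kernel's NOTIFY_DONE / NOTIFY_STOP. */
enum sk_ret { SK_NOTIFY_DONE, SK_NOTIFY_STOP };

/* Cut-down stand-in for struct mce carrying the new bitmask field. */
struct sk_mce {
	uint64_t status;
	uint32_t handled;
};

typedef enum sk_ret (*sk_notifier_fn)(struct sk_mce *m);

static int sk_logged;	/* number of records the default logger printed */

/* A notifier that records its action and lets the chain continue. */
static enum sk_ret sk_cec(struct sk_mce *m)
{
	m->handled |= SK_HANDLED_CEC;
	return SK_NOTIFY_DONE;
}

/* A notifier that records its action but stops the chain. */
static enum sk_ret sk_edac_stop(struct sk_mce *m)
{
	m->handled |= SK_HANDLED_EDAC;
	return SK_NOTIFY_STOP;
}

/* Assumed policy: print only if no earlier notifier set a bit. */
static enum sk_ret sk_default_logger(struct sk_mce *m)
{
	if (!m->handled) {
		printf("unhandled mce: status=0x%llx\n",
		       (unsigned long long)m->status);
		sk_logged++;
	}
	return SK_NOTIFY_DONE;
}

/* Walk the chain in order; return how many notifiers were called. */
static size_t sk_run_chain(sk_notifier_fn *chain, size_t n, struct sk_mce *m)
{
	size_t ran = 0;

	assert(chain != NULL && m != NULL);
	for (size_t i = 0; i < n; i++) {
		ran++;
		if (chain[i](m) == SK_NOTIFY_STOP)
			break;	/* later notifiers never see the bitmask */
	}
	return ran;
}
```

With a chain of { sk_cec, sk_edac_stop, sk_default_logger }, the walk
ends after the second entry: both bits are set, but the logger never
runs to look at them. That is the Part 4 problem in miniature, and why
some notifiers probably should not stop the chain.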