From: Mauro Carvalho Chehab
To: Linux Doc Mailing List, Greg Kroah-Hartman
Cc: Mauro Carvalho Chehab, "Borislav Petkov", Andi Kleen, "Jonathan Corbet", linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] ABI: sysfs-mce: add a new ABI file
Date: Tue, 28 Sep 2021 18:02:02 +0200
Message-Id: <91031a1005f014899554e5079e6d859f00473fb7.1632844726.git.mchehab+huawei@kernel.org>
X-Mailer: git-send-email 2.31.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: Mauro Carvalho Chehab
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Reduce the gap of missing ABIs for Intel servers with MCE
by adding a new ABI file.

The contents of this file come from:
	Documentation/x86/x86_64/machinecheck.rst

Cc: Andi Kleen
Signed-off-by: Mauro Carvalho Chehab
---

To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover letter.
See [PATCH 0/2] at: https://lore.kernel.org/all/cover.1632844726.git.mchehab+huawei@kernel.org/

 Documentation/ABI/testing/sysfs-mce | 107 ++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-mce

diff --git a/Documentation/ABI/testing/sysfs-mce b/Documentation/ABI/testing/sysfs-mce
new file mode 100644
index 000000000000..d0f5095da08b
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-mce
@@ -0,0 +1,107 @@
+What:		/sys/devices/system/machinecheck/machinecheckX/
+Contact:	Andi Kleen
+When:		Feb, 2007
+Description:
+		(X = CPU number)
+
+		Machine checks report internal hardware error conditions
+		detected by the CPU. Uncorrected errors typically cause a
+		machine check (often with panic), corrected ones cause a
+		machine check log entry.
+
+		For more details about the x86 machine check architecture
+		see the Intel and AMD architecture manuals from their
+		developer websites.
+
+		For more details about the architecture
+		see http://one.firstfloor.org/~andi/mce.pdf
+
+		Each CPU has its own directory.
+
+What:		/sys/devices/system/machinecheck/machinecheckX/bank<Y>
+Contact:	Andi Kleen
+When:		Feb, 2007
+Description:
+		(Y = bank number)
+
+		64-bit hex bitmask enabling/disabling specific subevents for
+		bank Y.
+
+		When a bit in the bitmask is zero then the respective
+		subevent will not be reported.
+
+		By default all events are enabled.
+
+		Note that the BIOS maintains another mask to disable specific
+		events per bank. This is not visible here.
+
+What:		/sys/devices/system/machinecheck/machinecheckX/check_interval
+Contact:	Andi Kleen
+When:		Feb, 2007
+Description:
+		The entries appear for each CPU, but they are truly shared
+		between all CPUs.
+
+		How often to poll for corrected machine check errors, in
+		seconds (Note output is hexadecimal). Default 5 minutes.
+		When the poller finds MCEs it triggers an exponential speedup
+		(poll more often) on the polling interval. When the poller
+		stops finding MCEs, it triggers an exponential backoff
+		(poll less often) on the polling interval. The check_interval
+		variable is both the initial and maximum polling interval.
+		0 means no polling for corrected machine check errors
+		(but some corrected errors might still be reported
+		in other ways).
+
+What:		/sys/devices/system/machinecheck/machinecheckX/tolerant
+Contact:	Andi Kleen
+When:		Feb, 2007
+Description:
+		The entries appear for each CPU, but they are truly shared
+		between all CPUs.
+
+		Tolerance level. When a machine check exception occurs for a
+		non-corrected machine check the kernel can take different
+		actions.
+
+		Since machine check exceptions can happen any time it is
+		sometimes risky for the kernel to kill a process because it
+		defies normal kernel locking rules. The tolerance level
+		configures how hard the kernel tries to recover even at some
+		risk of deadlock. Higher tolerant values trade potentially
+		better uptime with the risk of a crash or even corruption
+		(for tolerant >= 3).
+
+		==  ===========================================================
+		 0  always panic on uncorrected errors, log corrected errors
+		 1  panic or SIGBUS on uncorrected errors, log corrected errors
+		 2  SIGBUS or log uncorrected errors, log corrected errors
+		 3  never panic or SIGBUS, log all errors (for testing only)
+		==  ===========================================================
+
+		Default: 1
+
+		Note this only makes a difference if the CPU allows recovery
+		from a machine check exception. Current x86 CPUs generally
+		do not.
+
+What:		/sys/devices/system/machinecheck/machinecheckX/trigger
+Contact:	Andi Kleen
+When:		Feb, 2007
+Description:
+		The entries appear for each CPU, but they are truly shared
+		between all CPUs.
+
+		Program to run when a machine check event is detected.
+		This is an alternative to running mcelog regularly from cron
+		and allows events to be detected faster.
+
+What:		/sys/devices/system/machinecheck/machinecheckX/monarch_timeout
+Contact:	Andi Kleen
+When:		Feb, 2007
+Description:
+		How long to wait for the other CPUs to machine check too on an
+		exception. 0 to disable waiting for other CPUs.
+
+		Unit: us
+
-- 
2.31.1
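
For readers who want to inspect these attributes from user space, the following is a minimal Python sketch, not part of the patch. It assumes the formats the descriptions above state: check_interval printed in hexadecimal seconds, bank masks as 64-bit hex values, and tolerant as a small integer. The helper names (read_attr, parse_check_interval, disabled_subevents, describe_tolerant) are hypothetical and exist only for illustration.

```python
from pathlib import Path
from typing import List

# Directory named in the "What:" entries; X is the CPU number.
MCE_ROOT = Path("/sys/devices/system/machinecheck")

# Tolerance levels, copied from the table in the "tolerant" entry.
TOLERANT_LEVELS = {
    0: "always panic on uncorrected errors, log corrected errors",
    1: "panic or SIGBUS on uncorrected errors, log corrected errors",
    2: "SIGBUS or log uncorrected errors, log corrected errors",
    3: "never panic or SIGBUS, log all errors (for testing only)",
}


def read_attr(cpu: int, name: str, root: Path = MCE_ROOT) -> str:
    """Return the raw text of machinecheck<cpu>/<name> (hypothetical helper)."""
    return (root / f"machinecheck{cpu}" / name).read_text().strip()


def parse_check_interval(text: str) -> int:
    """Polling interval in seconds, assuming the hexadecimal output
    that the check_interval description mentions."""
    return int(text.strip(), 16)


def disabled_subevents(mask_text: str, width: int = 64) -> List[int]:
    """Bit positions that are zero in a bank mask, i.e. subevents that
    will not be reported according to the bank description."""
    mask = int(mask_text.strip(), 16)
    return [bit for bit in range(width) if not (mask >> bit) & 1]


def describe_tolerant(text: str) -> str:
    """Map a tolerant value to its row in the table above."""
    level = int(text.strip())
    return TOLERANT_LEVELS.get(level, f"unknown tolerance level {level}")
```

On a machine that exposes these files, `parse_check_interval(read_attr(0, "check_interval"))` would give the polling interval for CPU 0 in seconds. If a given kernel prints the value in decimal rather than hexadecimal, the base passed to `int()` has to change accordingly.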