From: Tony Luck
To: yazen.ghannam@amd.com
Cc: tony.luck@intel.com, bp@alien8.de, linux-kernel@vger.kernel.org, patches@lists.linux.dev, x86@kernel.org
Subject: [PATCH] RAS/CEC: Reduce offline page threshold for Intel systems
Date: Fri, 1 Jul 2022 12:12:39 -0700
Message-Id: <20220701191239.619940-1-tony.luck@intel.com>
X-Mailer: git-send-email 2.35.3

A large-scale study of memory errors on Intel systems in data centers
showed that aggressively taking pages with corrected errors offline is
the best strategy for using corrected errors as a predictor of future
uncorrected errors.

It is unknown whether this would help other vendors. There are some
indicators that it would not.

Set the threshold to "2" on Intel systems.

Do-not-apply-without-agreement-from-AMD
Signed-off-by: Tony Luck
---
 drivers/ras/cec.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 42f2fc0bc8a9..b1fc193b2036 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -556,6 +556,14 @@ static int __init cec_init(void)
 	if (ce_arr.disabled)
 		return -ENODEV;
 
+	/*
+	 * Intel systems may avoid uncorrectable errors
+	 * if pages with corrected errors are aggressively
+	 * taken offline.
+	 */
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		action_threshold = 2;
+
 	ce_arr.array = (void *)get_zeroed_page(GFP_KERNEL);
 	if (!ce_arr.array) {
 		pr_err("Error allocating CE array page!\n");
-- 
2.35.3