Subject: Re: [PATCH] x86/mce: Lower throttling MCE messages to warnings
From: Srinivas Pandruvada
To: Borislav Petkov, Benjamin Berg
Cc: linux-kernel@vger.kernel.org, Hans de Goede, Christian Kellner, Tony Luck, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", x86@kernel.org, linux-edac@vger.kernel.org
Date: Thu, 10 Oct 2019 14:08:55 -0700
In-Reply-To: <20191009175608.GK10395@zn.tnic>
References: <20191009155424.249277-1-bberg@redhat.com> <20191009175608.GK10395@zn.tnic>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Benjamin,

On Wed, 2019-10-09 at 19:56 +0200, Borislav Petkov wrote:
> On Wed, Oct 09, 2019 at 05:54:24PM +0200, Benjamin Berg wrote:
> > On modern CPUs it is quite normal that the temperature limits are
> > reached and the CPU is throttled. In fact, often the thermal design is
> > not sufficient to cool the CPU at full load and limits can quickly be
> > reached when a burst in load happens. This will even happen with
> > technologies like RAPL limiting the long term power consumption of
> > the package.
> >
> > So these messages do not usually indicate a hardware issue (e.g.
> > insufficient cooling). Log them as warnings to avoid confusion about
> > their severity.
> >

I have a patch to address this. Instead of avoiding the critical
warnings altogether or waiting 300 seconds for the next one, the warning
is based on how long the system has been running in a throttled
condition. If, for example, the fan broke, the throttling would persist
for a long time, and in that case we had better warn.

I am waiting for internal review and hope to post it by tomorrow.

Thanks,
Srinivas

> > Signed-off-by: Benjamin Berg
> > Tested-by: Christian Kellner
> > ---
> >  arch/x86/kernel/cpu/mce/therm_throt.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
> > index 6e2becf547c5..bc441d68d060 100644
> > --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> > +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> > @@ -188,7 +188,7 @@ static void therm_throt_process(bool new_event, int event, int level)
> >  	/* if we just entered the thermal event */
> >  	if (new_event) {
> >  		if (event == THERMAL_THROTTLING_EVENT)
> > -			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> > +			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> >  				this_cpu,
> >  				level == CORE_LEVEL ? "Core" : "Package",
> >  				state->count);
> > --
>
> This has carried over since its very first addition in
>
> commit 3867eb75b9279c7b0f6840d2ad9f27694ba6c4e4
> Author: Dave Jones
> Date:   Tue Apr 2 20:02:27 2002 -0800
>
>     [PATCH] x86 bluesmoke update.
>
>     o Make MCE compile time optional  (Paul Gortmaker)
>     o P4 thermal trip monitoring. (Zwane Mwaikambo)
>     o Non-fatal MCE logging. (Me)
>
>
> It used to be KERN_EMERG back then, though.
>
> And yes, this issue has come up in the past already so I think I'll take
> it. I'll just give Intel folks a couple of days to object should there
> be anything to object to.
>
> Thx.
>
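
For illustration only, the duration-based idea from the reply (warn when
throttling persists, rather than on every throttling event) could be
sketched in plain userspace C roughly as below. This is not the patch
awaiting review and not kernel code: struct throttle_tracker,
throttle_update() and the 20-second threshold are hypothetical names and
values chosen for this example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical threshold: warn after this many seconds of continuous throttling. */
#define THROTTLE_WARN_AFTER_SECS 20

/* Hypothetical per-CPU (or per-package) bookkeeping for one throttling episode. */
struct throttle_tracker {
	bool throttled;       /* currently inside a throttling episode */
	bool warned;          /* a warning was already issued for this episode */
	uint64_t start_secs;  /* time at which the current episode began */
};

/*
 * Feed one sample: the current time in seconds and whether throttling is
 * active. Returns true exactly once per episode, at the first sample where
 * the episode has lasted at least THROTTLE_WARN_AFTER_SECS.
 */
static bool throttle_update(struct throttle_tracker *t, uint64_t now_secs,
			    bool active)
{
	if (!active) {
		/* Episode over: reset so the next one starts from scratch. */
		t->throttled = false;
		t->warned = false;
		return false;
	}

	if (!t->throttled) {
		/* New episode: remember when it started, but stay quiet. */
		t->throttled = true;
		t->warned = false;
		t->start_secs = now_secs;
		return false;
	}

	if (!t->warned &&
	    now_secs - t->start_secs >= THROTTLE_WARN_AFTER_SECS) {
		t->warned = true;
		fprintf(stderr, "throttled for %llu s, possible cooling problem\n",
			(unsigned long long)(now_secs - t->start_secs));
		return true;
	}

	return false;
}
```

With this shape, a short burst that throttles for a few seconds never
produces a message, while a sustained condition (the broken fan example)
produces one warning per episode.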