From: Chuansheng Liu
To: linux-kernel@vger.kernel.org
Cc: tony.luck@intel.com, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, chuansheng.liu@intel.com
Subject: [PATCH v2] x86/mce/therm_throt: Fix the access of uninitialized therm_work
Date: Tue, 7 Jan 2020 00:41:16 +0000
Message-Id: <20200107004116.59353-1-chuansheng.liu@intel.com>
X-Mailer: git-send-email 2.17.1

On ICL platforms, it is easy to hit a boot failure caused by a panic in
the thermal interrupt handler during the early boot stage. This issue
leaves my platform almost unable to boot with the latest kernel code.

The call stack looks like this:

kernel BUG at kernel/timer/timer.c:1152!
Call Trace:
__queue_delayed_work
queue_delayed_work_on
therm_throt_process
intel_thermal_interrupt
...

When a CPU comes up, its IRQ is enabled before the CPU UP notification,
and it is that notification which initializes therm_work. This race makes
it possible for therm_throt_process(), called from the thermal interrupt
handler, to access the uninitialized therm_work, and the system then
panics at a very early boot stage.

On my ICL platform, the panic can be reproduced within several reboots of
a reboot stress test. With this fix, the system stays up through more
than 200 reboots of the same stress test.

V2: Following Boris's suggestion, move the interrupt unmasking to the end
of the therm_work initialization.

Signed-off-by: Chuansheng Liu
---
 arch/x86/kernel/cpu/mce/therm_throt.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index b38010b541d6..528b85664b46 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -467,6 +467,7 @@ static int thermal_throttle_online(unsigned int cpu)
 {
 	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
+	u32 l;
 
 	state->package_throttle.level = PACKAGE_LEVEL;
 	state->core_throttle.level = CORE_LEVEL;
@@ -474,6 +475,12 @@ static int thermal_throttle_online(unsigned int cpu)
 	INIT_DELAYED_WORK(&state->package_throttle.therm_work, throttle_active_work);
 	INIT_DELAYED_WORK(&state->core_throttle.therm_work, throttle_active_work);
 
+	/* Unmask the thermal vector after
+	 * therm_works are initialized.
+	 */
+	l = apic_read(APIC_LVTTHMR);
+	apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
+
 	return thermal_throttle_add_dev(dev, cpu);
 }
 
@@ -722,10 +729,6 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 	wrmsr(MSR_IA32_MISC_ENABLE, l | MSR_IA32_MISC_ENABLE_TM1, h);
 
-	/* Unmask the thermal vector: */
-	l = apic_read(APIC_LVTTHMR);
-	apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
-
 	pr_info_once("CPU0: Thermal monitoring enabled (%s)\n",
 		     tm2 ? "TM2" : "TM1");
 
-- 
2.17.1