From: Chuansheng Liu
To: linux-kernel@vger.kernel.org
Cc: tony.luck@intel.com, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, chuansheng.liu@intel.com
Subject: [PATCH] x86/mce/therm_throt: Fix the access of uninitialized therm_work
Date: Mon, 6 Jan 2020 06:41:55 +0000
Message-Id: <20200106064155.64-1-chuansheng.liu@intel.com>
X-Mailer: git-send-email 2.17.1
X-Mailing-List: linux-kernel@vger.kernel.org

On the ICL platform, it is easy to hit a bootup failure with a panic in
the thermal interrupt handler during the early bootup
stage. This issue makes my platform almost unable to boot with the
latest kernel code.

The call stack looks like:
kernel BUG at kernel/timer/timer.c:1152!

Call Trace:
__queue_delayed_work
queue_delayed_work_on
therm_throt_process
intel_thermal_interrupt
...

When a CPU comes up, the IRQ is enabled before the CPU UP notification,
which is what initializes therm_work. This race makes it possible for
the interrupt handler therm_throt_process() to access an uninitialized
therm_work, and the system then panics at a very early bootup stage.

On my ICL platform, the failure can be reproduced within several
iterations of a reboot stress test. With this fix, the system survives
more than 200 iterations of the same reboot stress test.

Signed-off-by: Chuansheng Liu
---
 arch/x86/kernel/cpu/mce/therm_throt.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index b38010b541d6..7320eb3ac029 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -86,6 +86,7 @@ struct _thermal_state {
 	unsigned long		total_time_ms;
 	bool			rate_control_active;
 	bool			new_event;
+	bool			therm_work_active;
 	u8			level;
 	u8			sample_index;
 	u8			sample_count;
@@ -359,7 +360,9 @@ static void therm_throt_process(bool new_event, int event, int level)
 
 		state->baseline_temp = temp;
 		state->last_interrupt_time = now;
-		schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL);
+		if (state->therm_work_active)
+			schedule_delayed_work_on(this_cpu, &state->therm_work,
+						 THERM_THROT_POLL_INTERVAL);
 	} else if (old_event && state->last_interrupt_time) {
 		unsigned long throttle_time;
 
@@ -473,7 +476,8 @@ static int thermal_throttle_online(unsigned int cpu)
 
 	INIT_DELAYED_WORK(&state->package_throttle.therm_work, throttle_active_work);
 	INIT_DELAYED_WORK(&state->core_throttle.therm_work, throttle_active_work);
-
+	state->package_throttle.therm_work_active = true;
+	state->core_throttle.therm_work_active = true;
 	return thermal_throttle_add_dev(dev, cpu);
 }
 
@@ -482,6 +486,8 @@ static int thermal_throttle_offline(unsigned int cpu)
 	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
 
+	state->package_throttle.therm_work_active = false;
+	state->core_throttle.therm_work_active = false;
 	cancel_delayed_work(&state->package_throttle.therm_work);
 	cancel_delayed_work(&state->core_throttle.therm_work);
 
-- 
2.17.1
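
For readers who want to see the ordering problem outside the kernel, here
is a minimal user-space sketch of the guard this patch adds. It is not
kernel code: every sketch_* name is hypothetical, the assert() merely
stands in for the BUG hit when an uninitialized work item is queued, and
the model is single-threaded, so it shows only the ordering (interrupt
before online, after online, after offline), not real concurrency or
memory ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a delayed work item (not struct delayed_work). */
struct sketch_work {
	bool initialized;	/* set by sketch_online(), like INIT_DELAYED_WORK() */
	int times_queued;	/* how many times the handler queued the work */
};

/* Hypothetical per-CPU state, loosely modelled on struct _thermal_state. */
struct sketch_state {
	struct sketch_work therm_work;
	bool therm_work_active;	/* the flag introduced by the patch */
};

/* Queueing an uninitialized work item is the failure being avoided. */
static void sketch_queue(struct sketch_work *w)
{
	assert(w->initialized);
	w->times_queued++;
}

/* Interrupt path: queue the work only after the online callback has run. */
static bool sketch_interrupt(struct sketch_state *s)
{
	if (!s->therm_work_active)
		return false;
	sketch_queue(&s->therm_work);
	return true;
}

/* Online callback: initialize the work first, then mark it usable. */
static void sketch_online(struct sketch_state *s)
{
	s->therm_work.initialized = true;
	s->therm_work_active = true;
}

/* Offline callback: stop the interrupt path from queueing new work. */
static void sketch_offline(struct sketch_state *s)
{
	s->therm_work_active = false;
}
```

On a zero-initialized struct sketch_state, sketch_interrupt() returns
false and queues nothing. After sketch_online() it queues the work once
per call, and after sketch_offline() it is skipped again. Without the
flag check, the first call would trip the assert(), which is the
analogue of the panic described above.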