From: Ashok Raj
To: bp@suse.de
Cc: Ashok Raj, X86 ML, LKML, Tom Lendacky, Thomas Gleixner, Ingo Molnar, Tony Luck, Andi Kleen, Arjan Van De Ven
Subject: [PATCH 3/3] x86/microcode: Quiesce all threads before a microcode update.
Date: Wed, 21 Feb 2018 08:49:44 -0800
Message-Id: <1519231784-9941-4-git-send-email-ashok.raj@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1519231784-9941-1-git-send-email-ashok.raj@intel.com>
References: <1519231784-9941-1-git-send-email-ashok.raj@intel.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Microcode updates during OS load always assumed the other hyperthread
was "quiet", but Linux never really did this. We've recently received
several reports of issues with this, where things did not go well in
at-scale deployments, and the Intel microcode team asked us to make
sure the system is in a quiet state during these updates. Such updates
are rare events, so we use stop_machine() to ensure the whole system
is quiet.

Signed-off-by: Ashok Raj
Cc: X86 ML
Cc: LKML
Cc: Tom Lendacky
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Tony Luck
Cc: Andi Kleen
Cc: Boris Petkov
Cc: Arjan Van De Ven
---
 arch/x86/kernel/cpu/microcode/core.c  | 113 +++++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/microcode/intel.c |   1 +
 2 files changed, 98 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index aa1b9a4..af0aeb2 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -31,6 +31,9 @@
 #include
 #include
 #include
+#include
+#include
+#include
 
 #include
 #include
@@ -489,19 +492,82 @@ static void __exit microcode_dev_exit(void)
 /* fake device for request_firmware */
 static struct platform_device *microcode_pdev;
 
-static enum ucode_state reload_for_cpu(int cpu)
+static struct ucode_update_param {
+	spinlock_t ucode_lock;
+	atomic_t count;
+	atomic_t errors;
+	atomic_t enter;
+	int timeout;
+} uc_data;
+
+static void do_ucode_update(int cpu, struct ucode_update_param *ucd)
 {
-	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
-	enum ucode_state ustate;
+	enum ucode_state retval = 0;
 
-	if (!uci->valid)
-		return UCODE_OK;
+	spin_lock(&ucd->ucode_lock);
+	retval = microcode_ops->apply_microcode(cpu);
+	spin_unlock(&ucd->ucode_lock);
+	if (retval > UCODE_NFOUND) {
+		atomic_inc(&ucd->errors);
+		pr_warn("microcode update to cpu %d failed\n", cpu);
+	}
+	atomic_inc(&ucd->count);
+}
+
+/*
+ * Wait for upto 1sec for all cpus
+ * to show up in the rendezvous function
+ */
+#define MAX_UCODE_RENDEZVOUS	1000000000 /* nanosec */
+#define SPINUNIT		100 /* 100ns */
+
+/*
+ * Each cpu waits for 1sec max.
+ */
+static int ucode_wait_timedout(int *time_out, void *data)
+{
+	struct ucode_update_param *ucd = data;
+	if (*time_out < SPINUNIT) {
+		pr_err("Not all cpus entered ucode update handler %d cpus missing\n",
+			(num_online_cpus() - atomic_read(&ucd->enter)));
+		return 1;
+	}
+	*time_out -= SPINUNIT;
+	touch_nmi_watchdog();
+	return 0;
+}
+
+/*
+ * All cpus enter here before a ucode load upto 1 sec.
+ * If not all cpus showed up, we abort the ucode update
+ * and return. ucode update is serialized with the spinlock
+ */
+static int ucode_load_rendezvous(void *data)
+{
+	int cpu = smp_processor_id();
+	struct ucode_update_param *ucd = data;
+	int timeout = MAX_UCODE_RENDEZVOUS;
+	int total_cpus = num_online_cpus();
 
-	ustate = microcode_ops->request_microcode_fw(cpu, &microcode_pdev->dev, true);
-	if (ustate != UCODE_OK)
-		return ustate;
+	/*
+	 * Wait for all cpu's to arrive
+	 */
+	atomic_dec(&ucd->enter);
+	while(atomic_read(&ucd->enter)) {
+		if (ucode_wait_timedout(&timeout, ucd))
+			return 1;
+		ndelay(SPINUNIT);
+	}
+
+	do_ucode_update(cpu, ucd);
 
-	return apply_microcode_on_target(cpu);
+	/*
+	 * Wait for all cpu's to complete
+	 * ucode update
+	 */
+	while (atomic_read(&ucd->count) != total_cpus)
+		cpu_relax();
+	return 0;
 }
 
 static ssize_t reload_store(struct device *dev,
@@ -509,7 +575,6 @@ static ssize_t reload_store(struct device *dev,
 			    const char *buf, size_t size)
 {
 	enum ucode_state tmp_ret = UCODE_OK;
-	bool do_callback = false;
 	unsigned long val;
 	ssize_t ret = 0;
 	int cpu;
@@ -523,21 +588,37 @@ static ssize_t reload_store(struct device *dev,
 
 	get_online_cpus();
 	mutex_lock(&microcode_mutex);
+	/*
+	 * First load the microcode file for all cpu's
+	 */
 	for_each_online_cpu(cpu) {
-		tmp_ret = reload_for_cpu(cpu);
+		tmp_ret = microcode_ops->request_microcode_fw(cpu,
+				&microcode_pdev->dev, true);
 		if (tmp_ret > UCODE_NFOUND) {
-			pr_warn("Error reloading microcode on CPU %d\n", cpu);
+			pr_warn("Error reloading microcode file for CPU %d\n", cpu);
 
 			/* set retval for the first encountered reload error */
 			if (!ret)
 				ret = -EINVAL;
 		}
-
-		if (tmp_ret == UCODE_UPDATED)
-			do_callback = true;
 	}
+	pr_debug("Done loading microcode file for all cpus\n");
 
-	if (!ret && do_callback)
+	memset(&uc_data, 0, sizeof(struct ucode_update_param));
+	spin_lock_init(&uc_data.ucode_lock);
+	atomic_set(&uc_data.enter, num_online_cpus());
+	/*
+	 * Wait for a 1 sec
+	 */
+	uc_data.timeout = USEC_PER_SEC;
+	stop_machine(ucode_load_rendezvous, &uc_data, cpu_online_mask);
+
+	pr_debug("Total CPUS = %d uperrors = %d\n",
+		atomic_read(&uc_data.count), atomic_read(&uc_data.errors));
+
+	if (atomic_read(&uc_data.errors))
+		pr_warn("Update failed for %d cpus\n", atomic_read(&uc_data.errors));
+	else
 		microcode_check();
 
 	mutex_unlock(&microcode_mutex);
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 5d32724..0c55be6 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -809,6 +809,7 @@ static enum ucode_state apply_microcode_intel(int cpu)
 	wbinvd();
 	/* write microcode via MSR 0x79 */
 	wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);
+	pr_debug("ucode loading for cpu %d done\n", cpu);
 
 	rev = intel_get_microcode_revision();
 
-- 
2.7.4
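
For readers following the arrive-then-poll logic in ucode_load_rendezvous(), the pattern can be sketched in user space. This is only an illustration, assuming C11 atomics. It is not kernel code and not part of the patch. The names rendezvous, rendezvous_init(), rendezvous_arrive(), rendezvous_wait() and SPIN_UNIT_NS are hypothetical. The time budget is only decremented, with no real delay, whereas the patch calls ndelay(SPINUNIT) between polls.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space analogy of the patch's rendezvous; not kernel code. */
#define SPIN_UNIT_NS 100	/* budget consumed per poll, mirrors SPINUNIT */

struct rendezvous {
	atomic_int pending;	/* participants that have not arrived yet */
};

/* Arm the rendezvous for a fixed number of participants. */
static void rendezvous_init(struct rendezvous *r, int participants)
{
	assert(participants > 0);
	atomic_init(&r->pending, participants);
}

/* Called once by each participant on entry, like atomic_dec(&ucd->enter). */
static void rendezvous_arrive(struct rendezvous *r)
{
	atomic_fetch_sub(&r->pending, 1);
}

/*
 * Poll until every participant has arrived or the budget is used up.
 * Returns 0 when all have arrived and 1 on timeout. The budget is pure
 * accounting in this sketch: each poll costs SPIN_UNIT_NS, but no real
 * delay is inserted.
 */
static int rendezvous_wait(struct rendezvous *r, long budget_ns)
{
	while (atomic_load(&r->pending) != 0) {
		if (budget_ns < SPIN_UNIT_NS)
			return 1;
		budget_ns -= SPIN_UNIT_NS;
	}
	return 0;
}
```

With two participants, a rendezvous_wait() issued after only one rendezvous_arrive() runs out of budget and returns 1. Once the second participant has arrived it returns 0 immediately. In the patch, the same counter is armed with num_online_cpus() before stop_machine() is called.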