From: Ashok Raj
To: Borislav Petkov, Thomas Gleixner
Cc: Tony Luck, Dave Hansen, LKML Mailing List, X86-kernel, Andy Lutomirski,
    Tom Lendacky, Ashok Raj
Subject: [PATCH 5/5] x86/microcode: Handle NMIs during microcode update
Date: Sat, 13 Aug 2022 22:38:25 +0000
Message-Id: <20220813223825.3164861-6-ashok.raj@intel.com>
In-Reply-To: <20220813223825.3164861-1-ashok.raj@intel.com>
References: <20220813223825.3164861-1-ashok.raj@intel.com>

Microcode updates need a guarantee that the thread sibling that is waiting
for the update to finish on the primary core will not execute any
instructions until the update is complete. This is required to guarantee
that an MSR or instruction being patched is not executed while the patch
is only partially applied.

An NMI handler is registered before the stop_machine() rendezvous. If an
NMI arrives while the microcode update is still in progress, the secondary
thread spins in the handler until the ucode update state is cleared.

A few options were discussed:

1. Rendezvous inside the NMI handler, and also perform the update from
   within the handler. This seemed too risky: it might cause instability
   due to the races that would have to be solved, making it a difficult
   choice.

2. Thomas (tglx) suggested looking into masking all the LVT-originating
   NMIs, such as LINT1 and the perf control LVT entries. Since we are in
   the rendezvous loop, we don't need to worry about any NMI IPIs
   generated by the OS. The one source we have no control over is the
   ACPI mechanism of sending notifications to the kernel for Firmware
   First Processing (FFM); there is apparently a PCH register that the
   BIOS, from SMI, writes to generate such an interrupt (ACPI GHES).

3. The simpler option, implemented here: the OS registers an NMI handler
   and does no NMI rendezvous dance. If an NMI does arrive, the handler
   checks whether the CPU is a thread sibling of a core with an update in
   progress; only those CPUs are held in the handler (see the sketch after
   this list). The thread performing the wrmsr() will only take an NMI
   after the completion of the wrmsr 0x79 flow.
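In outline, the chosen approach behaves like the simplified sketch below.
This is an illustration only, not the patch itself: the real handler in
the diff that follows additionally bounds the spin with a timeout and
counts handler entries for diagnostics.

static int ucode_nmi_cb(unsigned int val, struct pt_regs *regs)
{
	int cpu = smp_processor_id();

	/* Not a sibling of a core with an update in flight: not our NMI. */
	if (!cpumask_test_cpu(cpu, &cpus_in_wait))
		return NMI_DONE;

	/*
	 * Hold the sibling here instead of returning to possibly
	 * half-patched code; the primary clears the mask once the
	 * wrmsr 0x79 flow has completed.
	 */
	while (cpumask_test_cpu(cpu, &cpus_in_wait))
		cpu_relax();

	return NMI_HANDLED;
}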
Signed-off-by: Ashok Raj
---
 arch/x86/kernel/cpu/microcode/core.c | 88 +++++++++++++++++++++++++++-
 1 file changed, 85 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index d24e1c754c27..ec10fa2db8b1 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -40,6 +40,7 @@
 #include <asm/processor.h>
 #include <asm/cmdline.h>
 #include <asm/setup.h>
+#include <asm/nmi.h>
 
 #define DRIVER_VERSION	"2.2"
 
@@ -411,6 +412,10 @@ static int check_online_cpus(void)
 
 static atomic_t late_cpus_in;
 static atomic_t late_cpus_out;
+static atomic_t nmi_cpus;
+static atomic_t nmi_timeouts;
+
+static struct cpumask cpus_in_wait;
 
 static int __wait_for_cpus(atomic_t *t, long long timeout)
 {
@@ -433,6 +438,53 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
 	return 0;
 }
 
+static int ucode_nmi_cb(unsigned int val, struct pt_regs *regs)
+{
+	int cpu = smp_processor_id();
+	int timeout = 100 * NSEC_PER_USEC;
+
+	atomic_inc(&nmi_cpus);
+	if (!cpumask_test_cpu(cpu, &cpus_in_wait))
+		return NMI_DONE;
+
+	while (timeout > 0) {
+		if (timeout < NSEC_PER_USEC) {
+			atomic_inc(&nmi_timeouts);
+			break;
+		}
+		ndelay(SPINUNIT);
+		timeout -= SPINUNIT;
+		touch_nmi_watchdog();
+		if (!cpumask_test_cpu(cpu, &cpus_in_wait))
+			break;
+	}
+	return NMI_HANDLED;
+}
+
+static void set_nmi_cpus(struct cpumask *wait_mask)
+{
+	int first_cpu, wait_cpu, cpu = smp_processor_id();
+
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	for_each_cpu(wait_cpu, topology_sibling_cpumask(cpu)) {
+		if (wait_cpu == first_cpu)
+			continue;
+		cpumask_set_cpu(wait_cpu, wait_mask);
+	}
+}
+
+static void clear_nmi_cpus(struct cpumask *wait_mask)
+{
+	int first_cpu, wait_cpu, cpu = smp_processor_id();
+
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	for_each_cpu(wait_cpu, topology_sibling_cpumask(cpu)) {
+		if (wait_cpu == first_cpu)
+			continue;
+		cpumask_clear_cpu(wait_cpu, wait_mask);
+	}
+}
+
 /*
  * Returns:
  * < 0 - on error
@@ -440,7 +492,7 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
  */
 static int __reload_late(void *info)
 {
-	int cpu = smp_processor_id();
+	int first_cpu, cpu = smp_processor_id();
 	enum ucode_state err;
 	int ret = 0;
 
@@ -459,6 +511,7 @@ static int __reload_late(void *info)
 	 * the platform is taken to reset predictively.
 	 */
 	mce_set_mcip();
+
 	/*
 	 * On an SMT system, it suffices to load the microcode on one sibling of
 	 * the core because the microcode engine is shared between the threads.
@@ -466,9 +519,17 @@ static int __reload_late(void *info)
 	 * loading attempts happen on multiple threads of an SMT core. See
 	 * below.
 	 */
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
 
-	if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
+	/*
+	 * Set the CPUs that we should hold in NMI until the primary has
+	 * completed the microcode update.
+	 */
+	if (first_cpu == cpu) {
+		set_nmi_cpus(&cpus_in_wait);
 		apply_microcode_local(&err);
+		clear_nmi_cpus(&cpus_in_wait);
+	} else
 		goto wait_for_siblings;
 
@@ -502,20 +563,41 @@ static int __reload_late(void *info)
  */
 static int microcode_reload_late(void)
 {
-	int ret;
+	int ret = 0;
 
 	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
 	pr_err("You should switch to early loading, if possible.\n");
 
 	atomic_set(&late_cpus_in, 0);
 	atomic_set(&late_cpus_out, 0);
+	atomic_set(&nmi_cpus, 0);
+	atomic_set(&nmi_timeouts, 0);
+	cpumask_clear(&cpus_in_wait);
+
+	ret = register_nmi_handler(NMI_LOCAL, ucode_nmi_cb, NMI_FLAG_FIRST,
+				   "ucode_nmi");
+	if (ret) {
+		pr_err("Unable to register NMI handler\n");
+		goto done;
+	}
 
 	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
 	if (ret == 0)
 		microcode_check();
 
+	unregister_nmi_handler(NMI_LOCAL, "ucode_nmi");
+
+	if (atomic_read(&nmi_cpus))
+		pr_info("%d CPUs entered NMI while microcode update in progress\n",
+			atomic_read(&nmi_cpus));
+
+	if (atomic_read(&nmi_timeouts))
+		pr_err("Some CPUs [%d] entered NMI and timed out waiting for their mask to be cleared\n",
+		       atomic_read(&nmi_timeouts));
+
 	pr_info("Reload completed, microcode revision: 0x%x\n", boot_cpu_data.microcode);
 
+done:
 	return ret;
 }
-- 
2.32.0
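For testing context (not part of the patch): the late-load path modified
above is driven through the existing sysfs interface, so the new NMI
handling can be exercised with a plain late reload:

  echo 1 > /sys/devices/system/cpu/microcode/reload

which ends up in microcode_reload_late() with cpus_read_lock() held, hence
the stop_machine_cpuslocked() call.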