From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Emanuel Czirai,
    Borislav Petkov, Thomas Gleixner, Ashok Raj, Tom Lendacky
Subject: [PATCH 4.14 096/138] x86/microcode: Fix CPU synchronization routine
Date: Wed, 11 Apr 2018 00:24:46 +0200
Message-Id: <20180410212913.321948618@linuxfoundation.org>
In-Reply-To: <20180410212902.121524696@linuxfoundation.org>
References: <20180410212902.121524696@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.
------------------

From: Borislav Petkov

commit bb8c13d61a629276a162c1d2b1a20a815cbcfbb7 upstream.

Emanuel reported an issue with a hang during microcode update because my
dumb idea to use one atomic synchronization variable for both rendezvous
- before and after update - was simply bollocks:

  microcode: microcode_reload_late: late_cpus: 4
  microcode: __reload_late: cpu 2 entered
  microcode: __reload_late: cpu 1 entered
  microcode: __reload_late: cpu 3 entered
  microcode: __reload_late: cpu 0 entered
  microcode: __reload_late: cpu 1 left
  microcode: Timeout while waiting for CPUs rendezvous, remaining: 1

CPU1 above would finish, leave and the others will still spin waiting for
it to join.

So do two synchronization atomics instead, which makes the code a lot
more straightforward.

Also, since the update is serialized and it also takes quite some time per
microcode engine, increase the exit timeout by the number of CPUs on the
system.

That's ok because the moment all CPUs are done, that timeout will be cut
short.

Furthermore, panic when some of the CPUs timeout when returning from a
microcode update: we can't allow a system with not all cores updated.

Also, as an optimization, do not do the exit sync if microcode wasn't
updated.

Reported-by: Emanuel Czirai
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner
Tested-by: Emanuel Czirai
Tested-by: Ashok Raj
Tested-by: Tom Lendacky
Link: https://lkml.kernel.org/r/20180314183615.17629-2-bp@alien8.de
Signed-off-by: Greg Kroah-Hartman

---
 arch/x86/kernel/cpu/microcode/core.c |   68 +++++++++++++++++++++--------------
 1 file changed, 41 insertions(+), 27 deletions(-)

--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -517,7 +517,29 @@ static int check_online_cpus(void)
 	return -EINVAL;
 }
 
-static atomic_t late_cpus;
+static atomic_t late_cpus_in;
+static atomic_t late_cpus_out;
+
+static int __wait_for_cpus(atomic_t *t, long long timeout)
+{
+	int all_cpus = num_online_cpus();
+
+	atomic_inc(t);
+
+	while (atomic_read(t) < all_cpus) {
+		if (timeout < SPINUNIT) {
+			pr_err("Timeout while waiting for CPUs rendezvous, remaining: %d\n",
+				all_cpus - atomic_read(t));
+			return 1;
+		}
+
+		ndelay(SPINUNIT);
+		timeout -= SPINUNIT;
+
+		touch_nmi_watchdog();
+	}
+	return 0;
+}
 
 /*
  * Returns:
@@ -527,30 +549,16 @@ static atomic_t late_cpus;
  */
 static int __reload_late(void *info)
 {
-	unsigned int timeout = NSEC_PER_SEC;
-	int all_cpus = num_online_cpus();
 	int cpu = smp_processor_id();
 	enum ucode_state err;
 	int ret = 0;
 
-	atomic_dec(&late_cpus);
-
 	/*
 	 * Wait for all CPUs to arrive. A load will not be attempted unless all
 	 * CPUs show up.
	 * */
-	while (atomic_read(&late_cpus)) {
-		if (timeout < SPINUNIT) {
-			pr_err("Timeout while waiting for CPUs rendezvous, remaining: %d\n",
-				atomic_read(&late_cpus));
-			return -1;
-		}
-
-		ndelay(SPINUNIT);
-		timeout -= SPINUNIT;
-
-		touch_nmi_watchdog();
-	}
+	if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
+		return -1;
 
 	spin_lock(&update_lock);
 	apply_microcode_local(&err);
@@ -558,15 +566,22 @@ static int __reload_late(void *info)
 
 	if (err > UCODE_NFOUND) {
 		pr_warn("Error reloading microcode on CPU %d\n", cpu);
-		ret = -1;
-	} else if (err == UCODE_UPDATED) {
+		return -1;
+	/* siblings return UCODE_OK because their engine got updated already */
+	} else if (err == UCODE_UPDATED || err == UCODE_OK) {
 		ret = 1;
+	} else {
+		return ret;
 	}
 
-	atomic_inc(&late_cpus);
-
-	while (atomic_read(&late_cpus) != all_cpus)
-		cpu_relax();
+	/*
+	 * Increase the wait timeout to a safe value here since we're
+	 * serializing the microcode update and that could take a while on a
+	 * large number of CPUs. And that is fine as the *actual* timeout will
+	 * be determined by the last CPU finished updating and thus cut short.
+	 */
+	if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC * num_online_cpus()))
+		panic("Timeout during microcode update!\n");
 
 	return ret;
 }
@@ -579,12 +594,11 @@ static int microcode_reload_late(void)
 {
 	int ret;
 
-	atomic_set(&late_cpus, num_online_cpus());
+	atomic_set(&late_cpus_in,  0);
+	atomic_set(&late_cpus_out, 0);
 
 	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
-	if (ret < 0)
-		return ret;
-	else if (ret > 0)
+	if (ret > 0)
 		microcode_check();
 
 	return ret;
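
For readers following the synchronization logic, below is a minimal userspace
sketch of the same two-counter rendezvous pattern the patch introduces. It is
purely illustrative and not part of the patch: NTHREADS, wait_for_all() and
reload_one() are made-up stand-ins for num_online_cpus(), __wait_for_cpus()
and __reload_late(), a pthread mutex stands in for update_lock, and the
timeout/panic handling of the kernel version is omitted.

/*
 * Illustrative userspace analogue (build with: cc -pthread rendezvous.c).
 * Every thread bumps an "in" counter and spins until all threads have
 * arrived, does its serialized work, then bumps an "out" counter and
 * spins until everyone is done.
 */
#include <stdatomic.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

#define NTHREADS 4

static atomic_int late_in;
static atomic_int late_out;
static pthread_mutex_t update_lock = PTHREAD_MUTEX_INITIALIZER;

/* Spin until all NTHREADS threads have incremented the counter. */
static void wait_for_all(atomic_int *t)
{
	atomic_fetch_add(t, 1);
	while (atomic_load(t) < NTHREADS)
		;	/* the kernel version also checks a timeout here */
}

static void *reload_one(void *arg)
{
	long id = (long)arg;

	/* First rendezvous: nobody starts until everyone has arrived. */
	wait_for_all(&late_in);

	/* Serialized "update", standing in for apply_microcode_local(). */
	pthread_mutex_lock(&update_lock);
	printf("thread %ld: updating\n", id);
	usleep(1000);
	pthread_mutex_unlock(&update_lock);

	/* Second rendezvous: nobody leaves until everyone is done. */
	wait_for_all(&late_out);
	return NULL;
}

int main(void)
{
	pthread_t th[NTHREADS];

	atomic_store(&late_in, 0);
	atomic_store(&late_out, 0);

	for (long i = 0; i < NTHREADS; i++)
		pthread_create(&th[i], NULL, reload_one, (void *)i);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(th[i], NULL);

	puts("all threads completed both rendezvous");
	return 0;
}

The point it demonstrates is why one shared counter cannot serve both
rendezvous: a thread that finishes early and reaches the exit barrier would
modify the very counter the stragglers are still watching at the entry
barrier, which is exactly the hang Emanuel reported.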