From: Thomas Gleixner
To: Andy Lutomirski, "shenkai (D)"
Cc: LKML, Ingo Molnar, Borislav Petkov, X86 ML, "H. Peter Anvin", hewenliang4@huawei.com, hushiyuan@huawei.com, luolongjun@huawei.com, hejingxian@huawei.com
Subject: Re: [PATCH] use x86 cpu park to speedup smp_init in kexec situation
Date: Tue, 15 Dec 2020 22:20:02 +0100
Message-ID: <87eejqu5q5.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 15 2020 at 08:31, Andy Lutomirski wrote:
> On Tue, Dec 15, 2020 at 6:46 AM shenkai (D) wrote:
>>
>> From: shenkai
>> Date: Tue, 15 Dec 2020 01:58:06 +0000
>> Subject: [PATCH] use x86 cpu park to speedup smp_init in kexec situation
>>
>> In kexec reboot on x86 machine, APs will be halted and then waked up
>> by the apic INIT and SIPI interrupt. Here we can let APs spin instead
>> of being halted and boot APs by writing to specific address. In this way
>> we can accelerate smp_init procedure for we don't need to pull APs up
>> from a deep C-state.
>>
>> This is meaningful in many situations where users are sensitive to reboot
>> time cost.
>
> I like the concept.

No. This is the wrong thing to do. We are not optimizing for _one_
special case.

We can optimize it for all operations where all the non boot CPUs have
to be brought up, be it cold boot, hibernation resume or kexec.

Aside of that this is not a magic X86 special problem. Pretty much all
architectures have the same issue and it can be solved very simply,
which has been discussed before and I outlined the solution years ago,
but nobody sat down and actually made it work.

Since the rewrite of the CPU hotplug infrastructure to a state machine
it's pretty obvious that the bringup of APs can be changed from the fully
serialized:

     for_each_present_cpu(cpu) {
         if (!cpu_online(cpu))
             cpu_up(cpu, CPUHP_ONLINE);
     }

to

     for_each_present_cpu(cpu) {
         if (!cpu_online(cpu))
             cpu_up(cpu, CPUHP_KICK_CPU);
     }

     for_each_present_cpu(cpu) {
         if (!cpu_active(cpu))
             cpu_up(cpu, CPUHP_ONLINE);
     }

The CPUHP_KICK_CPU state does not exist today, but it's just the logical
consequence of the state machine. It's basically splitting __cpu_up()
into:

     __cpu_kick()
     {
         prepare();
         arch_kick_remote_cpu();     -> Send IPI/NMI, Firmware call .....
     }

     __cpu_wait_online()
     {
         wait_until_cpu_online();
         do_further_stuff();
     }

There is some more to it than just blindly splitting it up at the
architecture level.

All __cpu_up() implementations across arch/ have a lot of needlessly
duplicated and pointlessly differently implemented code which can move
completely into the core.

So actually we want to split this further up:

   CPUHP_PREPARE_CPU_UP:   Generic preparation step where all
                           the magic cruft which is duplicated
                           across architectures goes to

   CPUHP_KICK_CPU:         Architecture specific prepare and kick

   CPUHP_WAIT_ONLINE:      Generic wait function for CPU coming
                           online: wait_for_completion_timeout()
                           which releases the upcoming CPU and
                           invokes an optional arch_sync_cpu_up()
                           function which finalizes the bringup.

and on the AP side:

   CPU comes up, does all the low level setup, sets online, calls
   complete() and then spinwaits for release.

Once the control CPU comes out of the completion it releases the
spinwait.

That works for all bringup situations and not only for kexec and the
simple trick is that by the time the last CPU has been kicked in the
first step, the first kicked CPU is already spinwaiting for release.

By the time the first kicked CPU has completed the process, i.e. reached
the active state, then the next CPU is spinwaiting and so on.
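
To make the ordering argument concrete, here is a minimal single-threaded
user-space sketch of the two loops above. It is not kernel code: every
sim_* name is hypothetical, the CPU count is arbitrary, and the states
merely mirror the ones proposed in this mail. It records how many CPUs
have already been kicked when the control CPU reaches its first wait
step.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4	/* hypothetical number of CPUs to bring up */

/* Simulated bringup states, in order (modelled on the proposed states). */
enum sim_state {
	SIM_OFFLINE,
	SIM_PREPARE_CPU_UP,	/* generic preparation */
	SIM_KICK_CPU,		/* arch specific prepare and kick */
	SIM_WAIT_ONLINE,	/* generic wait, then release of the AP */
	SIM_ONLINE,
};

struct sim_cpu {
	enum sim_state state;	/* how far the control CPU has driven it */
	bool kicked;		/* the AP has been started */
};

static struct sim_cpu sim_cpus[SIM_NR_CPUS];
static int sim_kicked_at_first_wait = -1;

static int sim_count_kicked(void)
{
	int nr = 0;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		nr += sim_cpus[cpu].kicked;
	return nr;
}

/* Walk one CPU through the state machine up to and including @target. */
static void sim_cpu_up(int cpu, enum sim_state target)
{
	struct sim_cpu *c = &sim_cpus[cpu];

	while (c->state < target) {
		enum sim_state next = c->state + 1;

		if (next == SIM_KICK_CPU) {
			c->kicked = true;
		} else if (next == SIM_WAIT_ONLINE) {
			assert(c->kicked);	/* cannot wait for an unkicked CPU */
			if (sim_kicked_at_first_wait < 0)
				sim_kicked_at_first_wait = sim_count_kicked();
		}
		c->state = next;
	}
}

static void sim_reset(void)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sim_cpus[cpu] = (struct sim_cpu){ .state = SIM_OFFLINE };
	sim_kicked_at_first_wait = -1;
}

/* Fully serialized: each CPU goes all the way up before the next starts. */
int sim_bringup_serial(void)
{
	sim_reset();
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sim_cpu_up(cpu, SIM_ONLINE);
	return sim_kicked_at_first_wait;
}

/* Split: kick every CPU first, then finish them one after the other. */
int sim_bringup_split(void)
{
	sim_reset();
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sim_cpu_up(cpu, SIM_KICK_CPU);
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sim_cpu_up(cpu, SIM_ONLINE);
	return sim_kicked_at_first_wait;
}

bool sim_all_online(void)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (sim_cpus[cpu].state != SIM_ONLINE)
			return false;
	return true;
}
```

In the serialized variant only one CPU has been kicked when the first
wait happens; in the split variant all of them have, which is what lets
their low level setup overlap. The sketch models ordering only, not the
completion/spinwait handshake or any timeout handling.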

If you look at the provided time saving:

   Mainline:            210ms
   Patched:              80ms
   -----------------------------
   Delta                130ms

i.e. it takes ~ 1.8ms to kick and wait for the AP to come up and ~ 1.1ms
per CPU for the whole bringup. It does not completely add up, but it has
a clear benefit for everything.

Also the changelog says that the delay is related to CPUs in deep
C-states. If CPUs are brought down for kexec then it's trivial enough to
limit the C-states or just not use mwait() at all.

It would be interesting to see the numbers just with play_dead() using
hlt() or mwait(eax=0, 0) for the kexec case and no other change at all.

Thanks,

        tglx
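
As a rough back-of-the-envelope model of the figures quoted above: the
two per-CPU costs are taken from this mail, while the function names and
the assumption that the kick-and-wait latency overlaps completely across
CPUs (so it is paid roughly once) are illustrative only. The CPU count
is not stated in the mail either; 72 is a hypothetical value that happens
to land close to the quoted totals.

```c
/* Per-CPU costs quoted in the mail, in milliseconds. */
#define MODEL_KICK_WAIT_MS	1.8	/* kick and wait for the AP to come up */
#define MODEL_REST_MS		1.1	/* rest of the bringup per CPU */

/* Fully serialized: every AP pays both parts back to back. */
double model_serial_ms(unsigned int nr_aps)
{
	return nr_aps * (MODEL_KICK_WAIT_MS + MODEL_REST_MS);
}

/*
 * Split bringup, crude assumption: all APs are kicked up front, so the
 * kick-and-wait latency is paid about once and only the remaining
 * per-CPU work stays serialized on the control CPU.
 */
double model_split_ms(unsigned int nr_aps)
{
	if (nr_aps == 0)
		return 0.0;
	return MODEL_KICK_WAIT_MS + nr_aps * MODEL_REST_MS;
}
```

With the hypothetical 72 APs this gives roughly 209ms for the serialized
case and roughly 81ms for the split case. Real numbers will differ, as
the kick itself is not free and the wake-up latency depends on the state
the APs were parked in.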