From: Noam Camus
To: Thomas Gleixner, Peter Zijlstra
CC: Vineet Gupta, Chris Metcalf, "linux-kernel@vger.kernel.org", "anna-maria@linutronix.de", Eitan Rabin, "Fu, Zhonghui"
Subject: RE: Reduce Linux boot time on Large scale system
Date: Wed, 19 Apr 2017 10:08:45 +0000
References: <20170419074944.lacscblx2ulhfcd3@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

>From: Thomas Gleixner [mailto:tglx@linutronix.de]
>Sent: Wednesday, April 19, 2017 11:58 AM

>On Wed, 19 Apr 2017, Peter Zijlstra wrote:
>> On Tue, Apr 04, 2017 at 04:39:06PM +0000, Noam Camus wrote:
>> > Hi Peter & Vineet
>> >
>> > I wish to reduce boot time of my platform ARC/plat-eznps (4K CPUs).
>> > My analysis is that most boot time is spent in cpu_up() for all
>> > CPUs. Measurements are about 66 ms per CPU and over 4 minutes in total (I have 800 MHz cores).
>> >
>> > I see that smp_init() just iterates over all present CPUs one by one.
>> > I wish to know if there was an attempt to optimize this with some parallel work?
>> >
>> > Are you aware of some method / trick that will help me to reduce boot time?
>> > Any suggestion how this can be done?
>>
>> So attempts have been made in the past but Thomas shot them down for
>> being gross hacks (they were).
>>
>> But Thomas has now (mostly) completed rewriting the CPU hotplug
>> machinery and he has at some point outlined means of achieving what
>> you're after.
>>
>> I've added him to Cc so he can correct me where I'm wrong, as I've not
>> looked into this in much detail after he mucked up all I knew about
>> CPU hotplug.
>>
>> Since each CPU is now responsible for its own bootstrap, we can now
>> kick all the CPUs awake without waiting for them to complete the
>> online stage.
>>
>> There might however be code that assumes CPUs come up one at a time,
>> so you'll need to audit for that. It's not going to be a trivial thing.

>There are a couple of things to consider.

>First of all we should make the whole 'kick CPU into life' and surrounding magic generic. Every arch has its own handshake mechanism.

>That would look like this:

>Step   BP                                     AP
>0-9    [preparatory steps]
>10     [kick cpu into life (arch callback)]
>11                                            [Do initial arch bringup then call into a generic function]
>12     [handshake (generic)]                  [handshake (generic)]
>13     [more arch specific magic]             [more arch specific magic]
>14-20  [ CPU starting ]                       [ CPU goes online ]
>40     [ CPU active, hotplug done ]

>So the first step in parallelizing this would be:

>	for_each_present_cpu(cpu)
>		cpu_up(target_state = 10);

>i.e. make the allocations and whatever preparatory work needs to be done and kick the CPU into life.
>The target CPU would initialize the low level stuff and then call into a generic function, which does the generic initialization and then waits for the handshake.

>So the next thing would be:

>	for_each_present_cpu(cpu)
>		cpu_up(target_state = 40);

>This last step has to be single threaded for now because almost all CPU hotplug using facilities rely on the current serialization. There are also code paths which use get_online_cpus() or cpu_hotplug_disable() to prevent interaction with cpu hotplug.

>The hotplug machinery is already designed so that after the handshake (#12/13) a plugged CPU can bring up itself completely alone, but due to the serialization expectations all over the place this won't work today.

>To make it work, you have to go through every single instance of CPU hotplug callback users and every single site which prevents hotplug via get_online_cpus() or cpu_hotplug_disable() and audit them for concurrency issues and fix them up.

>There might also be interaction required with the state machine, i.e. stop the state progress on a self plugging CPU between two steps to make serialization work.

What would be a good base to start on all of the above?
Would a formal release tag such as v4.8 be good enough, or do I need to base on some other specific HEAD (or tag)?

Thanks,
Noam
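
P.S. To get a rough feel for what the two-pass scheme described above might buy, here is a small user-space sketch. It is only a timing model, not kernel code: the struct, both function names and the split of the per-CPU cost into three parts are hypothetical. The thread only gives a total of about 66 ms per CPU, so real numbers would have to be measured.

```c
#include <stddef.h>

/*
 * Hypothetical per-CPU cost split, in milliseconds. Only the total
 * (about 66 ms per CPU) comes from the thread; the split is assumed.
 */
struct bringup_cost {
	long prep_ms;    /* steps 0-10: BP prepares and kicks the CPU */
	long ap_init_ms; /* step 11: low level init, runs on the AP itself */
	long online_ms;  /* steps 12-40: handshake up to active, serialized */
};

/* Current behaviour: each cpu_up() finishes before the next one starts. */
static long serial_bringup_ms(size_t ncpus, struct bringup_cost c)
{
	return (long)ncpus * (c.prep_ms + c.ap_init_ms + c.online_ms);
}

/*
 * Two-pass model: pass 1 prepares and kicks every CPU, so the AP init
 * of all CPUs overlaps. Pass 2 stays single threaded and only waits
 * when a CPU has not reached the handshake yet.
 */
static long two_pass_bringup_ms(size_t ncpus, struct bringup_cost c)
{
	long now = (long)ncpus * c.prep_ms; /* BP time at end of pass 1 */

	for (size_t i = 0; i < ncpus; i++) {
		/* CPU i was kicked as soon as its own prep was done. */
		long ready = (long)(i + 1) * c.prep_ms + c.ap_init_ms;

		if (now < ready)
			now = ready;    /* wait for the AP to finish its init */
		now += c.online_ms;     /* rest of the bring-up, one CPU at a time */
	}
	return now;
}
```

With a made-up split of { 1, 10, 2 } and 4 CPUs, the serial model gives 52 ms and the two-pass model gives 19 ms. If ap_init_ms is 0 both models give the same result, so in this model the gain depends entirely on how much of the per-CPU time is spent on the AP before the handshake.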