Date: Sat, 16 May 2015 05:07:59 -0400
Subject: Re: [PATCH] x86: skip delays during SMP initialization similar to Xen
From: Len Brown
To: Ingo Molnar
Cc: Jan H. Schönherr, Thomas Gleixner, X86 ML, "linux-kernel@vger.kernel.org", Anthony Liguori, Ingo Molnar, "H. Peter Anvin", Tim Deegan, Gang Wei, Linus Torvalds

On Thu, May 14, 2015 at 1:57 PM, Ingo Molnar wrote:
>
> * "Jan H. Schönherr" wrote:
>
>> Ingo, do you want an updated version of the original patch, which
>> takes care not get stuck, when the INIT deassertion is skipped, or
>> do you prefer to address delays "one by one" as you wrote elsewhere?
>
> So I'm not against improving this code at all, but instead of this
> hard to follow mixing of old and new code, I'd find the following
> approach cleaner and more acceptable: create a 'modern' and a 'legacy'
> SMP-bootup variant function, and do a clean separation based on the
> CPU model cutoff condition used by Len's patches:
>
>         /* if modern processor, use no delay */
>         if (((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) && (boot_cpu_data.x86 == 6)) ||
>             ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && (boot_cpu_data.x86 >= 0xF)))
>                 init_udelay = 0;
>
> Then in the modern variant we can become even more aggressive and
> remove these kinds of delays as well:

Not sure it is worth two versions, since this is not where the big time
is spent.  See below.

>
>         udelay(300);

FWIW, MPS 1.4 suggests this should be 200, not 300.

>         udelay(200);
>
> plus I'd suggest making these poll loops in smpboot.c loops narrower:
>
>         udelay(100);

FWIW, on my desktop, this one executed 17 times (1700 usec).
This is the time for the remote CPU to wake and get to cpu_init().

Why is it a benefit to have any udelay() before invoking schedule()?

>         udelay(100);

This one didn't execute at all.  Indeed, I don't understand why it
exists, per the question above.

        /*
         * Wait till AP completes initial initialization
         */
        while (!cpumask_test_cpu(cpu, cpu_callin_mask)) {
                /*
                 * Allow other tasks to run while we wait for the
                 * AP to come online. This also gives a chance
                 * for the MTRR work(triggered by the AP coming online)
                 * to be completed in the stop machine context.
                 */
                udelay(100);
                schedule();
        }

So, the latest TIP has the INIT udelay(10,000) removed, but cpu_up()
still takes nearly 19,000 usec on a HSW desktop.

A quick scan of the ftrace shows some high runners:

18949.45 us cpu_up()
    2450.580 us notifier_call_chain
        102.751 us thermal_throttle_cpu_callback
        289.313 us dpm_sysfs_add
        1019.594 us msr_class_cpu_callback
        ...
    8455.462 us native_cpu_up()
        500.000 us = udelay(300) + udelay(200) Startup IPI
        500.000 us = udelay(300) + udelay(200) Startup IPI
        1700.000 us = 17 x udelay(100) waiting for AP in initialized_map
        2004.172 us check_tsc_warp()
    7977.799 us cpu_notify()
        1588.108 us cpuset_cpu_active
        3043.955 us cacheinfo_cpu_callback
        1146.234 us mce_cpu_callback
        541.105 us cpufreq_cpu_callback
        213.685 us coretemp_cpu_callback

cacheinfo_cpu_callback() time appears to be spent creating a bunch of
sysfs nodes, which is apparently an expensive operation.

check_tsc_warp() is hard-coded to take 2ms.  I don't know if 2ms is a
magic number or if a shorter interval would have the same value.  It
seems a bit sad to do this serially for every CPU at boot, when we could
do all the CPUs in parallel after they are on-line.  Perhaps this should
be invoked only for boot-time and hot-add time.  It shouldn't be needed
at all for soft online and resume.

Startup IPI delays.  MPS 1.4 actually says 200+200, not 300+200, as
Linux reads.  I don't know where the 300 came from, maybe it was a typo?

msr_class_cpu_callback -- making device nodes is not fast.

I don't know if anything can be done for the 1700us wait for the remote
processor to mark itself initialized.  That is the 1st thing it does
when it enters cpu_init().

On the Xeon, I had seen x86_init_rdrand() take 781 usec; dunno why that
isn't seen on this box.  I'll look at that box again next week.

cheers,
Len Brown, Intel Open Source Technology Center