From: "Srivatsa S. Bhat"
Date: Thu, 11 Apr 2013 18:15:18 +0530
To: Paul Mackerras
CC: Linus Torvalds, Ingo Molnar, Robin Holt, "H. Peter Anvin", Andrew Morton,
    Linux Kernel Mailing List, Russ Anderson, Shawn Guo, Thomas Gleixner,
    Ingo Molnar, the arch/x86 maintainers, "Paul E. McKenney", Tejun Heo,
    Oleg Nesterov, Lai Jiangshan, Michel Lespinasse, "rusty@rustcorp.com.au",
    Peter Zijlstra
Subject: Bulk CPU Hotplug (Was Re: [PATCH] Do not force shutdown/reboot to boot cpu.)

On 04/11/2013 11:01 AM, Paul Mackerras wrote:
> On Wed, Apr 10, 2013 at 08:10:05AM -0700, Linus Torvalds wrote:
>> The optimal solution would be to just speed up the
>> disable_nonboot_cpus() code so much that it isn't an issue. That would
>> be good for suspending too, although I guess suspend isn't a big issue
>> if you have a thousand CPU's.
>>
>> Has anybody checked whether we could do the cpu_down() on non-boot
>> CPU's in parallel? Right now we serialize the thing completely, with
>
> I thought Srivatsa S. Bhat had a patchset that did exactly that.
> Srivatsa?
>

Thanks for the CC, Paul! Adding some more people to CC.

Actually, my patchset was about removing stop_machine() from the CPU
offline path:
http://lwn.net/Articles/538819/

And here is the performance improvement I had measured in the version
prior to that one:
http://article.gmane.org/gmane.linux.kernel/1435249

I'm planning to revive this patchset after the 3.10 merge window closes,
because it depends on doing a tree-wide sweep, and I think it's a little
too late to get that done in time for the upcoming 3.10 merge window
itself.

Anyway, that's about removing stop_machine() from CPU hotplug. Coming to
bulk CPU hotplug: yes, I had ideas similar to what Russ suggested, but I
believe we can do more than that.

As Russ pointed out, the notifiers are not thread-safe, so calling the
same notifier in parallel with different CPUs as arguments isn't going
to work. So, first, we can convert all the CPU hotplug notifiers to take
a cpumask instead of a single CPU. Assuming there are 'n' notifiers in
total, the number of notifier calls then becomes n, instead of n*1024.
But that by itself most likely won't give us much benefit over the
for-loop in Russ's patch, because each of those 'n' notifiers simply
does longer processing, iterating over the cpumask internally.
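
To make that a bit more concrete, here is a minimal sketch of what a
cpumask-based notifier callback could look like. Passing a
'struct cpumask *' through the (void *) argument is just an assumed
convention (not existing API), and foo_prepare_offline() is a made-up
per-subsystem helper, shown purely for illustration:

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/notifier.h>

/* Made-up per-subsystem teardown work for one outgoing CPU. */
static void foo_prepare_offline(unsigned int cpu)
{
        /* quiesce per-cpu data, migrate work away from 'cpu', etc. */
}

/*
 * Sketch: the notifier is invoked once per bulk operation with the
 * whole set of outgoing CPUs, instead of once per CPU.
 */
static int foo_cpu_callback(struct notifier_block *nb,
                            unsigned long action, void *arg)
{
        const struct cpumask *mask = arg;       /* assumed new convention */
        unsigned int cpu;

        switch (action) {
        case CPU_DOWN_PREPARE:
                for_each_cpu(cpu, mask)
                        foo_prepare_offline(cpu);
                break;
        }
        return NOTIFY_OK;
}

With something like this in place, each subsystem pays the fixed
notifier-invocation overhead once per bulk operation rather than once
per CPU; the per-CPU work itself, of course, still has to happen
somewhere.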
Now comes the interesting part. Consider a notifier chain that looks
like this:

Priority 0: A->B->C->D

We can't invoke, say, notifier callback A simultaneously on 2 CPUs with
2 different hotcpus as the argument. *However*, since A, B, C and D
(more or less) belong to different subsystems, we can call A, B, C and D
in parallel on different CPUs. They won't even serialize amongst
themselves, because the locks they take (if any) belong to different
subsystems. And since they have the same priority, the ordering (A after
B, or B after A) doesn't matter either.

So if we combine this with the idea above of giving each notifier a
cpumask to work with, we end up with:

  CPU 0         CPU 1         CPU 2       ....
  A(cpumask)    B(cpumask)    C(cpumask)  ....

That is, the CPU_DOWN_PREPARE notification, for example, can be
processed in parallel on multiple CPUs at a time, for a given cpumask!
That should definitely give us a good speed-up.

One more thing to note: there are 4 notifications involved in taking a
CPU offline:

CPU_DOWN_PREPARE
CPU_DYING
CPU_DEAD
CPU_POST_DEAD

The first can be run in parallel as mentioned above. The second is run
in parallel in the stop_machine() phase, as shown in Russ's patch. But
the third and fourth sets of notifications all end up running only on
CPU0, which will again slow things down.

So I suggest taking down the 1024 CPUs in multiple phases, halving the
set each time, a bit like a binary search: first take 512 CPUs down,
then 256, then 128, and so on. That way, at every bulk CPU hotplug
operation we have enough online CPUs to handle the notifier load, which
helps speed things up. (A rough sketch of this is appended at the end of
this mail.) Moreover, a handful of calls to stop_machine() is OK,
because stop_machine() takes progressively less time on each iteration:
fewer CPUs are online each time, which reduces the synchronization
overhead of the stop-machine phase.

The only downside to this whole idea of running the notifiers of a given
priority in parallel is error handling: if a notifier fails, it would be
troublesome to roll back, I guess. But if we set that aside for a
moment, we can give this idea a try!

Regards,
Srivatsa S. Bhat
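
P.S. Here is the rough sketch of the halving idea mentioned above, just
to show the shape of it. cpu_down_mask() is a *hypothetical* bulk-offline
primitive (no such API exists today), and I'm assuming CPU 0 is the boot
CPU; the rest is existing cpumask API:

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/gfp.h>

/* Hypothetical primitive: offline every CPU in 'mask' in one bulk pass. */
extern int cpu_down_mask(const struct cpumask *mask);

static void bulk_offline_nonboot_cpus(void)
{
        cpumask_var_t batch;

        if (!alloc_cpumask_var(&batch, GFP_KERNEL))
                return;

        /*
         * Each round offlines half of the remaining online CPUs, so there
         * are always plenty of online CPUs left to share the notifier work.
         */
        while (num_online_cpus() > 1) {
                unsigned int cpu, take = num_online_cpus() / 2;

                cpumask_clear(batch);
                for_each_online_cpu(cpu) {
                        if (cpu == 0)           /* keep the boot CPU */
                                continue;
                        cpumask_set_cpu(cpu, batch);
                        if (cpumask_weight(batch) == take)
                                break;
                }
                if (cpu_down_mask(batch))       /* hypothetical bulk offline */
                        break;                  /* bail out on error */
        }
        free_cpumask_var(batch);
}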