Date: Sun, 13 Jul 2008 10:46:59 -0700 (PDT)
From: Linus Torvalds
To: Dmitry Adamushko
cc: Vegard Nossum, Paul Menage, Max Krasnyansky, Paul Jackson, Peter Zijlstra, miaox@cn.fujitsu.com, rostedt@goodmis.org, Thomas Gleixner, Ingo Molnar, Linux Kernel
Subject: Re: current linux-2.6.git: cpusets completely broken

On Sun, 13 Jul 2008, Linus Torvalds wrote:
>
> The thing is, we should fix the top level code to never even _consider_ an
> invalid CPU as a target, and that in turn should mean that all the other
> code should be able to just totally ignore CPU hotplug events.

IOW, I think we should totally remove the whole "update_sched_domains()"
thing too. Any logic that needs it is broken.

We shouldn't detach the scheduler domains in DOWN_PREPARE (much less
UP_PREPARE), we should just leave them damn well alone.

As the comment says, "The domains and groups cannot be updated in place
without racing with the balancing code". The thing is, we shouldn't even
try.
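The quoted point, that only the top-level code should care whether a CPU is still a valid target, can be sketched as a small user-space C model. Everything in it is an illustrative assumption: CPU sets are modelled as 64-bit masks, and `active_mask`, `mark_cpu_inactive()` and `pick_target_cpu()` are hypothetical names, not kernel APIs. The domain span handed in may be stale; a single filtering step at the top consults the separate mask, so nothing below it has to know about hotplug.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model: bit N set means CPU N is in the set. */
typedef uint64_t cpuset_bits;

/* Assumed separate "usable CPUs" mask; CPUs 0-3 start out usable. */
static cpuset_bits active_mask = 0xF;

/* A CPU that starts going down only leaves this cheap mask; the
 * (expensive) domain structures are deliberately not touched here. */
static void mark_cpu_inactive(unsigned int cpu)
{
	assert(cpu < 64);
	active_mask &= ~((cpuset_bits)1 << cpu);
}

/* Top-level target choice: the domain span may still list a CPU that is
 * going away, so filter it against active_mask once, here, and return
 * the lowest-numbered usable CPU, or -1 if none is left. Code below this
 * point only ever receives CPUs that passed the check. */
static int pick_target_cpu(cpuset_bits domain_span)
{
	cpuset_bits usable = domain_span & active_mask;

	for (int cpu = 0; cpu < 64; cpu++)
		if (usable & ((cpuset_bits)1 << cpu))
			return cpu;
	return -1;
}
```

For example, with CPUs 0-3 in the domain span, `pick_target_cpu(0xF)` returns 0; after `mark_cpu_inactive(0)` the same call returns 1, and the span itself (standing in for the domains) is never rebuilt.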
The correct way to handle all this is to make the balancing code use the
domains regardless, but protect against CPUs going down with _another_
data structure that is much easier to update.

Namely something like 'cpu_active_map'.

Then we just get rid of all the crap in update_sched_domains() entirely,
and then we can make the cpusets code do the *sane* thing, which is to
rebuild the scheduler domains only when the CPU up/down has completed.

So instead of this illogical and crazy mess:

+       switch (phase) {
+       case CPU_UP_CANCELED:
+       case CPU_UP_CANCELED_FROZEN:
+       case CPU_DOWN_FAILED:
+       case CPU_DOWN_FAILED_FROZEN:
+       case CPU_ONLINE:
+       case CPU_ONLINE_FROZEN:
+       case CPU_DEAD:
+       case CPU_DEAD_FROZEN:
+               common_cpu_mem_hotplug_unplug(1);

it should just say:

+       switch (phase) {
+       case CPU_ONLINE:
+       case CPU_ONLINE_FROZEN:
+       case CPU_DEAD:
+       case CPU_DEAD_FROZEN:
+               common_cpu_mem_hotplug_unplug(1);

because it only makes sense to rebuild the scheduler domains when the
thing SUCCEEDS.

See? By having a sane design, the code is not just more robust and easier
to follow, you can also simplify it and make it more logical.

The current design is not sane.

                        Linus
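The scheme argued for in this message, a cheap mask that tracks every hotplug step plus a scheduler-domain rebuild only on the success events (CPU_ONLINE and CPU_DEAD), can be sketched as a user-space C model. This is an illustration under stated assumptions, not kernel code: the enum below is a local stand-in that borrows the kernel's action names but has made-up values and omits the _FROZEN variants, `rebuild_sched_domains_model()` is a hypothetical placeholder for the expensive rebuild, and the exact points at which the mask is cleared and restored are this sketch's choice, not something the message specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for hotplug notifier actions (hypothetical values;
 * the _FROZEN variants are omitted for brevity). */
enum hotplug_phase {
	CPU_UP_PREPARE,
	CPU_UP_CANCELED,
	CPU_ONLINE,
	CPU_DOWN_PREPARE,
	CPU_DOWN_FAILED,
	CPU_DEAD,
};

static uint64_t active_mask = 0xF;   /* CPUs the balancer may target */
static unsigned int domain_rebuilds; /* how often domains were rebuilt */

/* Hypothetical placeholder for the expensive domain rebuild. */
static void rebuild_sched_domains_model(void)
{
	domain_rebuilds++;
}

static void hotplug_event(enum hotplug_phase phase, unsigned int cpu)
{
	uint64_t bit;

	assert(cpu < 64);
	bit = (uint64_t)1 << cpu;

	/* Cheap update of the separate mask (timing is an assumption). */
	switch (phase) {
	case CPU_DOWN_PREPARE:
		active_mask &= ~bit;     /* stop targeting it right away */
		break;
	case CPU_DOWN_FAILED:
	case CPU_ONLINE:
		active_mask |= bit;      /* CPU is (still) usable */
		break;
	default:
		break;
	}

	/* Rebuild the domains only once the up/down has SUCCEEDED. */
	switch (phase) {
	case CPU_ONLINE:
	case CPU_DEAD:
		rebuild_sched_domains_model();
		break;
	default:
		break;                   /* prepare/cancel/fail: leave them alone */
	}
}
```

In this model a failed or cancelled operation only flips a bit back, while `domain_rebuilds` grows exactly once per completed online or offline, which is the property the corrected `switch` is after.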