Date: Tue, 15 Jun 2004 08:02:15 -0700
From: "Martin J. Bligh"
To: Ingo Molnar, Con Kolivas
Cc: Linux Kernel Mailinglist, Nick Piggin, Andrew Morton, William Lee Irwin III, Linus Torvalds, markw@osdl.org
Subject: Re: [PATCH] Performance regression in 2.6.7-rc3
Message-ID: <30410000.1087311734@[10.10.2.4]>
In-Reply-To: <20040615045616.GA2006@elte.hu>
References: <200406121028.06812.kernel@kolivas.org> <20040615045616.GA2006@elte.hu>

Well, I found where at least half of my regression on kernel compiles came from: it's sched_balance_context. Which makes no sense to me, given that it seems to switch on CLONE_VM and otherwise makes no changes. But I went and double-checked my results ... so ... confused.

The patch adds find_idlest_cpu() and wake_up_forked_thread(), but otherwise just seems to do:

-	if (!(clone_flags & CLONE_STOPPED))
-		wake_up_forked_process(p);	/* do this last */
-	else
+	if (!(clone_flags & CLONE_STOPPED)) {
+		/*
+		 * Do the wakeup last. On SMP we treat fork() and
+		 * CLONE_VM separately, because fork() has already
+		 * created cache footprint on this CPU (due to
+		 * copying the pagetables), hence migration would
+		 * probably be costly. Threads on the other hand
+		 * have less traction to the current CPU, and if
+		 * there's an imbalance then the scheduler can
+		 * migrate this fresh thread now, before it
+		 * accumulates a larger cache footprint:
+		 */
+		if (clone_flags & CLONE_VM)
+			wake_up_forked_thread(p);
+		else
+			wake_up_forked_process(p);
+	} else
 		p->state = TASK_STOPPED;
 	++total_forks;

How the hell can that have any effect on non-threaded workloads? Perhaps some part of the kernel compile *is* multi-threaded. It does seem to get called somehow ... from the profile:

   129 find_idlest_cpu
    83 wake_up_forked_thread

(A stand-alone sketch of the wakeup decision, and of the kind of search find_idlest_cpu does, is appended at the end of this mail.)

Here's the diffprofile between the two kernels:

      5835      4.0% total
      1100     27.1% __copy_from_user_ll
       627      3.6% do_anonymous_page
       363      6.8% page_add_rmap
       357     79.5% strnlen_user
       338     43.3% finish_task_switch
       308   3080.0% flush_signal_handlers
       272      7.0% zap_pte_range
       239      2.1% page_remove_rmap
       230     16.9% free_hot_cold_page
       224      9.7% buffered_rmqueue
       196     44.9% pte_alloc_one
       171     40.4% copy_process
       162    450.0% complete
       155      9.2% do_no_page
       147     19.5% set_page_dirty
       133     17.9% clear_page_tables
       131     13.8% do_wp_page
       129      0.0% find_idlest_cpu
       121      3.4% find_trylock_page
       ...
      -113     -7.0% atomic_dec_and_lock
     -1062     -2.2% default_idle

Which looks to me just like worse task affinity. I still think balance-on-clone is the wrong thing to do by default. On anything but a benchmark, you have more than one process running on the system, and you WANT to keep the threads of each process on the same node, not scatter them to the winds.

M.
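
The CLONE_VM test above is the entire policy switch, so a minimal stand-alone sketch of the branch may help; this is illustration, not code from the patch. The flag values are the real ones from the 2.6-era <linux/sched.h>, but wakeup_path() and its strings are hypothetical, made up purely to show which path a given clone_flags value selects. Since fork() passes neither flag, every cc1/as a kernel compile spawns should take the wake_up_forked_process() path; only CLONE_VM children (e.g. via pthread_create()) should ever reach wake_up_forked_thread() and find_idlest_cpu().

/*
 * Toy userspace mirror of the quoted wakeup branch.
 * wakeup_path() is a hypothetical helper for illustration only.
 */
#include <stdio.h>

#define CLONE_VM      0x00000100UL	/* share address space (2.6 <linux/sched.h>) */
#define CLONE_STOPPED 0x02000000UL	/* start child in TASK_STOPPED */

static const char *wakeup_path(unsigned long clone_flags)
{
	if (clone_flags & CLONE_STOPPED)
		return "p->state = TASK_STOPPED (no wakeup)";
	if (clone_flags & CLONE_VM)
		return "wake_up_forked_thread() -> find_idlest_cpu() may migrate";
	return "wake_up_forked_process() -> child stays on the forking CPU";
}

int main(void)
{
	/* fork(): clone_flags has neither bit set */
	printf("fork():           %s\n", wakeup_path(0));
	/* a pthread_create()-style clone passes CLONE_VM (among others) */
	printf("CLONE_VM clone(): %s\n", wakeup_path(CLONE_VM));
	return 0;
}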
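For the affinity argument, here is a hedged sketch of what a find_idlest_cpu()-style search boils down to: pick the least-loaded CPU, and leave the current one only if the imbalance clears some threshold. The load[] metric, the threshold of 1, and the _sketch name are all assumptions for illustration; the real 2.6.7-rc code also honours cpus_allowed and uses different load arithmetic. Either way the complaint stands: this search runs for every fresh CLONE_VM child, and never for a plain fork().

#include <stdio.h>

/*
 * Hedged sketch of a find_idlest_cpu()-style pick -- not the real
 * kernel implementation.
 */
static int find_idlest_cpu_sketch(int this_cpu, const unsigned long *load,
				  int ncpus)
{
	unsigned long min_load = load[this_cpu];
	int cpu, best = this_cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		if (load[cpu] < min_load) {
			min_load = load[cpu];
			best = cpu;
		}
	}

	/*
	 * A fresh CLONE_VM child has almost no cache footprint yet, so a
	 * small imbalance already justifies migrating it now -- which is
	 * exactly why threads get scattered while plain forks stay put.
	 */
	if (best != this_cpu && load[this_cpu] - min_load <= 1)
		best = this_cpu;	/* imbalance too small to bother */

	return best;
}

int main(void)
{
	unsigned long load[4] = { 3, 1, 2, 3 };	/* runnable tasks per CPU (made-up numbers) */

	printf("a thread forked on CPU 0 would be woken on CPU %d\n",
	       find_idlest_cpu_sketch(0, load, 4));
	return 0;
}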