Date: Tue, 20 Jan 2009 00:40:59 +0100
From: Ingo Molnar
To: Steven Rostedt, Mike Travis, Rusty Russell
Cc: Chris Mason, "Ma, Chinang", Andrew Morton, Matthew Wilcox, "Wilcox, Matthew R", "linux-kernel@vger.kernel.org", "Tripathi, Sharad C", "arjan@linux.intel.com", "Kleen, Andi", "Siddha, Suresh B", "Chilukuri, Harita", "Styner, Douglas W", "Wang, Peter Xihong", "Nueckel, Hubert", "linux-scsi@vger.kernel.org", Andrew Vasquez, Anirban Chakraborty, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Gregory Haskins, Rusty Russell
Subject: Re: Mainline kernel OLTP performance update
Message-ID: <20090119234059.GA452@elte.hu>
References: <20090114163557.11e097f2.akpm@linux-foundation.org> <20090115012147.GW29283@parisc-linux.org> <20090114180431.f4a96543.akpm@linux-foundation.org> <1231986439.21980.54.camel@localhost.localdomain> <1232388291.6521.140.camel@think.oraclecorp.com> <1232390259.25783.5.camel@localhost.localdomain>
In-Reply-To: <1232390259.25783.5.camel@localhost.localdomain>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-Mailing-List: linux-kernel@vger.kernel.org

* Steven Rostedt wrote:

> (added Rusty)
>
> On Mon, 2009-01-19 at 13:04 -0500, Chris Mason wrote:
> > On Thu, 2009-01-15 at 00:11 -0700, Ma, Chinang wrote:
> > > >> > > > >
> > > >> > > > > Linux OLTP Performance summary
> > > >> > > > > Kernel#     Speedup(x)  Intr/s  CtxSw/s  us%  sys%  idle%  iowait%
> > > >> > > > > 2.6.24.2    1.000       21969   43425    76   24    0      0
> > > >> > > > > 2.6.27.2    0.973       30402   43523    74   25    0      1
> > > >> > > > > 2.6.29-rc1  0.965       30331   41970    74   26    0      0
> > > >> > >
> > > >> > > But the interrupt rate went through the roof.
> > > >> >
> > > >> > Yes. I forget why that was; I'll have to dig through my archives for
> > > >> > that.
> > > >>
> > > >> Oh. I'd have thought that this alone could account for 3.5%.
> >
> > A later email indicated the reschedule interrupt count doubled since
> > 2.6.24, and so I poked around a bit at the causes of resched_task.
> >
> > I think the -rt version of check_preempt_equal_prio has gotten much more
> > expensive since 2.6.24.
> >
> > I'm sure these changes were made for good reasons, and this workload may
> > not be a good reason to change it back. But, what does the patch below
> > do to performance on 2.6.29-rcX?
> >
> > -chris
> >
> > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> > index 954e1a8..bbe3492 100644
> > --- a/kernel/sched_rt.c
> > +++ b/kernel/sched_rt.c
> > @@ -842,6 +842,7 @@ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, int sync
> >  		resched_task(rq->curr);
> >  		return;
> >  	}
> > +	return;
> >
> >  #ifdef CONFIG_SMP
> >  	/*
>
> That should not cause much of a problem if the scheduling task is not
> pinned to an CPU. But!!!!!
>
> A recent change makes it expensive:
>
> commit 24600ce89a819a8f2fb4fd69fd777218a82ade20
> Author: Rusty Russell
> Date: Tue Nov 25 02:35:13 2008 +1030
>
>     sched: convert check_preempt_equal_prio to cpumask_var_t.
>
>     Impact: stack reduction for large NR_CPUS
>
> which has:
>
>  static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
>  {
> -       cpumask_t mask;
> +       cpumask_var_t mask;
>
>         if (rq->curr->rt.nr_cpus_allowed == 1)
>                 return;
>
> -       if (p->rt.nr_cpus_allowed != 1
> -           && cpupri_find(&rq->rd->cpupri, p, &mask))
> +       if (!alloc_cpumask_var(&mask, GFP_ATOMIC))
>                 return;
>
> check_preempt_equal_prio is in a scheduling hot path!!!!!
>
> WTF are we allocating there for?

Agreed - this needs to be fixed. Since this runs under the runqueue lock
we can have a temporary cpumask in the runqueue itself, not on the stack.

	Ingo
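
For illustration only, here is a minimal user-space sketch of the idea of keeping a temporary mask in the runqueue and using it only while the runqueue lock is held. It is not the kernel code and not an actual patch: struct toy_rq, toy_cpumask_t, the lock_held flag, and both candidates_* helpers are hypothetical stand-ins, a 64-bit integer plays the role of a cpumask, and malloc() stands in for a per-call allocation in the hot path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask_t;

/* Hypothetical stand-in for the runqueue; none of these fields are kernel names. */
struct toy_rq {
	bool lock_held;              /* models "the runqueue lock is held" */
	toy_cpumask_t low_prio_cpus; /* toy input: CPUs running lower-priority work */
	toy_cpumask_t scratch_mask;  /* temporary mask, valid only under the lock */
};

/* Counts per-call allocations so the two variants can be compared. */
static unsigned long toy_alloc_count;

/* Number of set bits in the mask. */
static int toy_weight(toy_cpumask_t m)
{
	int n = 0;

	while (m) {
		m &= m - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Variant A: allocate the temporary mask on every call (the pattern objected to). */
static int candidates_with_alloc(struct toy_rq *rq, toy_cpumask_t allowed)
{
	toy_cpumask_t *mask;
	int n;

	assert(rq->lock_held);
	mask = malloc(sizeof(*mask));
	if (!mask)
		return -1; /* allocation failure: give up, as the quoted code does */
	toy_alloc_count++;
	*mask = rq->low_prio_cpus & allowed;
	n = toy_weight(*mask);
	free(mask);
	return n;
}

/* Variant B: reuse the per-runqueue scratch mask; the lock serializes its users. */
static int candidates_with_scratch(struct toy_rq *rq, toy_cpumask_t allowed)
{
	assert(rq->lock_held);
	rq->scratch_mask = rq->low_prio_cpus & allowed;
	return toy_weight(rq->scratch_mask);
}
```

Both variants compute the same result; the second does no allocation and uses no large stack object, at the cost of one extra mask per runqueue. The sketch relies on the caller already holding the lock, which is the property pointed out above.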