Date: Thu, 03 Dec 2015 14:56:37 -0500
From: Waiman Long
To: Peter Zijlstra
CC: Ingo Molnar, linux-kernel@vger.kernel.org, Yuyang Du, Paul Turner,
    Ben Segall, Morten Rasmussen, Scott J Norton, Douglas Hatch
Subject: Re: [PATCH v2 2/3] sched/fair: Move hot load_avg into its own cacheline

On 12/03/2015 06:12 AM, Peter Zijlstra wrote:
>
> I made this:
>
> ---
> Subject: sched/fair: Move hot load_avg into its own cacheline
> From: Waiman Long
> Date: Wed, 2 Dec 2015 13:41:49 -0500
>
> If a system with a large number of sockets was driven to full
> utilization, it was found that the clock tick handling occupied a
> rather significant proportion of CPU time when fair group scheduling
> and autogroup were enabled.
>
> Running a java benchmark on a 16-socket IvyBridge-EX system, the perf
> profile looked like:
>
>   10.52%   0.00%  java  [kernel.vmlinux]  [k] smp_apic_timer_interrupt
>    9.66%   0.05%  java  [kernel.vmlinux]  [k] hrtimer_interrupt
>    8.65%   0.03%  java  [kernel.vmlinux]  [k] tick_sched_timer
>    8.56%   0.00%  java  [kernel.vmlinux]  [k] update_process_times
>    8.07%   0.03%  java  [kernel.vmlinux]  [k] scheduler_tick
>    6.91%   1.78%  java  [kernel.vmlinux]  [k] task_tick_fair
>    5.24%   5.04%  java  [kernel.vmlinux]  [k] update_cfs_shares
>
> In particular, the high CPU time consumed by update_cfs_shares()
> was mostly due to contention on the cacheline that contained the
> task_group's load_avg statistical counter. This cacheline may also
> contain variables like shares, cfs_rq & se which are accessed rather
> frequently during clock tick processing.
>
> This patch moves the load_avg variable into another cacheline
> separated from the other frequently accessed variables. It also
> creates a cacheline-aligned kmem_cache for task_group to make sure
> that all the allocated task_groups are cacheline aligned.
>
> By doing so, the perf profile became:
>
>   9.44%  0.00%  java  [kernel.vmlinux]  [k] smp_apic_timer_interrupt
>   8.74%  0.01%  java  [kernel.vmlinux]  [k] hrtimer_interrupt
>   7.83%  0.03%  java  [kernel.vmlinux]  [k] tick_sched_timer
>   7.74%  0.00%  java  [kernel.vmlinux]  [k] update_process_times
>   7.27%  0.03%  java  [kernel.vmlinux]  [k] scheduler_tick
>   5.94%  1.74%  java  [kernel.vmlinux]  [k] task_tick_fair
>   4.15%  3.92%  java  [kernel.vmlinux]  [k] update_cfs_shares
>
> The %cpu time is still pretty high, but it is better than before. The
> benchmark results before and after the patch were as follows:
>
>   Before patch - Max-jOPs: 907533   Critical-jOps: 134877
>   After patch  - Max-jOPs: 916011   Critical-jOps: 142366
>
> Cc: Scott J Norton
> Cc: Douglas Hatch
> Cc: Ingo Molnar
> Cc: Yuyang Du
> Cc: Paul Turner
> Cc: Ben Segall
> Cc: Morten Rasmussen
> Signed-off-by: Waiman Long
> Signed-off-by: Peter Zijlstra (Intel)
> Link: http://lkml.kernel.org/r/1449081710-20185-3-git-send-email-Waiman.Long@hpe.com
> ---
>  kernel/sched/core.c  | 10 +++++++---
>  kernel/sched/sched.h |  7 ++++++-
>  2 files changed, 13 insertions(+), 4 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7345,6 +7345,9 @@ int in_sched_functions(unsigned long add
>   */
>  struct task_group root_task_group;
>  LIST_HEAD(task_groups);
> +
> +/* Cacheline aligned slab cache for task_group */
> +static struct kmem_cache *task_group_cache __read_mostly;
>  #endif
>
>  DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);
> @@ -7402,11 +7405,12 @@ void __init sched_init(void)
>  #endif /* CONFIG_RT_GROUP_SCHED */
>
>  #ifdef CONFIG_CGROUP_SCHED
> +	task_group_cache = KMEM_CACHE(task_group, 0);
> +

Thanks for making that change. Do we need to add the SLAB_HWCACHE_ALIGN
flag? Or we could make a helper flag that defines SLAB_HWCACHE_ALIGN
only when CONFIG_FAIR_GROUP_SCHED is defined, along the lines of the
sketch below.
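A rough, untested sketch of what I mean -- TG_SLAB_FLAGS is just an
illustrative name here, not an existing macro:

#ifdef CONFIG_FAIR_GROUP_SCHED
/* load_avg is split onto its own cacheline; align the allocation too */
#define TG_SLAB_FLAGS	SLAB_HWCACHE_ALIGN
#else
/* no load_avg cacheline split, so no need for hardware alignment */
#define TG_SLAB_FLAGS	0
#endif

	/* in sched_init(), under CONFIG_CGROUP_SCHED */
	task_group_cache = KMEM_CACHE(task_group, TG_SLAB_FLAGS);

That way kernels built without CONFIG_FAIR_GROUP_SCHED would not pay
for an alignment they cannot benefit from.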
Other than that, I am fine with the change.

Cheers,
Longman