From: "azurIt"
To: "Mel Gorman", "Andrew Morton"
Cc: "Eric Dumazet", "Changli Gao", "Américo Wang", "Jiri Slaby", , , , "Jiri Slaby"
Subject: Re: Regression from 2.6.36
Date: Fri, 15 Apr 2011 11:59:03 +0200
Message-Id: <20110415115903.315DEAA1@pobox.sk>
In-Reply-To: <20110414102501.GE11871@csn.ul.ie>
X-Mailing-List: linux-kernel@vger.kernel.org

This new patch also works fine and fixes the problem.

Mel, I cannot run your script:

# perl watch-highorder-latency.pl
Failed to open /sys/kernel/debug/tracing/set_ftrace_filter for writing at watch-highorder-latency.pl line 17.
# ls -ld /sys/kernel/debug/
ls: cannot access /sys/kernel/debug/: No such file or directory

azur

______________________________________________________________
> From: "Mel Gorman"
> To: Andrew Morton
> Date: 14.04.2011 12:25
> Subject: Re: Regression from 2.6.36
>
> CC: "Eric Dumazet", "Changli Gao", "Américo Wang", "Jiri Slaby", linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, "Jiri Slaby"
>On Wed, Apr 13, 2011 at 02:16:00PM -0700, Andrew Morton wrote:
>> On Wed, 13 Apr 2011 04:37:36 +0200
>> Eric Dumazet wrote:
>>
>> > On Tuesday 12 April 2011 at 18:31 -0700, Andrew Morton wrote:
>> > > On Wed, 13 Apr 2011 09:23:11 +0800 Changli Gao wrote:
>> > >
>> > > > On Wed, Apr 13, 2011 at 6:49 AM, Andrew Morton
>> > > > wrote:
>> > > > >
>> > > > > It's somewhat unclear (to me) what caused this regression.
>> > > > >
>> > > > > Is it because the kernel is now doing large kmalloc()s for the fdtable,
>> > > > > and this makes the page allocator go nuts trying to satisfy high-order
>> > > > > page allocation requests?
>> > > > >
>> > > > > Is it because the kernel now will usually free the fdtable
>> > > > > synchronously within the rcu callback, rather than deferring this to a
>> > > > > workqueue?
>> > > > >
>> > > > > The latter seems unlikely, so I'm thinking this was a case of
>> > > > > high-order-allocations-considered-harmful?
>> > > > >
>> > > >
>> > > > Maybe, but I am not sure. Maybe my patch causes too many inner
>> > > > fragments. For example, when asking for 5 pages, get 8 pages, and 3
>> > > > pages are wasted, then memory thrash happens finally.
>> > >
>> > > That theory sounds less likely, but could be tested by using
>> > > alloc_pages_exact().
>> > >
>> >
>> > Very unlikely, since fdtable sizes are powers of two, unless you hit
>> > sysctl_nr_open and it was changed (default value being 2^20)
>> >
>>
>> So am I correct in believing that this regression is due to the
>> high-order allocations putting excess stress onto page reclaim?
>>
>
>This is very plausible but it would be nice to get confirmation on
>what the size of the fdtable was to be sure. If it's big enough for
>high-order allocations and it's a fork-heavy workload with memory
>mostly in use, the fork() latencies could be getting very high. In
>addition, each fork is potentially kicking kswapd awake (to rebalance
>the zone for higher orders). I do not see CONFIG_COMPACTION enabled,
>meaning that if I'm right in that kswapd is awake and fork() is
>entering direct reclaim, then we are lumpy reclaiming as well, which
>can stall pretty severely.
>
>> If so, then how large _are_ these allocations? This perhaps can be
>> determined from /proc/slabinfo. They must be pretty huge, because slub
>> likes to do excessively-large allocations and the system handles that
>> reasonably well.
>>
>
>I'd be interested in finding out the value of /proc/sys/fs/file-max and
>the output of ulimit -n (max open files) for the main server. This
>should help us determine what the size of the fdtable is.
>
>> I suppose that a suitable fix would be
>>
>>
>> From: Andrew Morton
>>
>> Azurit reports large increases in system time after 2.6.36 when running
>> Apache. It was bisected down to a892e2d7dcdfa6c76e6 ("vfs: use kmalloc()
>> to allocate fdmem if possible").
>>
>> That patch caused the vfs to use kmalloc() for very large allocations and
>> this is causing excessive work (and presumably excessive reclaim) within
>> the page allocator.
>>
>> Fix it by falling back to vmalloc() earlier - when the allocation attempt
>> would have been considered "costly" by reclaim.
>>
>> Reported-by: azurIt
>> Cc: Changli Gao
>> Cc: Americo Wang
>> Cc: Jiri Slaby
>> Cc: Eric Dumazet
>> Cc: Mel Gorman
>> Signed-off-by: Andrew Morton
>> ---
>>
>>  fs/file.c |   17 ++++++++++-------
>>  1 file changed, 10 insertions(+), 7 deletions(-)
>>
>> diff -puN fs/file.c~a fs/file.c
>> --- a/fs/file.c~a
>> +++ a/fs/file.c
>> @@ -39,14 +39,17 @@ int sysctl_nr_open_max = 1024 * 1024; /*
>>   */
>>  static DEFINE_PER_CPU(struct fdtable_defer, fdtable_defer_list);
>>
>> -static inline void *alloc_fdmem(unsigned int size)
>> +static void *alloc_fdmem(unsigned int size)
>>  {
>> -	void *data;
>> -
>> -	data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
>> -	if (data != NULL)
>> -		return data;
>> -
>> +	/*
>> +	 * Very large allocations can stress page reclaim, so fall back to
>> +	 * vmalloc() if the allocation size will be considered "large" by the VM.
>> +	 */
>> +	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>
>The reporter will need to retest to confirm this is really ok. The patch
>that was reported to help avoided high-order allocations entirely. If
>fork-heavy workloads are really entering direct reclaim and increasing
>fork latency enough to ruin performance, then this patch will also
>suffer. How much it helps depends on how big the fdtable is.
>
>> +		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
>> +		if (data != NULL)
>> +			return data;
>> +	}
>>  	return vmalloc(size);
>>  }
>>
>
>I'm attaching a primitive perl script that reports high-order allocation
>latencies. It'd be interesting to see what its output looks like,
>particularly when the server is in trouble, if the bug reporter has the
>time.
>
>--
>Mel Gorman
>SUSE Labs