Subject: Re: Over-eager swapping
From: Lee Schermerhorn
To: Wu Fengguang
Cc: Chris Webb, Minchan Kim, "linux-mm@kvack.org",
    "linux-kernel@vger.kernel.org", KOSAKI Motohiro, Pekka Enberg,
    Andi Kleen, Christoph Lameter
Date: Wed, 18 Aug 2010 11:57:14 -0400
Message-ID: <1282147034.77481.33.camel@useless.localdomain>
In-Reply-To: <20100818152103.GA11268@localhost>

On Wed, 2010-08-18 at 23:21 +0800, Wu Fengguang wrote:
> Andi, Christoph and Lee:
>
> This looks like an "unbalanced NUMA memory usage leading to premature
> swapping" problem.

What is the value of the vm.zone_reclaim_mode sysctl?  If it is !0, the
system will go into zone reclaim before allocating off-node pages.
However, it shouldn't "swap" in this case unless
(zone_reclaim_mode & 4) != 0.  And even then, zone reclaim should only
reclaim file pages, not anon.  In theory...

Note: zone_reclaim_mode will be enabled by default [= 1] if the SLIT
contains any distances > 2.0 [20].  Check SLIT values via
'numactl --hardware' (a quick checklist of these commands is sketched
after the quoted thread below).

Lee

>
> Thanks,
> Fengguang
>
> On Wed, Aug 18, 2010 at 10:46:59PM +0800, Chris Webb wrote:
> > Wu Fengguang writes:
> >
> > > Did you enable any NUMA policy? That could start swapping even if
> > > there are lots of free pages in some nodes.
> >
> > Hi. Thanks for the follow-up. We haven't done any configuration or
> > tuning of NUMA behaviour, but NUMA support is definitely compiled
> > into the kernel:
> >
> > # zgrep NUMA /proc/config.gz
> > CONFIG_NUMA_IRQ_DESC=y
> > CONFIG_NUMA=y
> > CONFIG_K8_NUMA=y
> > CONFIG_X86_64_ACPI_NUMA=y
> > # CONFIG_NUMA_EMU is not set
> > CONFIG_ACPI_NUMA=y
> >
> > # grep -i numa /var/log/dmesg.boot
> > NUMA: Allocated memnodemap from b000 - 1b540
> > NUMA: Using 20 for the hash shift.
> >
> > > Are your free pages equally distributed over the nodes? Or limited
> > > to some of the nodes? Try this command:
> > >
> > > grep MemFree /sys/devices/system/node/node*/meminfo
> >
> > My worst-case machines currently have swap completely turned off to
> > make them usable for clients, but I have one machine which is about
> > 3GB into swap with 8GB of buffers and 3GB free. This shows
> >
> > # grep MemFree /sys/devices/system/node/node*/meminfo
> > /sys/devices/system/node/node0/meminfo:Node 0 MemFree:  954500 kB
> > /sys/devices/system/node/node1/meminfo:Node 1 MemFree: 2374528 kB
> >
> > I could definitely imagine that one of the nodes could have dipped
> > down to zero in the past. I'll try enabling swap on one of our
> > machines with the bad problem late tonight and repeat the experiment.
> > The node meminfo on this box currently looks like
> >
> > # grep MemFree /sys/devices/system/node/node*/meminfo
> > /sys/devices/system/node/node0/meminfo:Node 0 MemFree:   82732 kB
> > /sys/devices/system/node/node1/meminfo:Node 1 MemFree: 1723896 kB
> >
> > Best wishes,
> >
> > Chris.
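
A minimal sketch of the checks Lee describes above, assuming numactl is
installed on the affected host (the flag values follow the kernel's
Documentation/sysctl/vm.txt):

  # Current setting: 0 disables zone reclaim entirely; the bit values are
  # 1 = zone reclaim on, 2 = write dirty pages during reclaim,
  # 4 = swap pages during reclaim.
  cat /proc/sys/vm/zone_reclaim_mode

  # The "node distances:" table at the end of this output is the SLIT;
  # any off-node distance greater than 20 enables zone reclaim by default.
  numactl --hardware

  # Per-node free memory, to spot a node that has run dry.
  grep MemFree /sys/devices/system/node/node*/meminfo

  # To rule zone reclaim out as the cause, it can be disabled at runtime:
  sysctl -w vm.zone_reclaim_mode=0

If the premature swapping stops with vm.zone_reclaim_mode set to 0, the
SLIT-triggered default Lee mentions is the likely culprit.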