Date: Wed, 24 Nov 2010 17:18:48 -0800
From: Simon Kirby
To: Peter Schüller
Cc: Pekka Enberg, Dave Hansen, Andrew Morton, linux-kernel@vger.kernel.org,
    Mattias de Zalenski, linux-mm@kvack.org
Subject: Re: Sudden and massive page cache eviction
Message-ID: <20101125011848.GB29511@hostway.ca>
References: <20101122161158.02699d10.akpm@linux-foundation.org>
    <1290501502.2390.7029.camel@nimitz> <1290529171.2390.7994.camel@nimitz>

On Wed, Nov 24, 2010 at 04:32:39PM +0100, Peter Schüller wrote:

> >> I forgot to address the second part of this question: How would I best
> >> inspect whether the kernel is doing that?
> >
> > You can, for example, record
> >
> >   cat /proc/meminfo | grep Huge
> >
> > for large page allocations.
>
> Those show zero as per my other post. However I got the impression Dave
> was asking about regular but larger-than-one-page allocations internal
> to the kernel, while the Huge* lines in /proc/meminfo refer to
> allocations specifically done by userland applications doing huge page
> allocation on a system with huge pages enabled - or am I confused?

Your page cache dents don't seem quite as big, so it may be something
else, but if it's the same problem we're seeing here, it seems to have
to do with when an order=3 new_slab allocation comes in to grow the
kmalloc slab cache for an __alloc_skb (network packet).
This is normal even without jumbo frames now. When no zone passes
zone_watermark_ok() for order=3, kswapd is woken, which frees things all
over the place to try to get zone_watermark_ok(order=3) to be happy.
We're seeing this throw out a huge number of pages, and we're seeing it
happen even with lots of memory free in the zone. CONFIG_COMPACTION also
currently does not help because try_to_compact_pages() returns early
with COMPACT_SKIPPED if order <= PAGE_ALLOC_COSTLY_ORDER, and, you
guessed it, PAGE_ALLOC_COSTLY_ORDER is set to 3.

I reimplemented zone_watermark_ok(order=3) in userspace, and I can see
it happen:

Code here: http://0x.ca/sim/ref/2.6.36/buddyinfo_scroll

Zone   order:0     1    2    3   4 5 6 7 8 9 A nr_free state
DMA32    19026 33652 4897   13   5 1 2 0 0 0 0  106262   337 <= 256
Normal     450     0    0    0   0 0 0 0 0 0 0     450    -7 <= 238
DMA32    19301 33869 4665   12   5 1 2 0 0 0 0  106035   329 <= 256
Normal     450     0    0    0   0 0 0 0 0 0 0     450    -7 <= 238
DMA32    19332 33931 4603    9   5 1 2 0 0 0 0  105918   305 <= 256
Normal     450     0    0    0   0 0 0 0 0 0 0     450    -7 <= 238
DMA32    19467 34057 4468    6   5 1 2 0 0 0 0  105741   281 <= 256
Normal     450     0    0    0   0 0 0 0 0 0 0     450    -7 <= 238
DMA32    19591 34181 4344    5   5 1 2 0 0 0 0  105609   273 <= 256
Normal     450     0    0    0   0 0 0 0 0 0 0     450    -7 <= 238
DMA32    19856 34348 4109    2   5 1 2 0 0 0 0  105244   249 <= 256 !!!
Normal     450     0    0    0   0 0 0 0 0 0 0     450    -7 <= 238
DMA32    24088 36476 5437  144   5 1 2 0 0 0 0  120180  1385 <= 256
Normal    1024     1    0    0   0 0 0 0 0 0 0    1026    -5 <= 238
DMA32    26453 37440 6676  623  53 1 2 0 0 0 0  134029  5985 <= 256
Normal    8700   100    0    0   0 0 0 0 0 0 0    8900   193 <= 238
DMA32    48881 38161 7142  966  81 1 2 0 0 0 0  162955  9177 <= 256
Normal    8936   102    0    1   0 0 0 0 0 0 0    9148   205 <= 238
DMA32    66046 40051 7871 1409 135 2 2 0 0 0 0  191256 13617 <= 256
Normal    9019    18    0    0   0 0 0 0 0 0 0    9055    29 <= 238
DMA32    67133 48671 8231 1578 143 2 2 0 0 0 0  212503 15097 <= 256

So, kswapd was woken up at the line that ends in "!!!"
there, because free_pages(249) <= min(256), and so zone_watermark_ok()
returned 0 when an order=3 allocation came in.

Maybe try out that script and see if you see something similar.

Simon-
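
For anyone who wants to follow the arithmetic behind the state column, here is a minimal Python sketch of the order-aware watermark check described in this message. It is not the buddyinfo_scroll script linked above, whose contents are not reproduced here. It models the core loop of zone_watermark_ok() as it appears in 2.6.36-era kernels and leaves out the ALLOC_HIGH/ALLOC_HARDER adjustments. The helper name parse_row, the lowmem_reserve default of 0, and the mark value of 2048 are assumptions made for illustration. The mark is only inferred from the displayed min of 256 at order 3 (256 << 3); the real watermark could be a few pages higher.

```python
def parse_row(line):
    """Split one table row from this message into (zone, per-order counts).

    Hypothetical helper: it expects the zone name followed by the eleven
    free-block counts for orders 0..10, and ignores any trailing columns.
    """
    fields = line.split()
    return fields[0], [int(x) for x in fields[1:12]]


def zone_watermark_ok(nr_free_by_order, order, mark, lowmem_reserve=0):
    """Simplified model of the kernel's order-aware watermark check.

    Returns (ok, free_pages, minimum) as they stand when the check exits.
    """
    # Total free pages in the zone: each order-o block holds 2**o pages.
    nr_free = sum(count << o for o, count in enumerate(nr_free_by_order))
    free_pages = nr_free - (1 << order) + 1
    minimum = mark
    if free_pages <= minimum + lowmem_reserve:
        return False, free_pages, minimum
    for o in range(order):
        # Blocks smaller than the request cannot satisfy it, so discount them.
        free_pages -= nr_free_by_order[o] << o
        # Require fewer free pages at each higher order.
        minimum >>= 1
        if free_pages <= minimum:
            return False, free_pages, minimum
    return True, free_pages, minimum


if __name__ == "__main__":
    row = "DMA32 19856 34348 4109 2 5 1 2 0 0 0 0 105244 249 <= 256 !!!"
    zone, counts = parse_row(row)
    # mark=2048 is an assumed value, see the note above.
    print(zone, zone_watermark_ok(counts, order=3, mark=2048))
```

Under those assumptions, the DMA32 row marked "!!!" ends with free_pages = 249 against min = 256 and fails the check, while the first DMA32 row ends with 337 against 256 and passes. The point the sketch makes visible is that order-0 through order-2 blocks are subtracted before the final comparison, so a zone can hold over 100,000 free pages and still fail an order=3 check.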