Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756372AbZJSOQf (ORCPT ); Mon, 19 Oct 2009 10:16:35 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1756195AbZJSOQe (ORCPT ); Mon, 19 Oct 2009 10:16:34 -0400
Received: from james.oetiker.ch ([213.144.138.195]:47721 "EHLO james.oetiker.ch"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755267AbZJSOQd (ORCPT ); Mon, 19 Oct 2009 10:16:33 -0400
Date: Mon, 19 Oct 2009 16:16:36 +0200 (CEST)
From: Tobias Oetiker
To: Mel Gorman
cc: Frans Pop, Pekka Enberg, David Rientjes, KOSAKI Motohiro,
	"Rafael J. Wysocki", Linux Kernel Mailing List, Reinette Chatre,
	Bartlomiej Zolnierkiewicz, Karol Lewandowski, Mohamed Abbas,
	"John W. Linville", linux-mm@kvack.org, jens.axboe@oracle.com
Subject: Re: [Bug #14141] order 2 page allocation failures (generic)
In-Reply-To: <20091019140957.GE9036@csn.ul.ie>
Message-ID:
References: <3onW63eFtRF.A.xXH.oMTxKB@chimera>
	<200910190133.33183.elendil@planet.nl>
	<1255912562.6824.9.camel@penberg-laptop>
	<200910190444.55867.elendil@planet.nl>
	<20091019133146.GB9036@csn.ul.ie>
	<20091019140957.GE9036@csn.ul.ie>
User-Agent: Alpine 2.00 (DEB 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4724
Lines: 95

Hi Mel,

Today Mel Gorman wrote:

> On Mon, Oct 19, 2009 at 03:40:05PM +0200, Tobias Oetiker wrote:
> > Hi Mel,
> >
> > Today Mel Gorman wrote:
> >
> > > On Mon, Oct 19, 2009 at 11:49:08AM +0200, Tobi Oetiker wrote:
> > > > Today Frans Pop wrote:
> > > >
> > > > >
> > > > > I'm starting to think that this commit may not be directly related to high
> > > > > order allocation failures. The fact that I'm seeing SKB allocation
> > > > > failures earlier because of this commit could be just a side effect.
> > > > > It could be that instead the main impact of this commit is on encrypted
> > > > > file system and/or encrypted swap (kcryptd).
> > > > >
> > > > > Besides mm the commit also touches dm-crypt (and nfs/write.c, but as I'm
> > > > > only reading from NFS that's unlikely).
> > > >
> > > > I have updated a fileserver to 2.6.31 today and I see page
> > > > allocation failures from several parts of the system ... mostly nfs though ... (it is a nfs server).
> > > > So I guess the problem must be quite generic:
> > > >
> > > >
> > > > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> > > > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> > > > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> > > > Oct 19 07:10:02 johan kernel: [23565.684124] [] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
> > > >
> > >
> > > What's the rest of the stack trace? I'm wondering where a large number
> > > of order-5 GFP_ATOMIC allocations are coming from. It seems different to
> > > the e100 problem where there is one GFP_ATOMIC allocation while the
> > > firmware is being loaded.
> >
> > Oct 19 07:10:02 johan kernel: [23565.684110] swapper: page allocation failure. order:5, mode:0x4020 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684118] Pid: 0, comm: swapper Not tainted 2.6.31-02063104-generic #02063104 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684121] Call Trace: [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684124] [] __alloc_pages_slowpath+0x3b2/0x4c0 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684157] [] __alloc_pages_nodemask+0x135/0x140 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684164] [] ? _spin_unlock_bh+0x14/0x20 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684170] [] kmalloc_large_node+0x68/0xc0 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684175] [] __kmalloc_node_track_caller+0x11a/0x180 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684181] [] ? skb_copy+0x32/0xa0 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684185] [] __alloc_skb+0x76/0x180 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684205] [] skb_copy+0x32/0xa0 [kern.warning]
> > Oct 19 07:10:02 johan kernel: [23565.684221] [] vboxNetFltLinuxPacketHandler+0x5c/0xd0 [vboxnetflt] [kern.warning]
>
> Is the MTU set very high between the host and virtualised machine?
>
> Can you test please with the patch at http://lkml.org/lkml/2009/10/16/89
> applied and with commits 373c0a7e and 8aa7e847 reverted please?

If you can send me a consolidated patch which does apply to
2.6.31.4 I will be glad to try ...

Your patch in http://lkml.org/lkml/2009/10/16/89 does not seem to be
for 2.6.31 ... I assume the 2.6.31 equivalent would be the following,
but then again I don't really understand the code, so this is just
pattern matching ...

--- a/mm/page_alloc.c	2009-10-05 19:12:06.000000000 +0200
+++ b/mm/page_alloc.c	2009-10-19 14:52:15.000000000 +0200
@@ -1763,6 +1763,7 @@
 	if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
 		goto nopage;
 
+restart:
 	wake_all_kswapd(order, zonelist, high_zoneidx);
 
 	/*
@@ -1772,7 +1773,6 @@
 	 */
 	alloc_flags = gfp_to_alloc_flags(gfp_mask);
 
-restart:
 	/* This is the last chance, in general, before the goto nopage. */
 	page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
 			high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
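
As a reading aid for the "page allocation failure. order:5, mode:0x4020"
lines quoted in this mail, here is a minimal Python sketch that decodes the
order and mode fields. It is not part of the thread. Two things in it are
assumptions: the 4 KiB page size (the mail does not say what the machine
uses) and the GFP bit values, which are written down as recalled from
include/linux/gfp.h of the 2.6.31 era and change between kernel versions.
The helper name decode_failure is made up for this illustration.

```python
import re

# Assumption: 4 KiB pages (typical for x86); the mail does not state this.
PAGE_SIZE = 4096

# Assumption: bit values as recalled from include/linux/gfp.h around 2.6.31.
# They differ between kernel versions; check the header of the kernel in use.
GFP_FLAGS = {
    0x10: "__GFP_WAIT",    # caller may sleep
    0x20: "__GFP_HIGH",    # high priority, may dip into reserves
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x4000: "__GFP_COMP",  # compound page
}

LINE_RE = re.compile(
    r"page allocation failure\. order:(\d+), mode:0x([0-9a-fA-F]+)"
)


def decode_failure(line):
    """Decode one 'page allocation failure' log line (hypothetical helper).

    Returns None when the line does not contain such a message.
    """
    m = LINE_RE.search(line)
    if m is None:
        return None
    order = int(m.group(1))
    mode = int(m.group(2), 16)
    # Names of the flags from the table above that are set in the mode.
    names = [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit]
    known = sum(bit for bit in GFP_FLAGS if mode & bit)
    return {
        "order": order,
        "pages": 1 << order,            # contiguous pages requested
        "bytes": PAGE_SIZE << order,    # size under the page-size assumption
        "flags": names,
        "unknown_bits": mode & ~known,  # bits not covered by the table
        "can_sleep": bool(mode & 0x10),
    }
```

Under these assumptions, order 5 means 32 physically contiguous pages
(128 KiB), and 0x4020 carries __GFP_HIGH without __GFP_WAIT, which matches
the "order-5 GFP_ATOMIC allocations" wording in the quoted reply.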