Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757012Ab1FFMua (ORCPT ); Mon, 6 Jun 2011 08:50:30 -0400 Received: from mx1.redhat.com ([209.132.183.28]:25381 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750964Ab1FFMu1 (ORCPT ); Mon, 6 Jun 2011 08:50:27 -0400 Date: Mon, 6 Jun 2011 14:49:54 +0200 From: Andrea Arcangeli To: Mel Gorman Cc: Minchan Kim , Mel Gorman , akpm@linux-foundation.org, Ury Stankevich , KOSAKI Motohiro , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous Message-ID: <20110606124954.GC12887@random.random> References: <20110531143830.GC13418@barrios-laptop> <20110602182302.GA2802@random.random> <20110602202156.GA23486@barrios-laptop> <20110602214041.GF2802@random.random> <20110602223201.GH2802@random.random> <20110603173707.GL2802@random.random> <20110603180730.GM2802@random.random> <20110606103216.GC5247@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110606103216.GC5247@suse.de> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3385 Lines: 83 On Mon, Jun 06, 2011 at 11:32:16AM +0100, Mel Gorman wrote: > This patch is pulling in stuff from Minchan. Minimally his patch should > be kept separate to preserve history or his Signed-off should be > included on this patch. Well I didn't apply Minchan's patch, just improved it as he suggested from pseudocode, but I can add his signed-off-by no prob. > I still think this optimisation is rare and only applies if we are > encountering huge pages during the linear scan. How often are we doing > that really? Well it's so fast to do it, that it looks worthwhile. You probably noticed initially I suggested only the fix for page_count (theoretical) oops, and I argued we could improve some more bits, but then it was kind of obvious to improve the upper side of the loop too according to pseudocode. > > > + VM_BUG_ON(!isolated_pages); > > This BUG_ON is overkill. hpage_nr_pages would have to return 0. > > > + VM_BUG_ON(isolated_pages > MAX_ORDER_NR_PAGES); > > This would require order > MAX_ORDER_NR_PAGES to be passed into > isolate_lru_pages or for a huge page to be unaligned to a power of > two. The former is very unlikely and the latter is not supported by > any CPU. Minchan also disliked the VM_BUG_ON, it's clearly way overkill, but frankly the pfn physical scans are tricky enough things and if there's a race and the order is wrong for whatever reason (no compound page or overwritten by driver messing with subpages) we'll just trip into some weird pointer next iteration (or maybe not and it'll go ahead unnoticed if it's not beyond the range) and in that case I'd like to notice immediately. But probably it's too paranoid even of a VM_BUG_ON so I surely can remove it... > > > } else { > > - /* the page is freed already. */ > > - if (!page_count(cursor_page)) > > + /* > > + * Check if the page is freed already. > > + * > > + * We can't use page_count() as that > > + * requires compound_head and we don't > > + * have a pin on the page here. If a > > + * page is tail, we may or may not > > + * have isolated the head, so assume > > + * it's not free, it'd be tricky to > > + * track the head status without a > > + * page pin. > > + */ > > + if (!PageTail(cursor_page) && > > + !__page_count(cursor_page)) > > continue; > > break; > > Ack to this part. This is also the only important part that fixes the potential oops. > I'm not keen on __page_count() as __ normally means the "unlocked" > version of a function although I realise that rule isn't universal > either. I can't think of a better name though. If better suggestions comes to mind I can change it... Or I can also use atomic_read like in the first patch... it's up to you. I figured it wasn't so nice to call atomic_read and there are other places in huge_memory.c that used that for bugchecks and it can be cleaned up with __page_count. The _count having _ prefix is the thing that makes it look like a more private field not to use in generic VM code so the raw value can be altered without changing all callers of __page_count similar to _mapcount. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/