Date: Tue, 9 Jun 2009 09:27:29 +0100
From: Mel Gorman
To: Wu Fengguang
Cc: KOSAKI Motohiro, Rik van Riel, Christoph Lameter, "Zhang, Yanmin", "linuxram@us.ibm.com", linux-mm, LKML
Subject: Re: [PATCH 2/3] Properly account for the number of page cache pages zone_reclaim() can reclaim
Message-ID: <20090609082728.GF18380@csn.ul.ie>
References: <1244466090-10711-1-git-send-email-mel@csn.ul.ie> <1244466090-10711-3-git-send-email-mel@csn.ul.ie> <20090609022549.GB6740@localhost>
In-Reply-To: <20090609022549.GB6740@localhost>

On Tue, Jun 09, 2009 at 10:25:49AM +0800, Wu Fengguang wrote:
> On Mon, Jun 08, 2009 at 09:01:29PM +0800, Mel Gorman wrote:
> > On NUMA machines, the administrator can configure zone_reclaim_mode that
> > is a more targeted form of direct reclaim. On machines with large NUMA
> > distances for example, zone_reclaim_mode defaults to 1 meaning that clean
> > unmapped pages will be reclaimed if the zone watermarks are not being met.
> >
> > There is a heuristic that determines if the scan is worthwhile but the
> > problem is that the heuristic is not being properly applied and is basically
> > assuming zone_reclaim_mode is 1 if it is enabled.
> >
> > This patch makes zone_reclaim() make a better attempt at working out how
> > many pages it might be able to reclaim given the current reclaim_mode. If it
> > cannot clean pages, then NR_FILE_DIRTY number of pages are not candidates. If
> > it cannot swap, then NR_FILE_MAPPED are not. This indirectly addresses tmpfs
> > as those pages tend to be dirty as they are not cleaned by pdflush or sync.
>
> No, tmpfs pages are not accounted in NR_FILE_DIRTY because of the
> BDI_CAP_NO_ACCT_AND_WRITEBACK bits.
>

Ok, that explains why the dirty page count was not as high as I was
expecting. Thanks.

> > The ideal would be that the number of tmpfs pages would also be known
> > and accounted for like NR_FILE_MAPPED as swap is required to discard them.
> > A means of working this out quickly was not obvious but a comment is added
> > noting the problem.
>
> I'd rather prefer it be accounted separately than to muck up NR_FILE_MAPPED :)
>

Maybe I used a poor choice of words. What I meant was that the ideal would
be that we had a separate count for tmpfs pages. As tmpfs pages and mapped
pages both have to be unmapped and potentially swapped, they are "like" each
other with respect to the zone_reclaim_mode and how it behaves. We would end
up with something like

	pagecache_reclaimable -= zone_page_state(zone, NR_FILE_MAPPED);
	pagecache_reclaimable -= zone_page_state(zone, NR_FILE_TMPFS);

> > +	int pagecache_reclaimable;
> > +
> > +	/*
> > +	 * Work out how many page cache pages we can reclaim in this mode.
> > +	 *
> > +	 * NOTE: Ideally, tmpfs pages would be accounted as if they were
> > +	 *       NR_FILE_MAPPED as swap is required to discard those
> > +	 *       pages even when they are clean. However, there is no
> > +	 *       way of quickly identifying the number of tmpfs pages
> > +	 */
>
> So can you remove the note on NR_FILE_MAPPED?
>

Why would I remove the note? I can alter the wording but the intention is
to show we cannot count the number of tmpfs pages quickly and it would be
nice if we could.
Maybe this is clearer?

Note: Ideally tmpfs pages would be accounted for as NR_FILE_TMPFS or
similar and treated similar to NR_FILE_MAPPED as both require unmapping
from page tables and potentially swap to reclaim. However, no such
counter exists.

> > +	pagecache_reclaimable = zone_page_state(zone, NR_FILE_PAGES);
> > +	if (!(zone_reclaim_mode & RECLAIM_WRITE))
> > +		pagecache_reclaimable -= zone_page_state(zone, NR_FILE_DIRTY);
>
> > +	if (!(zone_reclaim_mode & RECLAIM_SWAP))
> > +		pagecache_reclaimable -= zone_page_state(zone, NR_FILE_MAPPED);
>
> So the "if" can be removed because NR_FILE_MAPPED is not related to swapping?
>

It's partially related with respect to what zone_reclaim() is doing.
Once something is mapped, we need RECLAIM_SWAP set on the
zone_reclaim_mode to do anything useful with them.

> Thanks,
> Fengguang
>
> > 	/*
> > 	 * Zone reclaim reclaims unmapped file backed pages and
> > @@ -2391,8 +2406,7 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> > 	 * if less than a specified percentage of the zone is used by
> > 	 * unmapped file backed pages.
> > 	 */
> > -	if (zone_page_state(zone, NR_FILE_PAGES) -
> > -	    zone_page_state(zone, NR_FILE_MAPPED) <= zone->min_unmapped_pages
> > +	if (pagecache_reclaimable <= zone->min_unmapped_pages
> > 	    && zone_page_state(zone, NR_SLAB_RECLAIMABLE)
> > 			<= zone->min_slab_pages)
> > 		return 0;
> > --
> > 1.5.6.5
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
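
For anyone following the thread, the accounting being discussed can be
modelled in userspace roughly as below. This is only a sketch under stated
assumptions, not the patch itself: struct zone_counts and both helper names
are made up for illustration, nr_file_tmpfs stands in for the NR_FILE_TMPFS
counter that does not exist, and the RECLAIM_* bit values are assumed to
match the definitions in mm/vmscan.c. It also ignores any overlap between
mapped and tmpfs pages.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the zone_reclaim_mode bits in mm/vmscan.c. */
#define RECLAIM_ZONE  (1 << 0)	/* zone reclaim enabled */
#define RECLAIM_WRITE (1 << 1)	/* may write out dirty pages */
#define RECLAIM_SWAP  (1 << 2)	/* may unmap and swap pages */

/* Hypothetical snapshot of the per-zone counters used in the thread. */
struct zone_counts {
	long nr_file_pages;
	long nr_file_dirty;
	long nr_file_mapped;
	long nr_file_tmpfs;		/* hypothetical: no such counter exists */
	long nr_slab_reclaimable;
};

/* Page cache pages that could be reclaimed under the given mode. */
static long pagecache_reclaimable(const struct zone_counts *z, int mode)
{
	long nr;

	assert(z != NULL);
	nr = z->nr_file_pages;

	/* Without RECLAIM_WRITE, dirty pages are not candidates. */
	if (!(mode & RECLAIM_WRITE))
		nr -= z->nr_file_dirty;

	/* Without RECLAIM_SWAP, mapped (and ideally tmpfs) pages are not. */
	if (!(mode & RECLAIM_SWAP)) {
		nr -= z->nr_file_mapped;
		nr -= z->nr_file_tmpfs;
	}
	return nr;
}

/* Returns 0 when the scan is not worthwhile, as in the early return above. */
static int worth_scanning(const struct zone_counts *z, int mode,
			  long min_unmapped_pages, long min_slab_pages)
{
	if (pagecache_reclaimable(z, mode) <= min_unmapped_pages
	    && z->nr_slab_reclaimable <= min_slab_pages)
		return 0;
	return 1;
}
```

With 1000 file pages of which 100 are dirty, 300 mapped and 50 tmpfs, mode 1
leaves 550 candidates while adding RECLAIM_SWAP leaves 900, which is the
difference the heuristic was previously blind to.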