Date: Fri, 11 May 2007 09:47:46 +0900
From: KAMEZAWA Hiroyuki
To: Mel Gorman
Cc: y-goto@jp.fujitsu.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@osdl.org, clameter@sgi.com
Subject: Re: [RFC] memory hotremove patch take 2 [02/10] (make page unused)
Message-Id: <20070511094746.15e5f1c3.kamezawa.hiroyu@jp.fujitsu.com>
References: <20070509115506.B904.Y-GOTO@jp.fujitsu.com>
	<20070509120248.B908.Y-GOTO@jp.fujitsu.com>
Organization: Fujitsu

On Thu, 10 May 2007 16:34:01 +0100 (IST)
Mel Gorman wrote:

> > +#ifdef CONFIG_PAGE_ISOLATION
> > +	/*
> > +	 * For pages which are not used but not free.
> > +	 * See include/linux/page_isolation.h
> > +	 */
> > +	spinlock_t		isolation_lock;
> > +	struct list_head	isolation_list;
> > +#endif
>
> Using MIGRATE_ISOLATING instead of this approach does mean that there will
> be MAX_ORDER additional struct free_area added to the zone. That is more
> lists than this approach.
>
Thank you! It's an interesting idea. I think it will make our code much
simpler. I'll look into it.

> I am somewhat surprised that CONFIG_PAGE_ISOLATION exists as a separate
> option. If it was a compile-time option at all, I would expect it to
> depend on memory hot-remove being selected.
>
I think CONFIG_PAGE_ISOLATION can also be used by other code which needs
to isolate some amount of contiguous pages, so the config option is kept
separate for now. Currently, CONFIG_MEMORY_HOTREMOVE selects it.
CONFIG_PAGE_ISOLATION and CONFIG_MEMORY_HOTREMOVE will be merged later if
nothing but hot-removal ends up using it.

> >  /*
> >   * zone_start_pfn, spanned_pages and present_pages are all
> >   * protected by span_seqlock. It is a seqlock because it has

> > Index: current_test/mm/page_alloc.c
> > ===================================================================
> > --- current_test.orig/mm/page_alloc.c	2007-05-08 15:07:20.000000000 +0900
> > +++ current_test/mm/page_alloc.c	2007-05-08 15:08:34.000000000 +0900
> > @@ -41,6 +41,7 @@
> >  #include
> >  #include
> >  #include
> > +#include <linux/page_isolation.h>
> >
> >  #include
> >  #include
> >
> > @@ -448,6 +449,9 @@ static inline void __free_one_page(struc
> >  	if (unlikely(PageCompound(page)))
> >  		destroy_compound_page(page, order);
> >
> > +	if (page_under_isolation(zone, page, order))
> > +		return;
> > +
>
> Using MIGRATE_ISOLATING would avoid a potential list search here.
>
Yes, thank you.
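For reference, the fast-path side of this check lives in the header; a
minimal sketch of what include/linux/page_isolation.h provides (a
simplified reconstruction, not the patch text itself):

static inline int page_under_isolation(struct zone *zone, struct page *page,
				       int order)
{
	/* Common case: no isolation area is registered in this zone. */
	if (likely(list_empty(&zone->isolation_list)))
		return 0;
	/* Slow path: walk the registered areas under isolation_lock. */
	return __page_under_isolation(zone, page, order);
}

So __free_one_page() pays only a list_empty() check unless an isolation
is actually in progress.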
> >  	page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
> >
> >  	VM_BUG_ON(page_idx & (order_size - 1));
> > @@ -3259,6 +3263,10 @@ static void __meminit free_area_init_cor
> >  		zone->nr_scan_inactive = 0;
> >  		zap_zone_vm_stats(zone);
> >  		atomic_set(&zone->reclaim_in_progress, 0);
> > +#ifdef CONFIG_PAGE_ISOLATION
> > +		spin_lock_init(&zone->isolation_lock);
> > +		INIT_LIST_HEAD(&zone->isolation_list);
> > +#endif
> >  		if (!size)
> >  			continue;
> >
> > @@ -4214,3 +4222,182 @@ void set_pageblock_flags_group(struct pa
> >  	else
> >  		__clear_bit(bitidx + start_bitidx, bitmap);
> >  }
> > +
> > +#ifdef CONFIG_PAGE_ISOLATION
> > +/*
> > + * Page Isolation.
> > + *
> > + * If a page is removed from the usual free_list and will never be used,
> > + * it is linked to "struct isolation_info" and its Reserved and Private
> > + * bits are set. page->private points to the isolation_info,
> > + * and page_count(page) is 0.
> > + *
> > + * This can be used for creating a chunk of contiguous *unused* memory.
> > + *
> > + * The current user is memory hot-remove.
> > + * Maybe moving this to some other file is better.
>
> page_isolation.c to match the header filename seems reasonable.
> page_alloc.c has a lot of multi-function stuff like memory initialisation
> in it.

Hmm.

> > + */
> > +static void
> > +isolate_page_nolock(struct isolation_info *info, struct page *page, int order)
> > +{
> > +	int pagenum;
> > +	pagenum = 1 << order;
> > +	while (pagenum > 0) {
> > +		SetPageReserved(page);
> > +		SetPagePrivate(page);
> > +		page->private = (unsigned long)info;
> > +		list_add(&page->lru, &info->pages);
> > +		page++;
> > +		pagenum--;
> > +	}
> > +}
>
> It's worth commenting somewhere that pages on the list in isolation_info
> are always order-0.
>
Okay.

> > +
> > +/*
> > + * This function is called from page_under_isolation()
> > + */
> > +int __page_under_isolation(struct zone *zone, struct page *page, int order)
> > +{
> > +	struct isolation_info *info;
> > +	unsigned long pfn = page_to_pfn(page);
> > +	unsigned long flags;
> > +	int found = 0;
> > +
> > +	spin_lock_irqsave(&zone->isolation_lock, flags);
>
> An unwritten convention seems to be that __ versions of same-named
> functions are the nolock version, i.e. I would expect
> page_under_isolation() to acquire and release the spinlock and
> __page_under_isolation() to do no additional locking.
>
> Locking outside of here might make the flow a little clearer as well if
> you had two returns and avoided the use of "found".
>
Maybe MIGRATE_ISOLATING will simplify this code.

> > +	list_for_each_entry(info, &zone->isolation_list, list) {
> > +		if (info->start_pfn <= pfn && pfn < info->end_pfn) {
> > +			found = 1;
> > +			break;
> > +		}
> > +	}
> > +	if (found) {
> > +		isolate_page_nolock(info, page, order);
> > +	}
> > +	spin_unlock_irqrestore(&zone->isolation_lock, flags);
> > +	return found;
> > +}
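To make the suggested convention concrete, the rework would look roughly
like this (a sketch of Mel's suggestion, not code from the patch; the two
names swap roles):

/* Caller holds zone->isolation_lock; two returns, no "found" flag. */
static int __page_under_isolation(struct zone *zone, struct page *page,
				  int order)
{
	struct isolation_info *info;
	unsigned long pfn = page_to_pfn(page);

	list_for_each_entry(info, &zone->isolation_list, list) {
		if (info->start_pfn <= pfn && pfn < info->end_pfn) {
			isolate_page_nolock(info, page, order);
			return 1;
		}
	}
	return 0;
}

int page_under_isolation(struct zone *zone, struct page *page, int order)
{
	unsigned long flags;
	int ret;

	spin_lock_irqsave(&zone->isolation_lock, flags);
	ret = __page_under_isolation(zone, page, order);
	spin_unlock_irqrestore(&zone->isolation_lock, flags);
	return ret;
}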
> > +
> > +/*
> > + * start and end must be in the same zone.
> > + */
> > +struct isolation_info *
> > +register_isolation(unsigned long start, unsigned long end)
> > +{
> > +	struct zone *zone;
> > +	struct isolation_info *info = NULL, *tmp;
> > +	unsigned long flags;
> > +	unsigned long last_pfn = end - 1;
> > +
> > +	if (!pfn_valid(start) || !pfn_valid(last_pfn) || (start >= end))
> > +		return ERR_PTR(-EINVAL);
> > +	/* check start and end are in the same zone */
> > +	zone = page_zone(pfn_to_page(start));
> > +
> > +	if (zone != page_zone(pfn_to_page(last_pfn)))
> > +		return ERR_PTR(-EINVAL);
> > +	/* target range has to match MAX_ORDER alignment */
> > +	if ((start & (MAX_ORDER_NR_PAGES - 1)) ||
> > +	    (end & (MAX_ORDER_NR_PAGES - 1)))
> > +		return ERR_PTR(-EINVAL);
>
> Why does the range have to be MAX_ORDER aligned?
>
> > +	info = kmalloc(sizeof(*info), GFP_KERNEL);
> > +	if (!info)
> > +		return ERR_PTR(-ENOMEM);
> > +	spin_lock_irqsave(&zone->isolation_lock, flags);
> > +	/* we don't allow overlap among isolation areas */
> > +	if (!list_empty(&zone->isolation_list)) {
> > +		list_for_each_entry(tmp, &zone->isolation_list, list) {
> > +			if (start < tmp->end_pfn && end > tmp->start_pfn) {
> > +				goto out_free;
> > +			}
> > +		}
> > +	}
>
> Why not merge requests for overlapping isolations?
>
This is related to the memory-unplug interface, which doesn't allow
overlapping requests, so overlap is not expected to happen here; the
check is just for sanity. In any case, this code will be removed by
MIGRATE_ISOLATING.

Thank you for your good idea.

-Kame
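P.S. To check my understanding of the MIGRATE_ISOLATING idea, a rough
sketch (the helper names are taken from the grouping-pages-by-mobility
work; whether they fit exactly here is my assumption):

/*
 * Assume MIGRATE_ISOLATING is added to the migratetype enum and that the
 * allocator never takes pages from free_list[MIGRATE_ISOLATING].
 */
static int isolate_pageblock(struct zone *zone, unsigned long start_pfn)
{
	struct page *page = pfn_to_page(start_pfn);
	unsigned long flags;

	spin_lock_irqsave(&zone->lock, flags);
	/* Future frees of pages in this block land on the isolated list. */
	set_pageblock_migratetype(page, MIGRATE_ISOLATING);
	/* Pages that are already free are moved over as well. */
	move_freepages_block(zone, page, MIGRATE_ISOLATING);
	spin_unlock_irqrestore(&zone->lock, flags);
	return 0;
}

With this, the page_under_isolation() hook in __free_one_page() and the
per-zone isolation list both go away.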