From: Minchan Kim
To: Mel Gorman
Cc: Andrew Morton, linux-mm, LKML, Christoph Lameter, Johannes Weiner,
    KAMEZAWA Hiroyuki, KOSAKI Motohiro, Rik van Riel, Andrea Arcangeli
Date: Sat, 30 Apr 2011 00:47:03 +0900
Subject: Re: [RFC 6/8] In order putback lru core
In-Reply-To: <20110428110623.GU4658@suse.de>

On Thu, Apr 28, 2011 at 8:06 PM, Mel Gorman wrote:
> On Wed, Apr 27, 2011 at 01:25:23AM +0900, Minchan Kim wrote:
>> This patch defines new APIs to put a page back into its previous
>> position in the LRU. The idea is simple.
>>
>> When we try to put a page back into the LRU list, if the page's old
>> neighbours (prev, next) are still adjacent to each other, we can
>> insert the isolated page right after prev instead of at the head of
>> the LRU list. That way we keep the LRU history without losing the
>> LRU information.
>>
>> Before:
>>       LRU POV : H - P1 - P2 - P3 - P4 - T
>>
>> Isolate P3:
>>       LRU POV : H - P1 - P2 - P4 - T
>>
>> Putback P3:
>>       if (P2->next == P4)
>>               putback(P3, P2);
>>       So,
>>       LRU POV : H - P1 - P2 - P3 - P4 - T
>>
>> To implement this, we define a new structure, pages_lru, which
>> remembers both LRU neighbours of the isolated page, along with
>> handling functions.
>>
>> But this approach has a problem with contiguous pages. In that case
>> the idea cannot work, because the neighbouring pages are isolated
>> too: prev_page->next == next_page is then always false, and neither
>> page is on the LRU any more at that point. Rik pointed this out at
>> the LSF/MM summit.
>> To solve that, I could change the idea: instead of requiring the
>> (prev, next) pair to still be adjacent, just require that either
>> prev or next is still on the same LRU. The worst case for that
>> approach is that prev or next has been freed and reallocated, so it
>> sits at the head of the LRU and our isolated page ends up right
>> behind the head. But that is almost the same as the current
>> behaviour, so it is no worse than what we have now, and it should
>> be rare. In this version, however, I implemented the idea discussed
>> at LSF/MM. If my new idea makes sense, I will change it.
>>
>> Any comment?
>>
>> Cc: KOSAKI Motohiro
>> Cc: Mel Gorman
>> Cc: Rik van Riel
>> Cc: Andrea Arcangeli
>> Signed-off-by: Minchan Kim
>> ---
>>  include/linux/migrate.h  |    2 +
>>  include/linux/mm_types.h |    7 ++++
>>  include/linux/swap.h     |    4 ++-
>>  mm/compaction.c          |    3 +-
>>  mm/internal.h            |    2 +
>>  mm/memcontrol.c          |    2 +-
>>  mm/migrate.c             |   36 +++++++++++++++++++++
>>  mm/swap.c                |    2 +-
>>  mm/vmscan.c              |   79 +++++++++++++++++++++++++++++++++++++++++++--
>>  9 files changed, 129 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>> index e39aeec..3aa5ab6 100644
>> --- a/include/linux/migrate.h
>> +++ b/include/linux/migrate.h
>> @@ -9,6 +9,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
>>  #ifdef CONFIG_MIGRATION
>>  #define PAGE_MIGRATION 1
>>
>> +extern void putback_pages_lru(struct list_head *l);
>>  extern void putback_lru_pages(struct list_head *l);
>>  extern int migrate_page(struct address_space *,
>>                       struct page *, struct page *);
>> @@ -33,6 +34,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
>>  #else
>>  #define PAGE_MIGRATION 0
>>
>> +static inline void putback_pages_lru(struct list_head *l) {}
>>  static inline void putback_lru_pages(struct list_head *l) {}
>>  static inline int migrate_pages(struct list_head *l, new_page_t x,
>>               unsigned long private, bool offlining,
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index ca01ab2..35e80fb 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -102,6 +102,13 @@ struct page {
>>  #endif
>>  };
>>
>> +/* This structure is used for keeping the LRU ordering of an isolated page */
>> +struct pages_lru {
>> +        struct page *page;      /* isolated page */
>> +        struct page *prev_page; /* previous page of the isolated page in LRU order */
>> +        struct page *next_page; /* next page of the isolated page in LRU order */
>> +        struct list_head lru;
>> +};
>>  /*
>
> So this thing has to be allocated from somewhere. We can't put it
> on the stack as we're already in danger there so we must be using
> kmalloc. In the reclaim paths, this should be avoided obviously.
> For compaction, we might hurt the compaction success rates if pages
> are pinned with control structures. It's something to be wary of.

Actually, the 32-page migration unit used by compaction should not be
a big problem; a single page would be enough to hold the pages_lru
entries. But I admit that dynamic allocation in the reclaim path, and
pinning pages in the compaction path, are not good.

So I have been thinking about a statically allocated, pagevec-like
approach. The drawback is that it assumes the migration unit stays
small, like SWAP_CLUSTER_MAX. That is true today, but I would rather
not depend on the size, and removing that limitation brings us back
to allocating pages dynamically. Hmm, any ideas? Personally, I don't
have any idea other than a pagevec-like approach of 32 pages.
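To make it concrete, here is a rough, untested sketch of the
pagevec-like container I have in mind (PAGES_LRU_SIZE and all the
names below are made up, not from any patch):

	/* Statically allocatable batch of pages_lru entries,
	 * modelled on struct pagevec. */
	#define PAGES_LRU_SIZE	32	/* matches SWAP_CLUSTER_MAX today */

	struct pages_lru_vec {
		unsigned long nr;
		struct pages_lru entries[PAGES_LRU_SIZE];
	};

	static inline void pages_lru_vec_init(struct pages_lru_vec *pv)
	{
		pv->nr = 0;
	}

	/* Grab the next free slot, or NULL when the caller must
	 * putback/flush the batch first. */
	static inline struct pages_lru *pages_lru_vec_next(struct pages_lru_vec *pv)
	{
		if (pv->nr >= PAGES_LRU_SIZE)
			return NULL;
		return &pv->entries[pv->nr++];
	}

As you can see, this hard-wires the batch size, which is exactly the
limitation I would like to avoid.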
> At LSF/MM, I stated a preference for swapping the source and
> destination pages in the LRU. This unfortunately means that the LRU
> now contains a page in the process of being migrated to, and the
> backout paths for migration failure become a lot more complex.
> Reclaim should be ok as it should fail to lock the page and recycle
> it in the list. This avoids allocations but I freely admit that I'm
> not in the position to implement such a thing right now :(

I can't understand your point. Swapping the source and destination?
Could you elaborate on it?
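Do you mean something like the following at migration time, so that
the destination page inherits the source page's LRU position by
construction? This is just my guess -- completely untested, and
lru_replace_page is a made-up name:

	/* Guess at the idea, not from any real patch: put the
	 * destination page where the source page sits in the LRU,
	 * so no pages_lru bookkeeping is needed at all.
	 * Assumes the caller holds zone->lru_lock. */
	static void lru_replace_page(struct page *oldpage, struct page *newpage)
	{
		list_replace(&oldpage->lru, &newpage->lru);
		SetPageLRU(newpage);
		ClearPageLRU(oldpage);
	}

If so, I see why the backout paths get messy -- on migration failure
we would have to swap the pages back again.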
>>   * A region containing a mapping of a non-memory backed file under NOMMU
>>   * conditions.  These are held in a global tree and are pinned by the VMAs that
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index baef4ad..4ad0a0c 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -227,6 +227,8 @@ extern void rotate_reclaimable_page(struct page *page);
>>  extern void deactivate_page(struct page *page);
>>  extern void swap_setup(void);
>>
>> +extern void update_page_reclaim_stat(struct zone *zone, struct page *page,
>> +                                    int file, int rotated);
>>  extern void add_page_to_unevictable_list(struct page *page);
>>
>>  /**
>> @@ -260,7 +262,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>>                                               struct zone *zone,
>>                                               unsigned long *nr_scanned);
>>  extern int __isolate_lru_page(struct page *page, int mode, int file,
>> -                             int not_dirty, int not_mapped);
>> +             int not_dirty, int not_mapped, struct pages_lru *pages_lru);
>>  extern unsigned long shrink_all_memory(unsigned long nr_pages);
>>  extern int vm_swappiness;
>>  extern int remove_mapping(struct address_space *mapping, struct page *page);
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 653b02b..c453000 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -335,7 +335,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
>>               }
>>
>>               /* Try isolate the page */
>> -             if (__isolate_lru_page(page, ISOLATE_BOTH, 0, !cc->sync, 0) != 0)
>> +             if (__isolate_lru_page(page, ISOLATE_BOTH, 0,
>> +                                     !cc->sync, 0, NULL) != 0)
>>                       continue;
>>
>>               VM_BUG_ON(PageTransCompound(page));
>> diff --git a/mm/internal.h b/mm/internal.h
>> index d071d38..3c8182c 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -43,6 +43,8 @@ extern unsigned long highest_memmap_pfn;
>>   * in mm/vmscan.c:
>>   */
>>  extern int isolate_lru_page(struct page *page);
>> +extern bool keep_lru_order(struct pages_lru *pages_lru);
>> +extern void putback_page_to_lru(struct page *page, struct list_head *head);
>>  extern void putback_lru_page(struct page *page);
>>
>>  /*
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 471e7fd..92a9046 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1193,7 +1193,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
>>                       continue;
>>
>>               scan++;
>> -             ret = __isolate_lru_page(page, mode, file, 0, 0);
>> +             ret = __isolate_lru_page(page, mode, file, 0, 0, NULL);
>>               switch (ret) {
>>               case 0:
>>                       list_move(&page->lru, dst);
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 819d233..9cfb63b 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -85,6 +85,42 @@ void putback_lru_pages(struct list_head *l)
>>  }
>>
>>  /*
>> + * This function is almost the same as putback_lru_pages.
>> + * The difference is that it receives a list of struct pages_lru
>> + * and, where possible, adds pages back at their original position
>> + * in the LRU instead of at the LRU's head.
>> + */
>> +void putback_pages_lru(struct list_head *l)
>> +{
>> +        struct pages_lru *isolated_page;
>> +        struct pages_lru *isolated_page2;
>> +        struct page *page;
>> +
>> +        list_for_each_entry_safe(isolated_page, isolated_page2, l, lru) {
>> +                struct zone *zone;
>> +                page = isolated_page->page;
>> +                list_del(&isolated_page->lru);
>> +
>> +                dec_zone_page_state(page, NR_ISOLATED_ANON +
>> +                                page_is_file_cache(page));
>> +
>> +                zone = page_zone(page);
>> +                spin_lock_irq(&zone->lru_lock);
>> +                if (keep_lru_order(isolated_page)) {
>> +                        putback_page_to_lru(page, &isolated_page->prev_page->lru);
>> +                        spin_unlock_irq(&zone->lru_lock);
>> +                } else {
>> +                        spin_unlock_irq(&zone->lru_lock);
>> +                        putback_lru_page(page);
>> +                }
>> +
>> +                kfree(isolated_page);
>> +        }
>> +}
>
> I think we also need counters at least at discussion stage to see
> how successful this is.

Actually, I have some numbers but haven't published them. (4 cores,
1G DRAM, THP always on; heavy stress with qemu, compile, web
browsing, eclipse and other programs forking; the success rate is
about 50%.) That is partly because I have another idea to solve the
contiguous-PFN problem. Please see the patch description and my
reply to Rik's question in this thread. I guess that approach could
achieve a high success rate as long as there are no concurrent
direct compaction processes.
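To make that second idea concrete, the relaxed check would look
roughly like this (untested sketch, not part of this patch;
keep_lru_order in this version requires both neighbours to still be
adjacent):

	/* Accept the old position if *either* neighbour is still on
	 * an LRU, instead of requiring prev and next to be adjacent.
	 * A real version would also have to check that the neighbour
	 * is still on the same LRU list as the isolated page; assumes
	 * the caller holds zone->lru_lock. */
	static bool keep_lru_order_relaxed(struct pages_lru *pages_lru)
	{
		if (pages_lru->prev_page && PageLRU(pages_lru->prev_page))
			return true;	/* put back right after prev_page */
		if (pages_lru->next_page && PageLRU(pages_lru->next_page))
			return true;	/* put back right before next_page */
		return false;
	}

This keeps working even when a whole run of contiguous pages is
isolated at once, at the cost of occasionally putting a page back
next to a freed-and-reallocated neighbour.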
> For example, early in the system's lifetime there is a causal
> relationship between the age of a page and its location in physical
> memory. The buddy allocator gives pages back in PFN order where
> possible and there is a loose relationship between when pages get
> allocated and when they get reclaimed. As compaction is a linear
> scanner, there is a likelihood (that is highly variable) that
> physically contiguous pages will have similar positions in the LRU.
> They'll be isolated at the same time, meaning they also won't be
> put back in order.

Indeed! But I hope my second idea solves that problem. What do you
think about it?

> This might cease to matter when the system is running for some time but
> it's a concern.

Yes. It depends on how the workload ages. I think it's not easy to
get a meaningful number that justifies everything. :(

--
Kind regards,
Minchan Kim