From: Alex Shi <alex.shi@linux.alibaba.com>
To: cgroups@vger.kernel.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com
Cc: Alex Shi, Michal Hocko, Vladimir Davydov, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v9 13/20] mm/lru: replace pgdat lru_lock with lruvec lock
Date: Mon, 2 Mar 2020 19:00:23 +0800
Message-Id:
<1583146830-169516-14-git-send-email-alex.shi@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1583146830-169516-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1583146830-169516-1-git-send-email-alex.shi@linux.alibaba.com>

This patch moves lru_lock into lruvec, giving each lruvec its own lru_lock
and thus each memcg one lru_lock per node. So on a machine with large
nodes, memcgs no longer have to wait on the shared per-node
pgdat->lru_lock; each can proceed under its own lru_lock.

We introduce the function lock_page_lruvec, which locks the
lruvec->lru_lock of the page's memcg. (Thanks to Johannes Weiner, Hugh
Dickins and Konstantin Khlebnikov for suggestions and reminders during
patch writing.)

This is the key patch that replaces the per-node lru_lock with the
per-memcg lruvec lock. A few places still fall back to more frequent lru
locking for now; a later patch will use relock_page_lruvec_xxx to close
that gap.

Following Daniel Jordan's suggestion, I ran 208 'dd' tasks in 104
containers on a 2-socket * 26-core * HT box with a modified case:
https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice

With this and the later patches, readtwice performance increases by about
80% with concurrent containers.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrew Morton
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Yang Shi
Cc: Matthew Wilcox
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: Tejun Heo
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
---
 include/linux/memcontrol.h | 35 ++++++++++++++++++++++
 include/linux/mmzone.h     |  2 ++
 mm/compaction.c            | 51 ++++++++++++++++++++------------
 mm/huge_memory.c           |  9 ++----
 mm/memcontrol.c            | 50 +++++++++++++++++++++++++-------
 mm/mlock.c                 | 18 +++++-------
 mm/mmzone.c                |  1 +
 mm/swap.c                  | 72 +++++++++++++++++-----------------------------
 mm/vmscan.c                | 57 ++++++++++++++++--------------------
 9 files changed, 170 insertions(+), 125 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a7a0a1a5c8d5..b8a04f0a2ab8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -424,6 +424,13 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
 
 struct mem_cgroup *get_mem_cgroup_from_page(struct page *page);
 
+struct lruvec *lock_page_lruvec_irq(struct page *page);
+struct lruvec *lock_page_lruvec_irqsave(struct page *page,
+						unsigned long *flags);
+
+void unlock_page_lruvec_irq(struct lruvec *lruvec);
+void unlock_page_lruvec_irqrestore(struct lruvec *lruvec, unsigned long flags);
+
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
 	return css ? container_of(css, struct mem_cgroup, css) : NULL;
@@ -926,6 +933,34 @@ static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 }
 
+static inline struct lruvec *lock_page_lruvec_irq(struct page *page)
+{
+	struct pglist_data *pgdat = page_pgdat(page);
+
+	spin_lock_irq(&pgdat->__lruvec.lru_lock);
+	return &pgdat->__lruvec;
+}
+
+static inline struct lruvec *lock_page_lruvec_irqsave(struct page *page,
+		unsigned long *flagsp)
+{
+	struct pglist_data *pgdat = page_pgdat(page);
+
+	spin_lock_irqsave(&pgdat->__lruvec.lru_lock, *flagsp);
+	return &pgdat->__lruvec;
+}
+
+static inline void unlock_page_lruvec_irq(struct lruvec *lruvec)
+{
+	spin_unlock_irq(&lruvec->lru_lock);
+}
+
+static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec,
+		unsigned long flags)
+{
+	spin_unlock_irqrestore(&lruvec->lru_lock, flags);
+}
+
 static inline struct mem_cgroup *
 mem_cgroup_iter(struct mem_cgroup *root,
 		struct mem_cgroup *prev,
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 462f6873905a..a7bdff94990d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -308,6 +308,8 @@ struct lruvec {
 	atomic_long_t			inactive_age;
 	/* Refaults at the time of last reclaim cycle */
 	unsigned long			refaults;
+	/* per lruvec lru_lock for memcg */
+	spinlock_t			lru_lock;
 	/* Various lruvec state flags (enum lruvec_flags) */
 	unsigned long			flags;
 #ifdef CONFIG_MEMCG
diff --git a/mm/compaction.c b/mm/compaction.c
index 1baba328d089..4a773f0ffedf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -786,7 +786,7 @@ static bool too_many_isolated(pg_data_t *pgdat)
 	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct lruvec *lruvec;
 	unsigned long flags = 0;
-	bool locked = false;
+	struct lruvec *locked_lruvec = NULL;
 	struct page *page = NULL, *valid_page = NULL;
 	unsigned long start_pfn = low_pfn;
 	bool skip_on_failure = false;
@@ -846,11 +846,20 @@ static bool too_many_isolated(pg_data_t *pgdat)
 		 * contention, to give chance to IRQs. Abort completely if
 		 * a fatal signal is pending.
*/ - if (!(low_pfn % SWAP_CLUSTER_MAX) - && compact_unlock_should_abort(&pgdat->lru_lock, - flags, &locked, cc)) { - low_pfn = 0; - goto fatal_pending; + if (!(low_pfn % SWAP_CLUSTER_MAX)) { + if (locked_lruvec) { + unlock_page_lruvec_irqrestore(locked_lruvec, flags); + locked_lruvec = NULL; + } + + if (fatal_signal_pending(current)) { + cc->contended = true; + + low_pfn = 0; + goto fatal_pending; + } + + cond_resched(); } if (!pfn_valid_within(low_pfn)) @@ -919,10 +928,9 @@ static bool too_many_isolated(pg_data_t *pgdat) */ if (unlikely(__PageMovable(page)) && !PageIsolated(page)) { - if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, - flags); - locked = false; + if (locked_lruvec) { + unlock_page_lruvec_irqrestore(locked_lruvec, flags); + locked_lruvec = NULL; } if (!isolate_movable_page(page, isolate_mode)) @@ -965,10 +973,16 @@ static bool too_many_isolated(pg_data_t *pgdat) goto isolate_fail; } + lruvec = mem_cgroup_page_lruvec(page, pgdat); + /* If we already hold the lock, we can skip some rechecking */ - if (!locked) { - locked = compact_lock_irqsave(&pgdat->lru_lock, - &flags, cc); + if (lruvec != locked_lruvec) { + if (locked_lruvec) { + unlock_page_lruvec_irqrestore(locked_lruvec, flags); + locked_lruvec = NULL; + } + compact_lock_irqsave(&lruvec->lru_lock, &flags, cc); + locked_lruvec = lruvec; /* Try get exclusive access under lock */ if (!skip_updated) { @@ -988,7 +1002,6 @@ static bool too_many_isolated(pg_data_t *pgdat) } } - lruvec = mem_cgroup_page_lruvec(page, pgdat); VM_BUG_ON_PAGE(PageCompound(page), page); @@ -1025,9 +1038,9 @@ static bool too_many_isolated(pg_data_t *pgdat) * page anyway. */ if (nr_isolated) { - if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - locked = false; + if (locked_lruvec) { + unlock_page_lruvec_irqrestore(locked_lruvec, flags); + locked_lruvec = NULL; } putback_movable_pages(&cc->migratepages); cc->nr_migratepages = 0; @@ -1052,8 +1065,8 @@ static bool too_many_isolated(pg_data_t *pgdat) low_pfn = end_pfn; isolate_abort: - if (locked) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (locked_lruvec) + unlock_page_lruvec_irqrestore(locked_lruvec, flags); /* * Updated the cached scanner pfn once the pageblock has been scanned diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 3835f87d03fd..7af56d0fb044 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2452,7 +2452,7 @@ void lru_add_page_tail(struct page *head, struct page *page_tail, VM_BUG_ON_PAGE(PageCompound(page_tail), head); VM_BUG_ON_PAGE(PageLRU(page_tail), head); - lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); + lockdep_assert_held(&lruvec->lru_lock); if (!list) SetPageLRU(page_tail); @@ -2545,15 +2545,12 @@ static void __split_huge_page(struct page *page, struct list_head *list, pgoff_t end) { struct page *head = compound_head(page); - pg_data_t *pgdat = page_pgdat(head); struct lruvec *lruvec; struct address_space *swap_cache = NULL; unsigned long offset = 0; unsigned long flags; int i; - lruvec = mem_cgroup_page_lruvec(head, pgdat); - /* complete memcg works before add pages to LRU */ mem_cgroup_split_huge_fixup(head); @@ -2566,7 +2563,7 @@ static void __split_huge_page(struct page *page, struct list_head *list, } /* Lru list would be changed, don't care head's LRU bit. 
 	 */
-	spin_lock_irqsave(&pgdat->lru_lock, flags);
+	lruvec = lock_page_lruvec_irqsave(head, &flags);
 
 	for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
@@ -2585,7 +2582,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 					head + i, 0);
 		}
 	}
 
-	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+	unlock_page_lruvec_irqrestore(lruvec, flags);
 
 	ClearPageCompound(head);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7d7b861a948c..b0c156da60fe 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1219,7 +1219,7 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
 		goto out;
 	}
 
-	memcg = page->mem_cgroup;
+	memcg = READ_ONCE(page->mem_cgroup);
 	/*
 	 * Swapcache readahead pages are added to the LRU - and
 	 * possibly migrated - before they are charged.
@@ -1240,6 +1240,37 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
 	return lruvec;
 }
 
+/* page must be isolated */
+struct lruvec *lock_page_lruvec_irq(struct page *page)
+{
+	struct lruvec *lruvec;
+
+	lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
+	spin_lock_irq(&lruvec->lru_lock);
+
+	return lruvec;
+}
+
+struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags)
+{
+	struct lruvec *lruvec;
+
+	lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
+	spin_lock_irqsave(&lruvec->lru_lock, *flags);
+
+	return lruvec;
+}
+
+void unlock_page_lruvec_irq(struct lruvec *lruvec)
+{
+	spin_unlock_irq(&lruvec->lru_lock);
+}
+
+void unlock_page_lruvec_irqrestore(struct lruvec *lruvec, unsigned long flags)
+{
+	spin_unlock_irqrestore(&lruvec->lru_lock, flags);
+}
+
 /**
  * mem_cgroup_update_lru_size - account for adding or removing an lru page
  * @lruvec: mem_cgroup per zone lru vector
@@ -2576,7 +2607,6 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg,
 			  bool lrucare)
 {
 	struct lruvec *lruvec = NULL;
-	pg_data_t *pgdat;
 
 	VM_BUG_ON_PAGE(page->mem_cgroup, page);
 
@@ -2585,14 +2615,12 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg,
 	 * may already be on some other mem_cgroup's LRU. Take care of it.
 	 */
 	if (lrucare) {
-		pgdat = page_pgdat(page);
-		spin_lock_irq(&pgdat->lru_lock);
-
 		if (TestClearPageLRU(page)) {
-			lruvec = mem_cgroup_page_lruvec(page, pgdat);
+			lruvec = lock_page_lruvec_irq(page);
+
 			del_page_from_lru_list(page, lruvec, page_lru(page));
-		} else
-			spin_unlock_irq(&pgdat->lru_lock);
+			unlock_page_lruvec_irq(lruvec);
+		}
 	}
 
 	/*
 	 * Nobody should be changing or seriously looking at
@@ -2611,11 +2639,11 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg,
 	page->mem_cgroup = memcg;
 
 	if (lrucare && lruvec) {
-		lruvec = mem_cgroup_page_lruvec(page, pgdat);
+		lruvec = lock_page_lruvec_irq(page);
 		VM_BUG_ON_PAGE(PageLRU(page), page);
 		add_page_to_lru_list(page, lruvec, page_lru(page));
-		spin_unlock_irq(&pgdat->lru_lock);
+		unlock_page_lruvec_irq(lruvec);
 		SetPageLRU(page);
 	}
 }
@@ -2913,7 +2941,7 @@ void __memcg_kmem_uncharge(struct page *page, int order)
 
 /*
  * Because tail pages are not marked as "used", set it. We're under
- * pgdat->lru_lock and migration entries setup in all page mappings.
+ * lruvec->lru_lock and migration entries setup in all page mappings.
  */
 void mem_cgroup_split_huge_fixup(struct page *head)
 {
diff --git a/mm/mlock.c b/mm/mlock.c
index a43b3da78541..d285348b147e 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -163,7 +163,7 @@ unsigned int munlock_vma_page(struct page *page)
 {
 	int nr_pages;
 	bool clearlru = false;
-	pg_data_t *pgdat = page_pgdat(page);
+	struct lruvec *lruvec;
 
 	/* For try_to_munlock() and to serialize with page migration */
 	BUG_ON(!PageLocked(page));
@@ -177,7 +177,7 @@ unsigned int munlock_vma_page(struct page *page)
 	 */
 	get_page(page);
 	clearlru = TestClearPageLRU(page);
-	spin_lock_irq(&pgdat->lru_lock);
+	lruvec = lock_page_lruvec_irq(page);
 
 	if (!TestClearPageMlocked(page)) {
 		if (clearlru)
@@ -186,7 +186,7 @@ unsigned int munlock_vma_page(struct page *page)
 		 * Potentially, PTE-mapped THP: do not skip the rest PTEs
 		 * Reuse lock as memory barrier for release_pages racing.
 		 */
-		spin_unlock_irq(&pgdat->lru_lock);
+		unlock_page_lruvec_irq(lruvec);
 		put_page(page);
 		return 0;
 	}
@@ -195,14 +195,11 @@ unsigned int munlock_vma_page(struct page *page)
 	__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
 
 	if (clearlru) {
-		struct lruvec *lruvec;
-
-		lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
 		del_page_from_lru_list(page, lruvec, page_lru(page));
-		spin_unlock_irq(&pgdat->lru_lock);
+		unlock_page_lruvec_irq(lruvec);
 		__munlock_isolated_page(page);
 	} else {
-		spin_unlock_irq(&pgdat->lru_lock);
+		unlock_page_lruvec_irq(lruvec);
 		put_page(page);
 		__munlock_isolation_failed(page);
 	}
@@ -289,7 +286,6 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 	pagevec_init(&pvec_putback);
 
 	/* Phase 1: page isolation */
-	spin_lock_irq(&zone->zone_pgdat->lru_lock);
 	for (i = 0; i < nr; i++) {
 		struct page *page = pvec->pages[i];
 		struct lruvec *lruvec;
@@ -309,8 +305,9 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 			 * We already have pin from follow_page_mask()
 			 * so we can spare the get_page() here.
 			 */
-			lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
+			lruvec = lock_page_lruvec_irq(page);
 			del_page_from_lru_list(page, lruvec, page_lru(page));
+			unlock_page_lruvec_irq(lruvec);
 			continue;
 
 		/*
@@ -323,7 +320,6 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 		pagevec_add(&pvec_putback, pvec->pages[i]);
 		pvec->pages[i] = NULL;
 	}
-	spin_unlock_irq(&zone->zone_pgdat->lru_lock);
 	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
 
 	/* Now we can release pins of pages that we are not munlocking */
diff --git a/mm/mmzone.c b/mm/mmzone.c
index 4686fdc23bb9..3750a90ed4a0 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -91,6 +91,7 @@ void lruvec_init(struct lruvec *lruvec)
 	enum lru_list lru;
 
 	memset(lruvec, 0, sizeof(struct lruvec));
+	spin_lock_init(&lruvec->lru_lock);
 
 	for_each_lru(lru)
 		INIT_LIST_HEAD(&lruvec->lists[lru]);
diff --git a/mm/swap.c b/mm/swap.c
index 16af7c8369fe..50c856246f84 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -60,14 +60,12 @@ static void __page_cache_release(struct page *page)
 {
 	if (TestClearPageLRU(page)) {
-		pg_data_t *pgdat = page_pgdat(page);
 		struct lruvec *lruvec;
 		unsigned long flags;
 
-		spin_lock_irqsave(&pgdat->lru_lock, flags);
-		lruvec = mem_cgroup_page_lruvec(page, pgdat);
+		lruvec = lock_page_lruvec_irqsave(page, &flags);
 		del_page_from_lru_list(page, lruvec, page_off_lru(page));
-		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+		unlock_page_lruvec_irqrestore(lruvec, flags);
 	}
 	__ClearPageWaiters(page);
 }
@@ -190,33 +188,22 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 	void *arg, bool isolation)
 {
 	int i;
-	struct pglist_data *pgdat = NULL;
-	struct lruvec *lruvec;
+	struct lruvec *lruvec = NULL;
 	unsigned long flags = 0;
 
 	for (i = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
-		struct pglist_data *pagepgdat = page_pgdat(page);
 
 		if (isolation && !TestClearPageLRU(page))
 			continue;
 
-		/* every page should be isolated from lru */
-		if (pagepgdat != pgdat) {
-			if (pgdat)
-				spin_unlock_irqrestore(&pgdat->lru_lock, flags);
-			pgdat = pagepgdat;
-			spin_lock_irqsave(&pgdat->lru_lock, flags);
-		}
-
-		lruvec = mem_cgroup_page_lruvec(page, pgdat);
+		lruvec = lock_page_lruvec_irqsave(page, &flags);
 		(*move_fn)(page, lruvec, arg);
+		unlock_page_lruvec_irqrestore(lruvec, flags);
 
 		if (isolation)
 			SetPageLRU(page);
 	}
-	if (pgdat)
-		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+
 	release_pages(pvec->pages, pvec->nr);
 	pagevec_reinit(pvec);
 }
@@ -328,12 +315,12 @@ static inline void activate_page_drain(int cpu)
 
 void activate_page(struct page *page)
 {
-	pg_data_t *pgdat = page_pgdat(page);
+	struct lruvec *lruvec;
 
 	page = compound_head(page);
-	spin_lock_irq(&pgdat->lru_lock);
-	__activate_page(page, mem_cgroup_page_lruvec(page, pgdat), NULL);
-	spin_unlock_irq(&pgdat->lru_lock);
+	lruvec = lock_page_lruvec_irq(page);
+	__activate_page(page, lruvec, NULL);
+	unlock_page_lruvec_irq(lruvec);
 }
 #endif
 
@@ -783,8 +770,7 @@ void release_pages(struct page **pages, int nr)
 {
 	int i;
 	LIST_HEAD(pages_to_free);
-	struct pglist_data *locked_pgdat = NULL;
-	struct lruvec *lruvec;
+	struct lruvec *lruvec = NULL;
 	unsigned long uninitialized_var(flags);
 	unsigned int uninitialized_var(lock_batch);
 
@@ -794,21 +780,20 @@ void release_pages(struct page **pages, int nr)
 		/*
 		 * Make sure the IRQ-safe lock-holding time does not get
 		 * excessive with a continuous string of pages from the
-		 * same pgdat. The lock is held only if pgdat != NULL.
+		 * same lruvec. The lock is held only if lruvec != NULL.
 		 */
-		if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) {
-			spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
-			locked_pgdat = NULL;
+		if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) {
+			unlock_page_lruvec_irqrestore(lruvec, flags);
+			lruvec = NULL;
 		}
 
 		if (is_huge_zero_page(page))
 			continue;
 
 		if (is_zone_device_page(page)) {
-			if (locked_pgdat) {
-				spin_unlock_irqrestore(&locked_pgdat->lru_lock,
-						       flags);
-				locked_pgdat = NULL;
+			if (lruvec) {
+				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec = NULL;
 			}
 			/*
 			 * ZONE_DEVICE pages that return 'false' from
@@ -827,27 +812,24 @@ void release_pages(struct page **pages, int nr)
 			continue;
 
 		if (PageCompound(page)) {
-			if (locked_pgdat) {
-				spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
-				locked_pgdat = NULL;
+			if (lruvec) {
+				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec = NULL;
 			}
 			__put_compound_page(page);
 			continue;
 		}
 
 		if (TestClearPageLRU(page)) {
-			struct pglist_data *pgdat = page_pgdat(page);
+			struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
 
-			if (pgdat != locked_pgdat) {
-				if (locked_pgdat)
-					spin_unlock_irqrestore(&locked_pgdat->lru_lock,
-									flags);
+			if (new_lruvec != lruvec) {
+				if (lruvec)
+					unlock_page_lruvec_irqrestore(lruvec, flags);
 				lock_batch = 0;
-				locked_pgdat = pgdat;
-				spin_lock_irqsave(&locked_pgdat->lru_lock, flags);
+				lruvec = lock_page_lruvec_irqsave(page, &flags);
 			}
 
-			lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
 			del_page_from_lru_list(page, lruvec, page_off_lru(page));
 		}
 
@@ -857,8 +839,8 @@ void release_pages(struct page **pages, int nr)
 		list_add(&page->lru, &pages_to_free);
 	}
-	if (locked_pgdat)
-		spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
+	if (lruvec)
+		unlock_page_lruvec_irqrestore(lruvec, flags);
 
 	mem_cgroup_uncharge_list(&pages_to_free);
 	free_unref_page_list(&pages_to_free);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index efaa4f41044e..a58cd5ee9ea1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1757,14 +1757,12 @@ int isolate_lru_page(struct page *page)
 
 	get_page(page);
 	if (TestClearPageLRU(page)) {
-		pg_data_t *pgdat = page_pgdat(page);
 		struct lruvec *lruvec;
 		int lru = page_lru(page);
 
-		lruvec = mem_cgroup_page_lruvec(page, pgdat);
-		spin_lock_irq(&pgdat->lru_lock);
+		lruvec = lock_page_lruvec_irq(page);
 		del_page_from_lru_list(page, lruvec, lru);
-		spin_unlock_irq(&pgdat->lru_lock);
+		unlock_page_lruvec_irq(lruvec);
 		ret = 0;
 	} else
 		put_page(page);
@@ -1832,7 +1830,6 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
 static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
 						     struct list_head *list)
 {
-	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	int nr_pages, nr_moved = 0;
 	LIST_HEAD(pages_to_free);
 	struct page *page;
@@ -1843,9 +1840,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
 		VM_BUG_ON_PAGE(PageLRU(page), page);
 		list_del(&page->lru);
 		if (unlikely(!page_evictable(page))) {
-			spin_unlock_irq(&pgdat->lru_lock);
+			spin_unlock_irq(&lruvec->lru_lock);
 			putback_lru_page(page);
-			spin_lock_irq(&pgdat->lru_lock);
+			spin_lock_irq(&lruvec->lru_lock);
 			continue;
 		}
 
@@ -1866,18 +1863,16 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
 			__ClearPageActive(page);
 
 			if (unlikely(PageCompound(page))) {
-				spin_unlock_irq(&pgdat->lru_lock);
+				spin_unlock_irq(&lruvec->lru_lock);
 				(*get_compound_page_dtor(page))(page);
-				spin_lock_irq(&pgdat->lru_lock);
+				spin_lock_irq(&lruvec->lru_lock);
 			} else
 				list_add(&page->lru, &pages_to_free);
 
 			continue;
 		}
 
-		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 		lru = page_lru(page);
 		nr_pages = hpage_nr_pages(page);
-
 		update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
 		list_add(&page->lru, &lruvec->lists[lru]);
 		nr_moved += nr_pages;
@@ -1938,7 +1933,7 @@ static int current_may_throttle(void)
 
 	lru_add_drain();
 
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&lruvec->lru_lock);
 
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list,
 				     &nr_scanned, sc, lru);
@@ -1950,7 +1945,7 @@ static int current_may_throttle(void)
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, nr_scanned);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&lruvec->lru_lock);
 
 	if (nr_taken == 0)
 		return 0;
@@ -1958,7 +1953,7 @@ static int current_may_throttle(void)
 	nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, 0,
 				&stat, false);
 
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&lruvec->lru_lock);
 
 	item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
 	if (!cgroup_reclaim(sc))
@@ -1971,7 +1966,7 @@ static int current_may_throttle(void)
 
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&lruvec->lru_lock);
 
 	mem_cgroup_uncharge_list(&page_list);
 	free_unref_page_list(&page_list);
@@ -2024,7 +2019,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	lru_add_drain();
 
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&lruvec->lru_lock);
 
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold,
 				     &nr_scanned, sc, lru);
@@ -2035,7 +2030,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	__count_vm_events(PGREFILL, nr_scanned);
 	__count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned);
 
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&lruvec->lru_lock);
 
 	while (!list_empty(&l_hold)) {
 		cond_resched();
@@ -2081,7 +2076,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	/*
 	 * Move pages back to the lru list.
 	 */
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&lruvec->lru_lock);
 	/*
 	 * Count referenced pages from currently used mappings as rotated,
 	 * even though only some of them are actually re-activated.
This
@@ -2099,7 +2094,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
 
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&lruvec->lru_lock);
 
 	mem_cgroup_uncharge_list(&l_active);
 	free_unref_page_list(&l_active);
@@ -2248,7 +2243,6 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
 	u64 fraction[2];
 	u64 denominator = 0;	/* gcc */
-	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	unsigned long anon_prio, file_prio;
 	enum scan_balance scan_balance;
 	unsigned long anon, file;
@@ -2326,7 +2320,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
 		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
 
-	spin_lock_irq(&pgdat->lru_lock);
+	spin_lock_irq(&lruvec->lru_lock);
 	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
 		reclaim_stat->recent_scanned[0] /= 2;
 		reclaim_stat->recent_rotated[0] /= 2;
@@ -2347,7 +2341,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
 	fp /= reclaim_stat->recent_rotated[1] + 1;
-	spin_unlock_irq(&pgdat->lru_lock);
+	spin_unlock_irq(&lruvec->lru_lock);
 
 	fraction[0] = ap;
 	fraction[1] = fp;
@@ -4324,24 +4318,21 @@ int page_evictable(struct page *page)
  */
 void check_move_unevictable_pages(struct pagevec *pvec)
 {
-	struct lruvec *lruvec;
-	struct pglist_data *pgdat = NULL;
+	struct lruvec *lruvec = NULL;
 	int pgscanned = 0;
 	int pgrescued = 0;
 	int i;
 
 	for (i = 0; i < pvec->nr; i++) {
 		struct page *page = pvec->pages[i];
-		struct pglist_data *pagepgdat = page_pgdat(page);
+		struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
 
 		pgscanned++;
-		if (pagepgdat != pgdat) {
-			if (pgdat)
-				spin_unlock_irq(&pgdat->lru_lock);
-			pgdat = pagepgdat;
-			spin_lock_irq(&pgdat->lru_lock);
+		if (lruvec != new_lruvec) {
+			if (lruvec)
+				unlock_page_lruvec_irq(lruvec);
+			lruvec = lock_page_lruvec_irq(page);
 		}
-		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 
 		if (!TestClearPageLRU(page) || !PageUnevictable(page))
 			continue;
@@ -4358,10 +4349,10 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 		}
 	}
 
-	if (pgdat) {
+	if (lruvec) {
 		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
 		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
-		spin_unlock_irq(&pgdat->lru_lock);
+		unlock_page_lruvec_irq(lruvec);
 	}
 }
 EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
-- 
1.8.3.1
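Not part of the patch: a minimal sketch of the per-page locking rule the
new API establishes, mirroring __page_cache_release() above. The
example_drop_from_lru() name is hypothetical; every other identifier is
either added by this patch or existing kernel code.

	/* Sketch only: drop one isolated page from its LRU under the new lock. */
	#include <linux/memcontrol.h>	/* lock_page_lruvec_irqsave() and friends */
	#include <linux/mm_inline.h>	/* del_page_from_lru_list() */

	static void example_drop_from_lru(struct page *page)	/* hypothetical */
	{
		struct lruvec *lruvec;
		unsigned long flags;

		/* The page must be isolated first, as the patch requires. */
		if (!TestClearPageLRU(page))
			return;

		/*
		 * Resolve page -> memcg -> lruvec and take that lruvec's own
		 * lru_lock instead of the old shared pgdat->lru_lock.
		 */
		lruvec = lock_page_lruvec_irqsave(page, &flags);
		del_page_from_lru_list(page, lruvec, page_off_lru(page));
		unlock_page_lruvec_irqrestore(lruvec, flags);
	}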
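The pagevec walkers (release_pages(), check_move_unevictable_pages())
amortize locking by comparing lruvecs rather than pgdats: the lock is
dropped and retaken only when the next page belongs to a different
lruvec. A sketch of that relock pattern, again with a hypothetical
example_* wrapper:

	/* Sketch only: batch pages under one lruvec lock, relock on change. */
	static void example_walk_pages(struct page **pages, int nr)	/* hypothetical */
	{
		struct lruvec *lruvec = NULL;
		unsigned long flags;
		int i;

		for (i = 0; i < nr; i++) {
			struct page *page = pages[i];
			struct lruvec *new_lruvec;

			new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
			/* Switch locks only when the page's lruvec changes. */
			if (new_lruvec != lruvec) {
				if (lruvec)
					unlock_page_lruvec_irqrestore(lruvec, flags);
				lruvec = lock_page_lruvec_irqsave(page, &flags);
			}

			/* ... operate on the page under lruvec->lru_lock ... */
		}
		if (lruvec)
			unlock_page_lruvec_irqrestore(lruvec, flags);
	}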