2019-12-10 11:49:52

by Alex Shi

[permalink] [raw]
Subject: [PATCH v5 0/8] per lruvec lru_lock for memcg

Hi all,

Sorry for send out later.

This patchset move lru_lock into lruvec, give a lru_lock for each of
lruvec, thus bring a lru_lock for each of memcg per node.

This is the main patch to replace per node lru_lock with per memcg
lruvec lock.

We introduces function lock_page_lruvec, which will lock the page's
memcg and then memcg's lruvec->lru_lock. (Thanks Johannes Weiner,
Hugh Dickins and Konstantin Khlebnikov suggestion/reminder)

According to Daniel Jordan's suggestion, I run 208 'dd' with on 104
containers on a 2s * 26cores * HT box with a modefied case:
https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice

With this and later patches, the readtwice performance increases about
80% with containers, but w/o memcg the readtwice performance drops
about 5%.(and another 5% drops with the last debug patch). Slighty
better than v4.(about 6% drop w/o memcg)

Considering the memcg move task path:
mem_cgroup_move_task:
mem_cgroup_move_charge:
lru_add_drain_all();
atomic_inc(&mc.from->moving_account); //ask lruvec's move_lock
synchronize_rcu();
walk_parge_range: do charge_walk_ops(mem_cgroup_move_charge_pte_range):
isolate_lru_page();
mem_cgroup_move_account(page,)
spin_lock(&from->move_lock)
page->mem_cgroup = to;
spin_unlock(&from->move_lock)
putback_lru_page(page)

to guard 'page->mem_cgroup = to' by to_vec->lru_lock has the similar effect with
move_lock. So for performance reason, both solutions are same.

Thanks Hugh Dickins and Konstantin Khlebnikov, they both brought the same idea
7 years ago.

Thanks all the comments from Hugh Dickins, Konstantin Khlebnikov, Daniel Jordan,
Johannes Weiner, Mel Gorman, Shakeel Butt, Rong Chen, Fengguang Wu, Yun Wang etc.
and some testing support from Intel 0days!

v5,
a, locking page's memcg according JohannesW suggestion
b, using macro for non memcg, according to Johanness and Metthew's suggestion.

v4:
a, fix the page->mem_cgroup dereferencing issue, thanks Johannes Weiner
b, remove the irqsave flags changes, thanks Metthew Wilcox
c, merge/split patches for better understanding and bisection purpose

v3: rebase on linux-next, and fold the relock fix patch into introduceing patch

v2: bypass a performance regression bug and fix some function issues

v1: initial version, aim testing show 5% performance increase


Alex Shi (7):
mm/vmscan: remove unnecessary lruvec adding
mm/lru: replace pgdat lru_lock with lruvec lock
mm/lru: introduce the relock_page_lruvec function
mm/mlock: optimize munlock_pagevec by relocking
mm/swap: only change the lru_lock iff page's lruvec is different
mm/pgdat: remove pgdat lru_lock
mm/lru: debug checking for page memcg moving and lock_page_memcg

Hugh Dickins (1):
mm/lru: revise the comments of lru_lock

Documentation/admin-guide/cgroup-v1/memcg_test.rst | 15 +---
Documentation/admin-guide/cgroup-v1/memory.rst | 6 +-
Documentation/trace/events-kmem.rst | 2 +-
Documentation/vm/unevictable-lru.rst | 22 ++---
include/linux/memcontrol.h | 63 ++++++++++++++
include/linux/mm_types.h | 2 +-
include/linux/mmzone.h | 5 +-
mm/compaction.c | 59 ++++++++-----
mm/filemap.c | 4 +-
mm/huge_memory.c | 18 ++--
mm/memcontrol.c | 88 +++++++++++++++-----
mm/mlock.c | 28 +++----
mm/mmzone.c | 1 +
mm/page_alloc.c | 1 -
mm/page_idle.c | 7 +-
mm/rmap.c | 2 +-
mm/swap.c | 75 +++++++----------
mm/vmscan.c | 97 ++++++++++++----------
18 files changed, 300 insertions(+), 195 deletions(-)

--
1.8.3.1


2019-12-10 11:49:55

by Alex Shi

[permalink] [raw]
Subject: [PATCH v5 5/8] mm/swap: only change the lru_lock iff page's lruvec is different

Since we introduced relock_page_lruvec, we could use it in more place
to reduce spin_locks.

Signed-off-by: Alex Shi <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Yafang Shao <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
mm/swap.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 97e108be4f92..84a845968e1d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -196,11 +196,12 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
for (i = 0; i < pagevec_count(pvec); i++) {
struct page *page = pvec->pages[i];

- lruvec = lock_page_lruvec_irqsave(page, &flags);
+ lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);

(*move_fn)(page, lruvec, arg);
- unlock_page_lruvec_irqrestore(lruvec, flags);
}
+ if (lruvec)
+ unlock_page_lruvec_irqrestore(lruvec, flags);

release_pages(pvec->pages, pvec->nr);
pagevec_reinit(pvec);
@@ -819,14 +820,11 @@ void release_pages(struct page **pages, int nr)
}

if (PageLRU(page)) {
- struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
+ struct lruvec *pre_lruvec = lruvec;

- if (new_lruvec != lruvec) {
- if (lruvec)
- unlock_page_lruvec_irqrestore(lruvec, flags);
+ lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);
+ if (pre_lruvec != lruvec)
lock_batch = 0;
- lruvec = lock_page_lruvec_irqsave(page, &flags);
- }

VM_BUG_ON_PAGE(!PageLRU(page), page);
__ClearPageLRU(page);
--
1.8.3.1

2019-12-10 11:50:41

by Alex Shi

[permalink] [raw]
Subject: [PATCH v5 4/8] mm/mlock: optimize munlock_pagevec by relocking

During the pagevec locking, a new page's lruvec is may same as
previous one. Thus we could save a re-locking, and only
change lock iff lruvec is newer.

Signed-off-by: Alex Shi <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Andrew Morton <[email protected]>
---
mm/mlock.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 10d15f58b061..050f999eadb1 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -289,6 +289,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
{
int i;
int nr = pagevec_count(pvec);
+ int delta_munlocked = -nr;
struct pagevec pvec_putback;
struct lruvec *lruvec = NULL;
int pgrescued = 0;
@@ -299,20 +300,19 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
for (i = 0; i < nr; i++) {
struct page *page = pvec->pages[i];

- lruvec = lock_page_lruvec_irq(page);
+ lruvec = relock_page_lruvec_irq(page, lruvec);

if (TestClearPageMlocked(page)) {
/*
* We already have pin from follow_page_mask()
* so we can spare the get_page() here.
*/
- if (__munlock_isolate_lru_page(page, lruvec, false)) {
- __mod_zone_page_state(zone, NR_MLOCK, -1);
- unlock_page_lruvec_irq(lruvec);
+ if (__munlock_isolate_lru_page(page, lruvec, false))
continue;
- } else
+ else
__munlock_isolation_failed(page);
- }
+ } else
+ delta_munlocked++;

/*
* We won't be munlocking this page in the next phase
@@ -322,8 +322,10 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
*/
pagevec_add(&pvec_putback, pvec->pages[i]);
pvec->pages[i] = NULL;
- unlock_page_lruvec_irq(lruvec);
}
+ __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
+ if (lruvec)
+ unlock_page_lruvec_irq(lruvec);

/* Now we can release pins of pages that we are not munlocking */
pagevec_release(&pvec_putback);
--
1.8.3.1

2019-12-10 11:50:41

by Alex Shi

[permalink] [raw]
Subject: [PATCH v5 8/8] mm/lru: debug checking for page memcg moving and lock_page_memcg

The extra irq disable/enable and BUG_ON checking costs 5% readtwice
performance on a 2 socket * 26 cores * HT box.

Need to remove them later?

Signed-off-by: Alex Shi <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
mm/compaction.c | 4 ++++
mm/memcontrol.c | 13 +++++++++++++
2 files changed, 17 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 0907eec3db84..7f750a9100af 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -972,6 +972,10 @@ static bool too_many_isolated(pg_data_t *pgdat)
compact_lock_irqsave(&lruvec->lru_lock, &flags, cc);
locked_lruvec = lruvec;

+#ifdef CONFIG_MEMCG
+ if (!mem_cgroup_disabled())
+ VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != page->mem_cgroup, page);
+#endif
/* Try get exclusive access under lock */
if (!skip_updated) {
skip_updated = true;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4a92689c37d6..833df0ce1bc1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1255,6 +1255,10 @@ struct lruvec *lock_page_lruvec_irq(struct page *page)
lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
spin_lock_irq(&lruvec->lru_lock);

+#ifdef CONFIG_MEMCG
+ if (!mem_cgroup_disabled())
+ VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != page->mem_cgroup, page);
+#endif
return lruvec;
}

@@ -1267,6 +1271,10 @@ struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags)
lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
spin_lock_irqsave(&lruvec->lru_lock, *flags);

+#ifdef CONFIG_MEMCG
+ if (!mem_cgroup_disabled())
+ VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != page->mem_cgroup, page);
+#endif
return lruvec;
}

@@ -2015,6 +2023,11 @@ struct mem_cgroup *lock_page_memcg(struct page *page)
if (unlikely(!memcg))
return NULL;

+ /* temporary lockdep checking, need remove */
+ local_irq_save(flags);
+ might_lock(&memcg->move_lock);
+ local_irq_restore(flags);
+
if (atomic_read(&memcg->moving_account) <= 0)
return memcg;

--
1.8.3.1