2017-03-01 07:55:39

by Johannes Weiner

Subject: [PATCH 6/9] mm: don't avoid high-priority reclaim on memcg limit reclaim

246e87a93934 ("memcg: fix get_scan_count() for small targets") sought
to avoid high reclaim priorities for memcg by forcing it to scan a
minimum number of pages when lru_pages >> priority yielded nothing.
This was done at a time when reclaim decisions like dirty throttling
were tied to the priority level.

Nowadays, the only meaningful thing still tied to priority dropping
below DEF_PRIORITY - 2 is gating whether laptop_mode=1 is generally
allowed to write. But that is from an era where direct reclaim was
still allowed to call ->writepage, and kswapd nowadays avoids writes
until it's scanned every clean page in the system. Potential changes
to how quickly sc->may_writepage could trigger are of little concern.
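
For reference, the gate in question is essentially this check in the
do_try_to_free_pages() priority loop (kswapd's balance_pgdat() has an
equivalent one; paraphrased from mm/vmscan.c of this era):

	/*
	 * If we're getting trouble reclaiming, start doing
	 * writepage even in laptop mode.
	 */
	if (sc->priority < DEF_PRIORITY - 2)
		sc->may_writepage = 1;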

Remove the force_scan stuff, as well as the ugly multi-pass target
calculation that it necessitated.

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/vmscan.c | 94 ++++++++++++++++++++++++-------------------------------------
1 file changed, 37 insertions(+), 57 deletions(-)
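
A sketch of the resulting get_scan_count() inner loop, distilled from
the diff below (not the verbatim code; the scan_balance handling is
elided):

	for_each_evictable_lru(lru) {
		size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
		scan = size >> sc->priority;
		/* deleted groups: scrape out the lingering cache */
		if (!scan && !mem_cgroup_online(memcg))
			scan = min(size, SWAP_CLUSTER_MAX);
		/* ... scan is then scaled per scan_balance ... */
		*lru_pages += size;
		nr[lru] = scan;
	}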

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 46b6223fe7f3..8cff6e2cd02c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2122,21 +2122,8 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
unsigned long anon_prio, file_prio;
enum scan_balance scan_balance;
unsigned long anon, file;
- bool force_scan = false;
unsigned long ap, fp;
enum lru_list lru;
- bool some_scanned;
- int pass;
-
- /*
- * If the zone or memcg is small, nr[l] can be 0. When
- * reclaiming for a memcg, a priority drop can cause high
- * latencies, so it's better to scan a minimum amount. When a
- * cgroup has already been deleted, scrape out the remaining
- * cache forcefully to get rid of the lingering state.
- */
- if (!global_reclaim(sc) || !mem_cgroup_online(memcg))
- force_scan = true;

/* If we have no swap space, do not bother scanning anon pages. */
if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) {
@@ -2267,55 +2254,48 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
fraction[1] = fp;
denominator = ap + fp + 1;
out:
- some_scanned = false;
- /* Only use force_scan on second pass. */
- for (pass = 0; !some_scanned && pass < 2; pass++) {
- *lru_pages = 0;
- for_each_evictable_lru(lru) {
- int file = is_file_lru(lru);
- unsigned long size;
- unsigned long scan;
-
- size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
- scan = size >> sc->priority;
-
- if (!scan && pass && force_scan)
- scan = min(size, SWAP_CLUSTER_MAX);
-
- switch (scan_balance) {
- case SCAN_EQUAL:
- /* Scan lists relative to size */
- break;
- case SCAN_FRACT:
- /*
- * Scan types proportional to swappiness and
- * their relative recent reclaim efficiency.
- */
- scan = div64_u64(scan * fraction[file],
- denominator);
- break;
- case SCAN_FILE:
- case SCAN_ANON:
- /* Scan one type exclusively */
- if ((scan_balance == SCAN_FILE) != file) {
- size = 0;
- scan = 0;
- }
- break;
- default:
- /* Look ma, no brain */
- BUG();
- }
+ *lru_pages = 0;
+ for_each_evictable_lru(lru) {
+ int file = is_file_lru(lru);
+ unsigned long size;
+ unsigned long scan;

- *lru_pages += size;
- nr[lru] = scan;
+ size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
+ scan = size >> sc->priority;
+ /*
+ * If the cgroup's already been deleted, make sure to
+ * scrape out the remaining cache.
+ */
+ if (!scan && !mem_cgroup_online(memcg))
+ scan = min(size, SWAP_CLUSTER_MAX);

+ switch (scan_balance) {
+ case SCAN_EQUAL:
+ /* Scan lists relative to size */
+ break;
+ case SCAN_FRACT:
/*
- * Skip the second pass and don't force_scan,
- * if we found something to scan.
+ * Scan types proportional to swappiness and
+ * their relative recent reclaim efficiency.
*/
- some_scanned |= !!scan;
+ scan = div64_u64(scan * fraction[file],
+ denominator);
+ break;
+ case SCAN_FILE:
+ case SCAN_ANON:
+ /* Scan one type exclusively */
+ if ((scan_balance == SCAN_FILE) != file) {
+ size = 0;
+ scan = 0;
+ }
+ break;
+ default:
+ /* Look ma, no brain */
+ BUG();
}
+
+ *lru_pages += size;
+ nr[lru] = scan;
}
}

--
2.11.1


2017-03-01 15:53:50

by Michal Hocko

Subject: Re: [PATCH 6/9] mm: don't avoid high-priority reclaim on memcg limit reclaim

On Tue 28-02-17 16:40:04, Johannes Weiner wrote:
> 246e87a93934 ("memcg: fix get_scan_count() for small targets") sought
> to avoid high reclaim priorities for memcg by forcing it to scan a
> minimum number of pages when lru_pages >> priority yielded nothing.
> This was done at a time when reclaim decisions like dirty throttling
> were tied to the priority level.
>
> Nowadays, the only meaningful thing still tied to priority dropping
> below DEF_PRIORITY - 2 is gating whether laptop_mode=1 is generally
> allowed to write. But that is from an era where direct reclaim was
> still allowed to call ->writepage, and kswapd nowadays avoids writes
> until it's scanned every clean page in the system. Potential changes
> to how quickly sc->may_writepage could trigger are of little concern.
>
> Remove the force_scan stuff, as well as the ugly multi-pass target
> calculation that it necessitated.

I _really_ like this; I hated the multi-pass part. One thing I am
worried about, and the changelog doesn't mention it, is what we are
going to do about small (<16MB) memcgs. On one hand they were already
ignored in global reclaim, so this is nothing really new, but maybe we
want to preserve the behavior for memcg reclaim at least, which would
reduce the side effect of this patch; it is a great cleanup otherwise.
Or at least be explicit about this in the changelog.

Btw. why cannot we simply force scan at least SWAP_CLUSTER_MAX
unconditionally?

> + /*
> + * If the cgroup's already been deleted, make sure to
> + * scrape out the remaining cache.
Also make sure that small memcgs will not go
unnoticed during memcg reclaim

> + */
> + if (!scan && !mem_cgroup_online(memcg))

if (!scan && (!mem_cgroup_online(memcg) || !global_reclaim(sc)))

> + scan = min(size, SWAP_CLUSTER_MAX);
>

--
Michal Hocko
SUSE Labs

2017-03-01 18:24:50

by Johannes Weiner

Subject: Re: [PATCH 6/9] mm: don't avoid high-priority reclaim on memcg limit reclaim

On Wed, Mar 01, 2017 at 04:40:27PM +0100, Michal Hocko wrote:
> On Tue 28-02-17 16:40:04, Johannes Weiner wrote:
> > 246e87a93934 ("memcg: fix get_scan_count() for small targets") sought
> > to avoid high reclaim priorities for memcg by forcing it to scan a
> > minimum number of pages when lru_pages >> priority yielded nothing.
> > This was done at a time when reclaim decisions like dirty throttling
> > were tied to the priority level.
> >
> > Nowadays, the only meaningful thing still tied to priority dropping
> > below DEF_PRIORITY - 2 is gating whether laptop_mode=1 is generally
> > allowed to write. But that is from an era where direct reclaim was
> > still allowed to call ->writepage, and kswapd nowadays avoids writes
> > until it's scanned every clean page in the system. Potential changes
> > to how quickly sc->may_writepage could trigger are of little concern.
> >
> > Remove the force_scan stuff, as well as the ugly multi-pass target
> > calculation that it necessitated.
>
> I _really_ like this; I hated the multi-pass part. One thing I am
> worried about, and the changelog doesn't mention it, is what we are
> going to do about small (<16MB) memcgs. On one hand they were already
> ignored in global reclaim, so this is nothing really new, but maybe we
> want to preserve the behavior for memcg reclaim at least, which would
> reduce the side effect of this patch; it is a great cleanup otherwise.
> Or at least be explicit about this in the changelog.

<16MB groups are a legitimate concern during global reclaim, but we
have done it this way for a long time and it never seemed to have
mattered in practice.

And for limit reclaim, this should be much less of a concern. It just
means we no longer scan these groups at DEF_PRIORITY and will have to
increase the scan window. I don't see a problem with that. And that
consequence of higher priorities is right in the patch subject.
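
To put illustrative numbers on that: a 12MB group has ~3072 pages on
its LRUs, so 3072 >> 12 == 0 at DEF_PRIORITY; once limit reclaim drops
to priority 10 the window becomes 3072 >> 10 == 3 pages, and it doubles
with every further priority drop.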

> Btw. why cannot we simply force scan at least SWAP_CLUSTER_MAX
> unconditionally?
>
> > + /*
> > + * If the cgroup's already been deleted, make sure to
> > + * scrape out the remaining cache.
> Also make sure that small memcgs will not go
> unnoticed during memcg reclaim
>
> > + */
> > + if (!scan && !mem_cgroup_online(memcg))
>
> if (!scan && (!mem_cgroup_online(memcg) || !global_reclaim(sc)))

With this I'd be worried about regressing the setups pointed out in
6f04f48dc9c0 ("mm: only force scan in reclaim when none of the LRUs
are big enough.").

Granted, that patch is a little dubious. IMO, we should be steering
the LRU balance through references and, in that case in particular,
with swappiness. Using the default 60 for zswap is too low.

Plus, I would expect the refault detection code that was introduced
around the same time as this patch to counteract the hot file
thrashing that is mentioned in that patch's changelog.

Nevertheless, it seems a bit gratuitous to go against that change so
directly when global reclaim hasn't historically been a problem with
groups <16MB. Limit reclaim should be fine too.

2017-03-01 19:15:22

by Michal Hocko

Subject: Re: [PATCH 6/9] mm: don't avoid high-priority reclaim on memcg limit reclaim

On Wed 01-03-17 12:36:28, Johannes Weiner wrote:
> On Wed, Mar 01, 2017 at 04:40:27PM +0100, Michal Hocko wrote:
> > On Tue 28-02-17 16:40:04, Johannes Weiner wrote:
> > > 246e87a93934 ("memcg: fix get_scan_count() for small targets") sought
> > > to avoid high reclaim priorities for memcg by forcing it to scan a
> > > minimum number of pages when lru_pages >> priority yielded nothing.
> > > This was done at a time when reclaim decisions like dirty throttling
> > > were tied to the priority level.
> > >
> > > Nowadays, the only meaningful thing still tied to priority dropping
> > > below DEF_PRIORITY - 2 is gating whether laptop_mode=1 is generally
> > > allowed to write. But that is from an era where direct reclaim was
> > > still allowed to call ->writepage, and kswapd nowadays avoids writes
> > > until it's scanned every clean page in the system. Potential changes
> > > to how quickly sc->may_writepage could trigger are of little concern.
> > >
> > > Remove the force_scan stuff, as well as the ugly multi-pass target
> > > calculation that it necessitated.
> >
> > I _really_ like this; I hated the multi-pass part. One thing I am
> > worried about, and the changelog doesn't mention it, is what we are
> > going to do about small (<16MB) memcgs. On one hand they were already
> > ignored in global reclaim, so this is nothing really new, but maybe we
> > want to preserve the behavior for memcg reclaim at least, which would
> > reduce the side effect of this patch; it is a great cleanup otherwise.
> > Or at least be explicit about this in the changelog.
>
> <16MB groups are a legitimate concern during global reclaim, but we
> have done it this way for a long time and it never seemed to have
> mattered in practice.

Yeah, this is not really easy to spot because there are usually other
memcgs which can be reclaimed.

> And for limit reclaim, this should be much less of a concern. It just
> means we no longer scan these groups at DEF_PRIORITY and will have to
> increase the scan window. I don't see a problem with that. And that
> consequence of higher priorities is right in the patch subject.

Well, the memory pressure spills over to others in the same hierarchy.
But I agree this shouldn't be a disaster.

> > Btw. why cannot we simply force scan at least SWAP_CLUSTER_MAX
> > unconditionally?
> >
> > > + /*
> > > + * If the cgroup's already been deleted, make sure to
> > > + * scrape out the remaining cache.
> > Also make sure that small memcgs will not go
> > unnoticed during memcg reclaim
> >
> > > + */
> > > + if (!scan && !mem_cgroup_online(memcg))
> >
> > if (!scan && (!mem_cgroup_online(memcg) || !global_reclaim(sc)))
>
> With this I'd be worried about regressing the setups pointed out in
> 6f04f48dc9c0 ("mm: only force scan in reclaim when none of the LRUs
> are big enough.").
>
> Granted, that patch is a little dubious. IMO, we should be steering
> the LRU balance through references and, in that case in particular,
> with swappiness. Using the default 60 for zswap is too low.
>
> Plus, I would expect the refault detection code that was introduced
> around the same time as this patch to counteract the hot file
> thrashing that is mentioned in that patch's changelog.
>
> Nevertheless, it seems a bit gratuitous to go against that change so
> directly when global reclaim hasn't historically been a problem with
> groups <16MB. Limit reclaim should be fine too.

As I've already mentioned, I really love this patch; I just think this
is a subtle side effect. The above reasoning should be good enough, I
believe.

Anyway, I forgot to add: I will leave it to you whether to put this in
a separate patch or just add it to the changelog.
Acked-by: Michal Hocko <[email protected]>
--
Michal Hocko
SUSE Labs

2017-03-02 03:33:45

by Hillf Danton

Subject: Re: [PATCH 6/9] mm: don't avoid high-priority reclaim on memcg limit reclaim

On March 01, 2017 5:40 AM Johannes Weiner wrote:
>
> 246e87a93934 ("memcg: fix get_scan_count() for small targets") sought
> to avoid high reclaim priorities for memcg by forcing it to scan a
> minimum number of pages when lru_pages >> priority yielded nothing.
> This was done at a time when reclaim decisions like dirty throttling
> were tied to the priority level.
>
> Nowadays, the only meaningful thing still tied to priority dropping
> below DEF_PRIORITY - 2 is gating whether laptop_mode=1 is generally
> allowed to write. But that is from an era where direct reclaim was
> still allowed to call ->writepage, and kswapd nowadays avoids writes
> until it's scanned every clean page in the system. Potential changes
> to how quickly sc->may_writepage could trigger are of little concern.
>
> Remove the force_scan stuff, as well as the ugly multi-pass target
> calculation that it necessitated.
>
> Signed-off-by: Johannes Weiner <[email protected]>
> ---
Acked-by: Hillf Danton <[email protected]>