By reviewing code, I find that when enter do_try_to_free_pages, the
may_thrash is always clear, and it will retry shrink zones to tap
cgroup's reserves memory by setting may_thrash when the former
shrink_zones reclaim nothing.
However, when memcg is disabled or on legacy hierarchy, or there do not
have any memcg protected by low limit, it should not do this useless retry
at all, for we do not have any cgroup's reserves memory to tap, and we
have already done hard work but made no progress.
To avoid this unneeded retrying, add a new field in scan_control named
memcg_low_protection, set it if there is any memcg protected by low limit
and only do the retry when memcg_low_protection is set while may_thrash
is clear.
Signed-off-by: Yisheng Xie <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Suggested-by: Shakeel Butt <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
---
v4:
- add a new field in scan_control named memcg_low_protection to check whether
there have any memcg protected by low limit. - Michal
v3:
- rename function may_thrash() to mem_cgroup_thrashed() to avoid confusing.
v2:
- more restrictive condition for retry of shrink_zones (restricting
cgroup_disabled=memory boot option and cgroup legacy hierarchy) - Shakeel
- add a stub function may_thrash() to avoid compile error or warning.
- rename subject from "donot retry shrink zones when memcg is disable"
to "more restrictive condition for retry in do_try_to_free_pages"
Any comment is more than welcome!
Thanks
Yisheng Xie
mm/vmscan.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bc8031e..c4fa3d3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -100,6 +100,9 @@ struct scan_control {
/* Can cgroups be reclaimed below their normal consumption range? */
unsigned int may_thrash:1;
+ /* Did we have any memcg protected by the low limit */
+ unsigned int memcg_low_protection:1;
+
unsigned int hibernation_mode:1;
/* One of the zones is ready for compaction */
@@ -2557,6 +2560,8 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
unsigned long scanned;
if (mem_cgroup_low(root, memcg)) {
+ sc->memcg_low_protection = 1;
+
if (!sc->may_thrash)
continue;
mem_cgroup_events(memcg, MEMCG_LOW, 1);
@@ -2808,7 +2813,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
return 1;
/* Untapped cgroup reserves? Don't OOM, retry. */
- if (!sc->may_thrash) {
+ if (sc->memcg_low_protection && !sc->may_thrash) {
sc->priority = initial_priority;
sc->may_thrash = 1;
goto retry;
--
1.7.12.4
On Wed 15-03-17 19:36:48, Yisheng Xie wrote:
> By reviewing code, I find that when enter do_try_to_free_pages, the
> may_thrash is always clear, and it will retry shrink zones to tap
> cgroup's reserves memory by setting may_thrash when the former
> shrink_zones reclaim nothing.
>
> However, when memcg is disabled or on legacy hierarchy, or there do not
> have any memcg protected by low limit, it should not do this useless retry
> at all, for we do not have any cgroup's reserves memory to tap, and we
> have already done hard work but made no progress.
>
> To avoid this unneeded retrying, add a new field in scan_control named
> memcg_low_protection, set it if there is any memcg protected by low limit
> and only do the retry when memcg_low_protection is set while may_thrash
> is clear.
You still haven't explained why a retry is bad thing. It certainly is
not about performance because not a single page being reclaimed means
that all the performance went to hell already. Please always make it
clear why the change is needed/desirable.
But I agree that this makes the code easier to understand so I am OK
with this change.
> Signed-off-by: Yisheng Xie <[email protected]>
> Suggested-by: Michal Hocko <[email protected]>
> Suggested-by: Shakeel Butt <[email protected]>
> Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Michal Hocko <[email protected]>
> ---
> v4:
> - add a new field in scan_control named memcg_low_protection to check whether
> there have any memcg protected by low limit. - Michal
>
> v3:
> - rename function may_thrash() to mem_cgroup_thrashed() to avoid confusing.
>
> v2:
> - more restrictive condition for retry of shrink_zones (restricting
> cgroup_disabled=memory boot option and cgroup legacy hierarchy) - Shakeel
>
> - add a stub function may_thrash() to avoid compile error or warning.
>
> - rename subject from "donot retry shrink zones when memcg is disable"
> to "more restrictive condition for retry in do_try_to_free_pages"
>
> Any comment is more than welcome!
>
> Thanks
> Yisheng Xie
>
> mm/vmscan.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bc8031e..c4fa3d3 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -100,6 +100,9 @@ struct scan_control {
> /* Can cgroups be reclaimed below their normal consumption range? */
> unsigned int may_thrash:1;
>
> + /* Did we have any memcg protected by the low limit */
> + unsigned int memcg_low_protection:1;
> +
> unsigned int hibernation_mode:1;
>
> /* One of the zones is ready for compaction */
> @@ -2557,6 +2560,8 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
> unsigned long scanned;
>
> if (mem_cgroup_low(root, memcg)) {
> + sc->memcg_low_protection = 1;
> +
> if (!sc->may_thrash)
> continue;
> mem_cgroup_events(memcg, MEMCG_LOW, 1);
> @@ -2808,7 +2813,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
> return 1;
>
> /* Untapped cgroup reserves? Don't OOM, retry. */
> - if (!sc->may_thrash) {
> + if (sc->memcg_low_protection && !sc->may_thrash) {
> sc->priority = initial_priority;
> sc->may_thrash = 1;
> goto retry;
> --
> 1.7.12.4
>
--
Michal Hocko
SUSE Labs
Hi Michal
Thanks for reviewing.
On 2017/3/15 20:41, Michal Hocko wrote:
> On Wed 15-03-17 19:36:48, Yisheng Xie wrote:
>> By reviewing code, I find that when enter do_try_to_free_pages, the
>> may_thrash is always clear, and it will retry shrink zones to tap
>> cgroup's reserves memory by setting may_thrash when the former
>> shrink_zones reclaim nothing.
>>
>> However, when memcg is disabled or on legacy hierarchy, or there do not
>> have any memcg protected by low limit, it should not do this useless retry
>> at all, for we do not have any cgroup's reserves memory to tap, and we
>> have already done hard work but made no progress.
>>
>> To avoid this unneeded retrying, add a new field in scan_control named
>> memcg_low_protection, set it if there is any memcg protected by low limit
>> and only do the retry when memcg_low_protection is set while may_thrash
>> is clear.
>
> You still haven't explained why a retry is bad thing. It certainly is
> not about performance because not a single page being reclaimed means
> that all the performance went to hell already. Please always make it
> clear why the change is needed/desirable.
So sorry for about that! This patch is just based on code reviewing, and
sure is nothing to do with performance, therefore, I also cannot get any
data about it. IMO, it may save some cycles for reclaim and this make me
try to prepare this patch. Just as what you said that "the current additional
round of reclaim is just lame for we are trying hard to control the retry
logic from the page allocator".
Thanks
Yisheng Xie.
>
> But I agree that this makes the code easier to understand so I am OK
> with this change.
>
On Wed, Mar 15, 2017 at 07:36:48PM +0800, Yisheng Xie wrote:
> By reviewing code, I find that when enter do_try_to_free_pages, the
> may_thrash is always clear, and it will retry shrink zones to tap
> cgroup's reserves memory by setting may_thrash when the former
> shrink_zones reclaim nothing.
>
> However, when memcg is disabled or on legacy hierarchy, or there do not
> have any memcg protected by low limit, it should not do this useless retry
> at all, for we do not have any cgroup's reserves memory to tap, and we
> have already done hard work but made no progress.
>
> To avoid this unneeded retrying, add a new field in scan_control named
> memcg_low_protection, set it if there is any memcg protected by low limit
> and only do the retry when memcg_low_protection is set while may_thrash
> is clear.
>
> Signed-off-by: Yisheng Xie <[email protected]>
> Suggested-by: Michal Hocko <[email protected]>
> Suggested-by: Shakeel Butt <[email protected]>
> Reviewed-by: Shakeel Butt <[email protected]>
I don't see the point of this patch. It adds more code just to
marginally optimize a near-OOM cold path.
On Fri 17-03-17 10:50:20, Johannes Weiner wrote:
> On Wed, Mar 15, 2017 at 07:36:48PM +0800, Yisheng Xie wrote:
> > By reviewing code, I find that when enter do_try_to_free_pages, the
> > may_thrash is always clear, and it will retry shrink zones to tap
> > cgroup's reserves memory by setting may_thrash when the former
> > shrink_zones reclaim nothing.
> >
> > However, when memcg is disabled or on legacy hierarchy, or there do not
> > have any memcg protected by low limit, it should not do this useless retry
> > at all, for we do not have any cgroup's reserves memory to tap, and we
> > have already done hard work but made no progress.
> >
> > To avoid this unneeded retrying, add a new field in scan_control named
> > memcg_low_protection, set it if there is any memcg protected by low limit
> > and only do the retry when memcg_low_protection is set while may_thrash
> > is clear.
> >
> > Signed-off-by: Yisheng Xie <[email protected]>
> > Suggested-by: Michal Hocko <[email protected]>
> > Suggested-by: Shakeel Butt <[email protected]>
> > Reviewed-by: Shakeel Butt <[email protected]>
>
> I don't see the point of this patch. It adds more code just to
> marginally optimize a near-OOM cold path.
The current behavior is surprising and not really desirable when we want
to control the retry logic from the page allocator. So I do not think
that the additional 5 lines of code would be unbearable burden or
maintenance cost. I am not saying the patch adds any break through but
it is not pointless either.
--
Michal Hocko
SUSE Labs
On Wed, Mar 15, 2017 at 07:36:48PM +0800, Yisheng Xie wrote:
> @@ -100,6 +100,9 @@ struct scan_control {
> /* Can cgroups be reclaimed below their normal consumption range? */
> unsigned int may_thrash:1;
>
> + /* Did we have any memcg protected by the low limit */
> + unsigned int memcg_low_protection:1;
These are both bad names. How about the following pair?
/*
* Cgroups are not reclaimed below their configured memory.low,
* unless we threaten to OOM. If any cgroups are skipped due to
* memory.low and nothing was reclaimed, go back for memory.low.
*/
unsigned int memcg_low_skipped:1
unsigned int memcg_low_reclaim:1;
> @@ -2557,6 +2560,8 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
> unsigned long scanned;
>
> if (mem_cgroup_low(root, memcg)) {
> + sc->memcg_low_protection = 1;
> +
> if (!sc->may_thrash)
> continue;
if (!sc->memcg_low_reclaim) {
sc->memcg_low_skipped = 1;
continue;
}
> mem_cgroup_events(memcg, MEMCG_LOW, 1);
> @@ -2808,7 +2813,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
> return 1;
>
> /* Untapped cgroup reserves? Don't OOM, retry. */
> - if (!sc->may_thrash) {
> + if (sc->memcg_low_protection && !sc->may_thrash) {
if (sc->memcg_low_skipped) {
[...]
sc->memcg_low_reclaim = 1;
goto retry;
}
On Fri, Mar 17, 2017 at 07:45:27PM +0100, Michal Hocko wrote:
> On Fri 17-03-17 14:39:28, Johannes Weiner wrote:
> > On Wed, Mar 15, 2017 at 07:36:48PM +0800, Yisheng Xie wrote:
> > > @@ -2808,7 +2813,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
> > > return 1;
> > >
> > > /* Untapped cgroup reserves? Don't OOM, retry. */
> > > - if (!sc->may_thrash) {
> > > + if (sc->memcg_low_protection && !sc->may_thrash) {
> >
> > if (sc->memcg_low_skipped) {
> > [...]
> > sc->memcg_low_reclaim = 1;
>
> you need to set memcg_low_skipped = 0 here, right? Otherwise we do not
> have break out of the loop. Or am I missing something?
Oops, you're right of course. That needs to be reset.
On Fri 17-03-17 14:39:28, Johannes Weiner wrote:
> On Wed, Mar 15, 2017 at 07:36:48PM +0800, Yisheng Xie wrote:
> > @@ -100,6 +100,9 @@ struct scan_control {
> > /* Can cgroups be reclaimed below their normal consumption range? */
> > unsigned int may_thrash:1;
> >
> > + /* Did we have any memcg protected by the low limit */
> > + unsigned int memcg_low_protection:1;
>
> These are both bad names. How about the following pair?
>
> /*
> * Cgroups are not reclaimed below their configured memory.low,
> * unless we threaten to OOM. If any cgroups are skipped due to
> * memory.low and nothing was reclaimed, go back for memory.low.
> */
> unsigned int memcg_low_skipped:1
> unsigned int memcg_low_reclaim:1;
yes this is much better
>
> > @@ -2557,6 +2560,8 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
> > unsigned long scanned;
> >
> > if (mem_cgroup_low(root, memcg)) {
> > + sc->memcg_low_protection = 1;
> > +
> > if (!sc->may_thrash)
> > continue;
>
> if (!sc->memcg_low_reclaim) {
> sc->memcg_low_skipped = 1;
> continue;
> }
>
> > mem_cgroup_events(memcg, MEMCG_LOW, 1);
> > @@ -2808,7 +2813,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
> > return 1;
> >
> > /* Untapped cgroup reserves? Don't OOM, retry. */
> > - if (!sc->may_thrash) {
> > + if (sc->memcg_low_protection && !sc->may_thrash) {
>
> if (sc->memcg_low_skipped) {
> [...]
> sc->memcg_low_reclaim = 1;
you need to set memcg_low_skipped = 0 here, right? Otherwise we do not
have break out of the loop. Or am I missing something?
> goto retry;
> }
--
Michal Hocko
SUSE Labs