I've noticed that dying memory cgroups are often pinned
in memory by a single pagecache page. Even under moderate
memory pressure they sometimes stayed in that state
for a long time, which looked strange.
My investigation showed that the problem is caused by
applying the LRU pressure balancing math:
scan = div64_u64(scan * fraction[lru], denominator),
where
denominator = fraction[anon] + fraction[file] + 1.
Because fraction[lru] is always less than denominator,
if the initial scan size is 1, the result is always 0.
This means the last page is not scanned and has
no chance of being reclaimed.
Fix this by skipping the balancing logic if the initial
scan count is 1.
In practice this change significantly improves the speed
of reclaiming dying cgroups.
Signed-off-by: Roman Gushchin <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
---
mm/vmscan.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f86f288..f85c5ec01886 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2287,9 +2287,12 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
/*
* Scan types proportional to swappiness and
* their relative recent reclaim efficiency.
+ * Make sure we don't miss the last page
+ * because of a round-off error.
*/
- scan = div64_u64(scan * fraction[file],
- denominator);
+ if (scan > 1)
+ scan = div64_u64(scan * fraction[file],
+ denominator);
break;
case SCAN_FILE:
case SCAN_ANON:
--
2.17.1
On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> - scan = div64_u64(scan * fraction[file],
> - denominator);
> + if (scan > 1)
> + scan = div64_u64(scan * fraction[file],
> + denominator);
Wouldn't we be better off doing a div_round_up? i.e.:
scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
although I'd rather hide that in a new macro in math64.h than open-code it
here.
On Fri, Aug 17, 2018 at 06:22:13PM -0700, Matthew Wilcox wrote:
> On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> > - scan = div64_u64(scan * fraction[file],
> > - denominator);
> > + if (scan > 1)
> > + scan = div64_u64(scan * fraction[file],
> > + denominator);
>
> Wouldn't we be better off doing a div_round_up? ie:
>
> scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
>
> although i'd rather hide that in a new macro in math64.h than opencode it
> here.
Good idea! Will do in v2.
Thanks!
On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <[email protected]> wrote:
> On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
>> - scan = div64_u64(scan * fraction[file],
>> - denominator);
>> + if (scan > 1)
>> + scan = div64_u64(scan * fraction[file],
>> + denominator);
>
> Wouldn't we be better off doing a div_round_up? ie:
>
> scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
>
> although i'd rather hide that in a new macro in math64.h than opencode it
> here.
All numbers here should be up to nr_pages * 200 and fit into an unsigned long.
I see no reason for u64; if they overflow, u64 wouldn't help either.
There is a DIV_ROUND_UP macro in kernel.h.
On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <[email protected]> wrote:
> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> - scan = div64_u64(scan * fraction[file],
> >> - denominator);
> >> + if (scan > 1)
> >> + scan = div64_u64(scan * fraction[file],
> >> + denominator);
> >
> > Wouldn't we be better off doing a div_round_up? ie:
> >
> > scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >
> > although i'd rather hide that in a new macro in math64.h than opencode it
> > here.
>
> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> I see no reason for u64. If they overflow then u64 wouldn't help either.
Shaohua added the div64 usage initially, adding him to the cc.
> There is macro DIV_ROUND_UP in kernel.h
Indeed there is.
On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <[email protected]> wrote:
> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> - scan = div64_u64(scan * fraction[file],
> >> - denominator);
> >> + if (scan > 1)
> >> + scan = div64_u64(scan * fraction[file],
> >> + denominator);
> >
> > Wouldn't we be better off doing a div_round_up? ie:
> >
> > scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >
> > although i'd rather hide that in a new macro in math64.h than opencode it
> > here.
>
> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> I see no reason for u64. If they overflow then u64 wouldn't help either.
It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
to four times of what's on the LRUs. That can overflow a u32 with even
small amounts of memory.
On Tue, Aug 21, 2018 at 8:15 PM, Johannes Weiner <[email protected]> wrote:
> On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
>> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <[email protected]> wrote:
>> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
>> >> - scan = div64_u64(scan * fraction[file],
>> >> - denominator);
>> >> + if (scan > 1)
>> >> + scan = div64_u64(scan * fraction[file],
>> >> + denominator);
>> >
>> > Wouldn't we be better off doing a div_round_up? ie:
>> >
>> > scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
>> >
>> > although i'd rather hide that in a new macro in math64.h than opencode it
>> > here.
>>
>> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
>> I see no reason for u64. If they overflow then u64 wouldn't help either.
>
> It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
> to four times of what's on the LRUs. That can overflow a u32 with even
> small amounts of memory.
Ah, this thing is inverted because it aims at a proportional reactivation
rate rather than proportional pressure on reclaimable pages.
That's not obvious; it should probably be explained in a comment above it.
Well, at least the denominator should fit into an unsigned long, so the
full 64/64 division is redundant.
On Wed, Aug 22, 2018 at 09:01:19AM +0300, Konstantin Khlebnikov wrote:
> On Tue, Aug 21, 2018 at 8:15 PM, Johannes Weiner <[email protected]> wrote:
> > On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> >> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <[email protected]> wrote:
> >> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> >> - scan = div64_u64(scan * fraction[file],
> >> >> - denominator);
> >> >> + if (scan > 1)
> >> >> + scan = div64_u64(scan * fraction[file],
> >> >> + denominator);
> >> >
> >> > Wouldn't we be better off doing a div_round_up? ie:
> >> >
> >> > scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >> >
> >> > although i'd rather hide that in a new macro in math64.h than opencode it
> >> > here.
> >>
> >> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> >> I see no reason for u64. If they overflow then u64 wouldn't help either.
> >
> > It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
> > to four times of what's on the LRUs. That can overflow a u32 with even
> > small amounts of memory.
>
> Ah, this thing is inverted because it aims to proportional reactivation rate
> rather than the proportional pressure to reclaimable pages.
> That's not obvious. I suppose this should be in comment above it.
>
> Well, at least denominator should fit into unsigned long. So full
> 64/64 division is redundant.
In any case it's not related to the original issue and should be
treated separately. I'd like to keep the patch simple to make
backporting to stable easy.
All refactorings can be done separately, if necessary.
Thanks!