On Sat, Oct 26, 2019 at 07:07:45PM +0800, Hillf Danton wrote:
>
> Currently soft limit reclaim is frozen, see
> Documentation/admin-guide/cgroup-v2.rst for reasons.
>
> This work adds memcg hook into kswapd's logic to bypass slr,
> paving a brick for its cleanup later.
>
> After b23afb93d317 ("memcg: punt high overage reclaim to
> return-to-userland path"), high limit breachers are reclaimed one
> after another spiraling up through the memcg hierarchy before
> returning to userspace.
>
> We can not add new hook yet if it is infeasible to defer that
> reclaiming a bit further until kswapd becomes active.
>
> It can be defered however because high limit breach looks benign
> in the absence of memory pressure, or we ensure it will be
> reclaimed soon in the presence of kswapd.
I have no idea what this patch is actually trying to do. But this
premise here, as well as the implementation, are seriously flawed.
memory.high needs to be enforced synchronously. Current users expect
workloads to be strictly contained or throttled by memory.high in
order to ensure consistent behavior regardless of the host
environment, as well as prevent interference with other workloads
whose startup time could be slowed down by this lack of containment.
On the implementation side, it appears you patched out reclaim but
left in the throttling that's supposed to make up for failing
reclaim. That means that once a cgroup tree's cache footprint grows
past its memory.high, instead of simply picking up the cold cache
pages, it'll get throttled heavily and see extreme memory pressure. It
could take ages for it to grow to the point where kswapd wakes up.
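To make the failure mode concrete, here is a toy userspace model. It is
not kernel code, and every name in it (toy_cgroup, toy_reclaim,
toy_handle_over_high) is made up for illustration. It assumes the stall
imposed on the allocating task is simply proportional to whatever
overage is left after any reclaim, which is a deliberate
simplification of the real throttling.

```c
#include <assert.h>

/* Toy model only: none of these names are kernel symbols. */
struct toy_cgroup {
	unsigned long usage;      /* pages currently charged */
	unsigned long high;       /* memory.high, in pages */
	unsigned long cold_cache; /* cheaply reclaimable cache pages */
};

/* Pages by which the group exceeds its high limit, or 0. */
static unsigned long toy_overage(const struct toy_cgroup *cg)
{
	return cg->usage > cg->high ? cg->usage - cg->high : 0;
}

/* Drop cold cache until usage is back at high or no cold pages remain. */
static unsigned long toy_reclaim(struct toy_cgroup *cg)
{
	unsigned long over = toy_overage(cg);
	unsigned long freed = over < cg->cold_cache ? over : cg->cold_cache;

	assert(cg->cold_cache <= cg->usage);
	cg->usage -= freed;
	cg->cold_cache -= freed;
	return freed;
}

/*
 * Called before the task returns to userspace.
 * Returns the stall (in arbitrary units) imposed on the task,
 * assumed here to equal the overage that reclaim could not remove.
 */
static unsigned long toy_handle_over_high(struct toy_cgroup *cg, int do_reclaim)
{
	if (do_reclaim)
		toy_reclaim(cg);
	return toy_overage(cg);
}
```

With usage=1200, high=1000 and 500 cold cache pages, calling
toy_handle_over_high() with do_reclaim=1 returns a stall of 0 and leaves
usage at 1000. With do_reclaim=0 the same group is stalled for the full
200-page overage even though cold cache was sitting right there.
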
Nacked-by: Johannes Weiner <[email protected]>