2019-01-04 00:24:54

by Hussam Al-Tayeb

Subject: Excessive swapping under Linux 4.14.91 (no issues in 4.14.90).

Hello. My system has 16GB of RAM (15.6GB usable). Before kernel 4.14.91, it used only about 700KB to 1MB of swap with 4 to 5GB of memory in use.

After upgrading to 4.14.91, I am seeing 500 to 700MB of swap usage even under low memory pressure, for example with only 3.5GB used out of 16GB.

I downgraded to 4.14.90 and the excessive swapping stopped. I upgraded to 4.14.91 and the heavy swapping came back.

Any idea how I can gather information to tell what is happening?

The machine always runs the same services and applications, and it hasn't received any non-kernel updates in months, so I am sure this is a kernel issue.

Thank you.
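
One way to gather numbers for the question above is to sample the kernel's own counters under each kernel version and compare them. The sketch below is only an illustration, not a diagnostic procedure from this thread: it assumes a Linux system exposing /proc/meminfo (SwapTotal, SwapFree) and /proc/vmstat (pswpin, pswpout), and the helper names parse_counters, swap_used_kb and main are made up for this example.

```python
def parse_counters(text):
    """Parse 'Name: value kB' (meminfo) or 'name value' (vmstat) lines
    into a dict mapping counter name to an integer value."""
    counters = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_used_kb(meminfo):
    """Swap currently in use, in kB, from parsed /proc/meminfo."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


def read_file(path):
    with open(path) as f:
        return f.read()


def main():
    meminfo = parse_counters(read_file("/proc/meminfo"))
    vmstat = parse_counters(read_file("/proc/vmstat"))
    print("swap used (kB):", swap_used_kb(meminfo))
    # pswpin/pswpout count pages swapped in/out since boot.
    print("pages swapped in:", vmstat.get("pswpin"))
    print("pages swapped out:", vmstat.get("pswpout"))


if __name__ == "__main__":
    main()
```

Running this at the same uptime and workload under 4.14.90 and 4.14.91 gives figures that can be compared directly and posted to the list.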


2019-01-04 03:15:00

by Vito Caputo

Subject: Re: Excessive swapping under Linux 4.14.91 (no issues in 4.14.90).

On Thu, Jan 03, 2019 at 09:33:09PM +0100, Hussam Al-Tayeb wrote:
> > Sent: Thursday, January 03, 2019 at 10:12 PM
> > From: "Vito Caputo" <[email protected]>
> > To: "Hussam Al-Tayeb" <[email protected]>
> > Subject: Re: Excessive swapping under Linux 4.14.91 (no issues in 4.14.90).
> >
> >
> > The diff between 4.14.90 and 4.14.91 is rather small, appended below is
> > the entire shortlog.
> >
> > There's only one mm commit:
> >
> > > commit 36f93a2e7dce0a4f58b96a7ecb3af4e5897a60d4
> > > Author: Roman Gushchin <[email protected]>
> > > Date: Fri Oct 26 15:03:27 2018 -0700
> > >
> > > mm: don't miss the last page because of round-off error
> > >
> > > commit 68600f623d69da428c6163275f97ca126e1a8ec5 upstream.
> > >
> > > I've noticed, that dying memory cgroups are often pinned in memory by a
> > > single pagecache page. Even under moderate memory pressure they sometimes
> > > stayed in such state for a long time. That looked strange.
> > >
> > > My investigation showed that the problem is caused by applying the LRU
> > > pressure balancing math:
> > >
> > > scan = div64_u64(scan * fraction[lru], denominator),
> > >
> > > where
> > >
> > > denominator = fraction[anon] + fraction[file] + 1.
> > >
> > > Because fraction[lru] is always less than denominator, if the initial scan
> > > size is 1, the result is always 0.
> > >
> > > This means the last page is not scanned and has
> > > no chances to be reclaimed.
> > >
> > > Fix this by rounding up the result of the division.
> > >
> > > In practice this change significantly improves the speed of dying cgroups
> > > reclaim.
> > >
> > > [[email protected]: prevent double calculation of DIV64_U64_ROUND_UP() arguments]
> > > Link: http://lkml.kernel.org/r/20180829213311.GA13501@castle
> > > Link: http://lkml.kernel.org/r/[email protected]
> > > Signed-off-by: Roman Gushchin <[email protected]>
> > > Reviewed-by: Andrew Morton <[email protected]>
> > > Cc: Johannes Weiner <[email protected]>
> > > Cc: Michal Hocko <[email protected]>
> > > Cc: Tejun Heo <[email protected]>
> > > Cc: Rik van Riel <[email protected]>
> > > Cc: Konstantin Khlebnikov <[email protected]>
> > > Cc: Matthew Wilcox <[email protected]>
> > > Signed-off-by: Andrew Morton <[email protected]>
> > > Signed-off-by: Linus Torvalds <[email protected]>
> > > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> >
> > If you're up for compiling a kernel, you could try reverting just
> > 36f93a2e from 4.14.91 and seeing if your problem goes away.
> >
> > Regards,
> > Vito Caputo
>
> I will do that. Thank you.

I just realized I didn't include lkml in replying to you, so I'm adding
them now for posterity. Please include the list in any further
discussion.

Regards,
Vito Caputo
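
The arithmetic described in the quoted commit message can be reproduced outside the kernel. The sketch below is a plain-Python model, not the kernel code: the function names and the fraction values (60 and 140) are made up for illustration, and the round-up form mirrors the usual (x + d - 1) / d idiom behind DIV64_U64_ROUND_UP. It shows that with a scan size of 1 the truncating division always yields 0, while rounding up yields 1 for any LRU list with a non-zero fraction. Whether that behaviour is related to the swapping reported here is not established in this thread.

```python
def scan_round_down(scan, fraction_lru, denominator):
    """Model of the old behaviour: truncating integer division."""
    return (scan * fraction_lru) // denominator


def scan_round_up(scan, fraction_lru, denominator):
    """Model of the new behaviour: division rounded up."""
    return (scan * fraction_lru + denominator - 1) // denominator


if __name__ == "__main__":
    # Hypothetical fractions, chosen only for illustration.
    fraction = {"anon": 60, "file": 140}
    denominator = fraction["anon"] + fraction["file"] + 1

    for lru, frac in fraction.items():
        old = scan_round_down(1, frac, denominator)
        new = scan_round_up(1, frac, denominator)
        print(f"{lru}: scan=1 -> round down {old}, round up {new}")
```

Because fraction[lru] is always smaller than the denominator, the truncated result for scan = 1 is 0 on every list, which is the "last page is never scanned" case the commit describes.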