2014-01-30 16:58:48

by Igor Podlesny

[permalink] [raw]
Subject: That greedy Linux VM cache

Hello!

Probably every Linux newcomer has concerns about low free memory,
and hears the explanation from Linux old hands that there's actually
plenty of it -- it's just cached, and when applications need it, it
will be given back on demand. I thought so too, until recently I
noticed that even when free memory is almost exhausted (~ 75 MiB) and
processes are stuck in sleep_on_page_killable, the cache sits at
around ~ 500 MiB and is not going to give back what it has gained.
Naturally, vm.drop_caches = 3 doesn't squeeze it out either. That
drama has been happening on a rather
outdated-but-still-has-2-GiB-of-RAM notebook, with kernels from 3.10
through 3.12.9 (3.13 is the first release in a long time that simply
freezes the notebook so hard that SysRq-B doesn't work, but that's
another story). Everything RAM-demanding just crawls, the load average
keeps climbing, and there is no paging out -- just ongoing disk
activity, mostly _reads_ and a little writing. If vm.swappiness is not
0, it swaps out, but not much: right now I started Chromium (in
addition to a long-running Firefox) and only 32 MiB went to swap, load
avg. ~ 7

Again: 25 % of RAM is reported (by top, free, and finally
/proc/meminfo) as cached, but it's a rather greedy cache.
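For reference, those numbers come straight from /proc/meminfo; a
plain-shell sketch of what top and free are summarizing:

```shell
#!/bin/sh
# Report MemFree and Cached (kB in /proc/meminfo) in MiB -- the same
# counters top(1) and free(1) show as "free" and "cached".
awk '/^MemFree:/ {free=$2}
     /^Cached:/  {cached=$2}
     END {printf "free=%d MiB cached=%d MiB\n", free/1024, cached/1024}' /proc/meminfo
```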

I came across a similar issue report:
http://www.spinics.net/lists/linux-btrfs/msg11723.html but questions
still remain:

* How can I analyze this? slabtop doesn't even show 100 MiB of slab.
* Why is this possible?
* The system is on Btrfs but /home is on XFS, so the disk I/O might be
related to text-segment paging? But anyway, this leads us to the
question: hey, there are 500 MiB free^Wcached.

Meanwhile, I'm thinking about moving the system back to XFS...

P. S. While writing this, ~ 100 MiB got swapped out, and the cache
shrank(!) to 377 MiB; Firefox is mostly in "D" -- sleep_on_page_killable
-- and so is Chrome, load avg. ~ 7. I had to close Skype to be able to
finish this letter, and cached memory is now 439 MiB. :) I know it's
time to upgrade, but hey, cached memory is free memory, right?

--
End of message. Next message?


2014-01-30 17:06:58

by David Lang

[permalink] [raw]
Subject: Re: That greedy Linux VM cache

On Fri, 31 Jan 2014, Igor Podlesny wrote:

> Hello!
>
> Probably every Linux newcomer has concerns about low free memory,
> and hears the explanation from Linux old hands that there's actually
> plenty of it -- it's just cached, and when applications need it, it
> will be given back on demand. I thought so too, until recently I
> noticed that even when free memory is almost exhausted (~ 75 MiB)

That's actually quite a bit of free memory; it's very common for servers
to run far lower than that.

> and processes are stuck in sleep_on_page_killable, the cache sits at
> around ~ 500 MiB and is not going to give back what it has gained.
> Naturally, vm.drop_caches = 3 doesn't squeeze it out either.

This is telling you that this data isn't clean cache that can just be
dropped; it's dirty cache that is waiting to be written, or is otherwise
locked.

Rather than looking at the memory numbers, look at the swap numbers
instead. If you are doing any noticeable amount of swapping (si/so in
vmstat), then you are out of memory and the cache that can be dropped
has been dropped.

This does mean that you have a hard time telling when you are getting
close to running out of memory, but it's easy to see when you have.
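A rough sketch of watching si/so directly from /proc/vmstat (the
cumulative pswpin/pswpout counters that vmstat derives them from),
sampling over one second:

```shell
#!/bin/sh
# pswpin/pswpout in /proc/vmstat count pages swapped in/out since boot.
# Sample twice, one second apart, and print the deltas; any nonzero
# deltas under load mean the system is actively swapping.
a_in=$(awk '/^pswpin/ {print $2}' /proc/vmstat)
a_out=$(awk '/^pswpout/ {print $2}' /proc/vmstat)
sleep 1
b_in=$(awk '/^pswpin/ {print $2}' /proc/vmstat)
b_out=$(awk '/^pswpout/ {print $2}' /proc/vmstat)
echo "si=$((b_in - a_in)) so=$((b_out - a_out)) pages/s"
```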

> That drama has been happening on a rather
> outdated-but-still-has-2-GiB-of-RAM notebook, with kernels from 3.10
> through 3.12.9 (3.13 is the first release in a long time that simply
> freezes the notebook so hard that SysRq-B doesn't work, but that's
> another story). Everything RAM-demanding just crawls, the load average
> keeps climbing, and there is no paging out -- just ongoing disk
> activity, mostly _reads_ and a little writing. If vm.swappiness is not
> 0, it swaps out, but not much: right now I started Chromium (in
> addition to a long-running Firefox) and only 32 MiB went to swap, load
> avg. ~ 7

That much read activity probably means that you are paging data in to
use it, then dropping it to bring in another page, which you then drop
to go back and fetch the first page.

David Lang

> Again: 25 % of RAM is reported (by top, free, and finally
> /proc/meminfo) as cached, but it's a rather greedy cache.
>
> I came across a similar issue report:
> http://www.spinics.net/lists/linux-btrfs/msg11723.html but questions
> still remain:
>
> * How can I analyze this? slabtop doesn't even show 100 MiB of slab.
> * Why is this possible?
> * The system is on Btrfs but /home is on XFS, so the disk I/O might be
> related to text-segment paging? But anyway, this leads us to the
> question: hey, there are 500 MiB free^Wcached.
>
> Meanwhile, I'm thinking about moving the system back to XFS...
>
> P. S. While writing this, ~ 100 MiB got swapped out, and the cache
> shrank(!) to 377 MiB; Firefox is mostly in "D" -- sleep_on_page_killable
> -- and so is Chrome, load avg. ~ 7. I had to close Skype to be able to
> finish this letter, and cached memory is now 439 MiB. :) I know it's
> time to upgrade, but hey, cached memory is free memory, right?

2014-01-31 14:47:47

by Igor Podlesny

[permalink] [raw]
Subject: Re: That greedy Linux VM cache

On 31 January 2014 00:58, Igor Podlesny <[email protected]> wrote:
[...]
> While I'm thinking about moving system back to XFS...

Well, it helped just a bit. The whole picture remains, so it's not a
Btrfs issue, but seemingly a Linux VM one. The problem can be briefly
described as "if allowed to swap (swappiness != 0), the VM would rather
start swapping than reduce the cache, which holds ~ 25 % of RAM". Even
more briefly, it's stated in the Subject.

From the user's point of view it looks like the system is being
heavily swapped (and can easily be misinterpreted as such), but actually
most of the disk activity is constant _reading_ from the filesystem, not
accesses to the swap device.

Should I file a bug report in the kernel's Bugzilla, or just upgrade
the notebook? )


2014-01-31 16:57:59

by Austin S Hemmelgarn

[permalink] [raw]
Subject: Re: That greedy Linux VM cache



On 01/31/2014 09:47 AM, Igor Podlesny wrote:
> On 31 January 2014 00:58, Igor Podlesny <[email protected]> wrote:
> [...]
>> While I'm thinking about moving system back to XFS...
>
> Well, it helped just a bit. The whole picture remains, so it's not a
> Btrfs issue, but seemingly a Linux VM one. The problem can be briefly
> described as "if allowed to swap (swappiness != 0), the VM would rather
> start swapping than reduce the cache, which holds ~ 25 % of RAM". Even
> more briefly, it's stated in the Subject.
>
> From the user's point of view it looks like the system is being
> heavily swapped (and can easily be misinterpreted as such), but actually
> most of the disk activity is constant _reading_ from the filesystem, not
> accesses to the swap device.
>
> Should I file a bug report in the kernel's Bugzilla, or just upgrade
> the notebook? )
>
If I remember correctly, there is a sysctl for configuring how
aggressively the system tries to retain the VFS cache; changing the
value there might improve things for you.
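A sketch, assuming the knob I mean is vm.vfs_cache_pressure (default
100; values above 100 make the kernel reclaim dentry and inode caches
more eagerly relative to the page cache):

```shell
#!/bin/sh
# Read the current dentry/inode reclaim aggressiveness (default: 100).
cat /proc/sys/vm/vfs_cache_pressure
# Raising it (needs root) makes the VM reclaim the dcache/icache more
# eagerly relative to page cache and swap:
#   echo 200 > /proc/sys/vm/vfs_cache_pressure
```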

2014-01-31 17:09:08

by Igor Podlesny

[permalink] [raw]
Subject: Re: That greedy Linux VM cache

[...]
On 1 February 2014 00:57, Austin S. Hemmelgarn <[email protected]> wrote:
> If I remember correctly, there is a sysctl for configuring how
> aggressively the system tries to retain the VFS cache, changing the
> value there might improve things for you.

Yeah, in theory. In practice I never saw any difference, even with
vm.vfs_cache_pressure set to 800000.


Subject: Re: That greedy Linux VM cache

On Fri, 31 Jan 2014, Igor Podlesny wrote:
> Probably every Linux newcomer has concerns about low free memory,
> and hears the explanation from Linux old hands that there's actually
> plenty of it -- it's just cached, and when applications need it, it
> will be given back on demand. I thought so too

Yeah right, we wish it would...

Anyway, maybe this helps?
http://thread.gmane.org/gmane.linux.kernel.mm/112554/focus=81834

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2014-02-03 10:55:12

by Michal Hocko

[permalink] [raw]
Subject: Re: That greedy Linux VM cache

[Adding linux-mm to the CC]

On Fri 31-01-14 00:58:16, Igor Podlesny wrote:
> Hello!
>
> Probably every Linux newcomer has concerns about low free memory,
> and hears the explanation from Linux old hands that there's actually
> plenty of it -- it's just cached, and when applications need it, it
> will be given back on demand. I thought so too, until recently I
> noticed that even when free memory is almost exhausted (~ 75 MiB) and
> processes are stuck in sleep_on_page_killable, the

This means that the page has to be written back in order to be dropped.
How much dirty memory do you have (compared to the total size of the
page cache)?
What does your /proc/sys/vm/dirty_ratio say?
How fast is your storage?

Also, is this 32b or 64b system?
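All of these can be read straight from procfs; a minimal sketch:

```shell
#!/bin/sh
# Dirty and writeback pages versus the total page cache, plus the
# current dirty_ratio setting -- everything in kB from /proc/meminfo.
awk '/^Dirty:/     {d=$2}
     /^Writeback:/ {w=$2}
     /^Cached:/    {c=$2}
     END {printf "dirty=%d kB writeback=%d kB cached=%d kB (dirty/cached=%.1f%%)\n",
                  d, w, c, (c ? 100 * d / c : 0)}' /proc/meminfo
cat /proc/sys/vm/dirty_ratio
```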

> cache sits at around ~ 500 MiB and is not going to give back what it
> has gained. Naturally, vm.drop_caches = 3 doesn't squeeze it out
> either. That drama has been happening on a rather
> outdated-but-still-has-2-GiB-of-RAM notebook, with kernels from 3.10
> through 3.12.9 (3.13 is the first release in a long time that simply
> freezes the notebook so hard that SysRq-B doesn't work, but that's
> another story). Everything RAM-demanding just crawls, the load average
> keeps climbing, and there is no paging out -- just ongoing disk
> activity, mostly _reads_ and a little writing. If vm.swappiness is not
> 0, it swaps out, but not much: right now I started Chromium (in
> addition to a long-running Firefox) and only 32 MiB went to swap, load
> avg. ~ 7
>
> Again: 25 % of RAM is reported (by top, free, and finally
> /proc/meminfo) as cached, but it's a rather greedy cache.
>
> I came across a similar issue report:
> http://www.spinics.net/lists/linux-btrfs/msg11723.html but questions
> still remain:
>
> * How can I analyze this? slabtop doesn't even show 100 MiB of slab.

Snapshotting /proc/meminfo and /proc/vmstat every second or two while
your load is bad might tell us more.
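Something like this would do (a sketch; the snapshot count and
interval are arbitrary defaults):

```shell
#!/bin/sh
# Take $1 snapshots of /proc/meminfo and /proc/vmstat, $2 seconds
# apart (defaults: 5 snapshots, 2 s), each prefixed with a timestamp
# and separated by '---' so the log is easy to split later.
n=${1:-5}
interval=${2:-2}
i=0
while [ "$i" -lt "$n" ]; do
    date +%s
    cat /proc/meminfo /proc/vmstat
    echo '---'
    sleep "$interval"
    i=$((i + 1))
done
```

Run it as e.g. `sh vm-snapshot.sh 150 2 > vm-snapshots.log` while the
load is bad, and attach the log.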

> * Why is this possible?

That is hard to tell without some numbers. But it might be possible
that you are seeing the same issue as reported and fixed here:
http://marc.info/?l=linux-kernel&m=139060103406327&w=2

Especially if you are using tmpfs (e.g. as the backing storage for /tmp).

> * The system is on Btrfs but /home is on XFS, so the disk I/O might be
> related to text-segment paging? But anyway, this leads us to the
> question: hey, there are 500 MiB free^Wcached.
>
> Meanwhile, I'm thinking about moving the system back to XFS...
>
> P. S. While writing this, ~ 100 MiB got swapped out, and the cache
> shrank(!) to 377 MiB; Firefox is mostly in "D" -- sleep_on_page_killable
> -- and so is Chrome, load avg. ~ 7. I had to close Skype to be able to
> finish this letter, and cached memory is now 439 MiB. :) I know it's
> time to upgrade, but hey, cached memory is free memory, right?
>

--
Michal Hocko
SUSE Labs

2014-02-08 19:43:24

by Igor Podlesny

[permalink] [raw]
Subject: Re: That greedy Linux VM cache

On 3 February 2014 18:55, Michal Hocko <[email protected]> wrote:
> [Adding linux-mm to the CC]

[...]

> This means that the page has to be written back in order to be dropped.
> How much dirty memory do you have (compared to the total size of the
> page cache)?

Not that much. Maybe you missed that part, but I said that the disk
is mostly being READ, NOT written.
I also said that the READing goes on from the system partition (it was Btrfs).

> What does your /proc/sys/vm/dirty_ratio say?

10

> How fast is your storage?

Was 5400 HDD, today I installed SSD.

> Also, is this 32b or 64b system?

The kernel is x86_64 (or sometimes 32-bit); userspace is 32-bit -- a
full x86_64 setup is simply not usable on 2 GiB: you can run just one
program at a time, like in the MS-DOS era. :) (I'd give x32 a try, but
alas, it's not really ready yet.)

>> * How can I analyze this? slabtop doesn't even show 100 MiB of slab.
>
> Snapshotting /proc/meminfo and /proc/vmstat every second or two while
> your load is bad might tell us more.
>
>> * Why is this possible?
>
> That is hard to tell without some numbers. But it might be possible
> that you are seeing the same issue as reported and fixed here:
> http://marc.info/?l=linux-kernel&m=139060103406327&w=2

No, there's no such amount of dirty data.

> Especially when you are using tmpfs (e.g. as a backing storage for /tmp)

I use it, yeah, but it occupies ridiculously little space, ~ 1--2 MiB.

*** Okay, so I said I decided to try an SSD. The issue stays
absolutely the same and can be seen even more clearly: when swappiness
is 0, btrfs-endio is heating up the processor, constantly taking almost
all CPU resources (the storage is fast, the CPU is saturated), but when
I set it higher, thus allowing swapping, it helps -- ~ 250 MiB got
swapped out (quickly -- SSD rules) and the system became responsive
again. As before, it didn't try to reduce the cache at all. I never saw
the cache even at 250 MiB, always higher (~ 25 % of RAM). So it's
actually better to use swappiness = 100 in these circumstances.

I think the problem should be easily reproducible -- the kernel
allows you to limit available RAM. ;)
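E.g., via the mem= boot parameter on a GRUB-based setup like this
Ubuntu one (the 512M value is just an example):

```shell
# Limit usable RAM with the mem= kernel parameter to reproduce the
# pressure on a bigger machine. In /etc/default/grub (example value):
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mem=512M"
# then regenerate the config and reboot:
#   sudo update-grub && sudo reboot
```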

P. S. The only theory I have left is the "Intel Corporation
Mobile GM965/GL960 Integrated Graphics Controller" with the i915 kernel
module. I don't know much about it, but it should have bitten off a
good part of system RAM, right? Since this is Ubuntu, compiz runs by
default, and pmap -d `pgrep compiz` shows lots of similar lines:

...
e0344000 20 rw-s- 0000000102e33000 000:00005 card0
e0479000 56 rw-s- 0000000102bf4000 000:00005 card0
e0487000 48 rw-s- 0000000102be8000 000:00005 card0
e0493000 56 rw-s- 0000000102bda000 000:00005 card0
e04a1000 56 rw-s- 0000000102bcc000 000:00005 card0
e04af000 48 rw-s- 0000000102bc0000 000:00005 card0
e04bb000 56 rw-s- 0000000102bb2000 000:00005 card0
e04c9000 48 rw-s- 0000000102d64000 000:00005 card0
e04d5000 192 rw-s- 0000000102ce5000 000:00005 card0
e0505000 80 rw-s- 0000000102de7000 000:00005 card0
e0519000 20 rw-s- 0000000102ccc000 000:00005 card0
e051e000 160 rw-s- 0000000102ca4000 000:00005 card0
e0546000 20 rw-s- 0000000102c9f000 000:00005 card0
e054b000 48 rw-s- 0000000102c93000 000:00005 card0
e0557000 20 rw-s- 0000000102c8e000 000:00005 card0
e055c000 20 rw-s- 0000000102c89000 000:00005 card0
...

I have a suspicion... (I also dislike the sizes of those mappings)
... that a sizeable amount of that "cached memory" could be related to
this i915. How can I check that?...
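One way I can think of to at least bound it: sum the card0 mapping
sizes (the kB column) from the pmap output above. A small stdin filter,
assuming the `pmap -d` format shown:

```shell
#!/bin/sh
# Sum the sizes (2nd field, in kB) of all card0 mappings found in
# `pmap -d` output read on stdin.
awk '/card0/ { kb += $2 } END { printf "card0 mappings: %d kB\n", kb }'
```

Used as `pmap -d "$(pgrep -n compiz)" | sh sum-card0.sh`. (For the
driver's own accounting, kernels of this era also expose GEM object
totals under /sys/kernel/debug/dri/0/ when debugfs is mounted, though
I'm not sure of the exact file.)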


2014-02-10 13:33:50

by Michal Hocko

[permalink] [raw]
Subject: Re: That greedy Linux VM cache

On Sun 09-02-14 03:42:52, Igor Podlesny wrote:
> On 3 February 2014 18:55, Michal Hocko <[email protected]> wrote:
> > [Adding linux-mm to the CC]
>
> [...]
>
> > This means that the page has to be written back in order to be dropped.
> > How much dirty memory you have (comparing to the total size of the page
> > cache)?
>
> Not that much. Maybe you missed that part, but I said that the disk
> is mostly being READ, NOT written.
> I also said that the READing goes on from the system partition (it was Btrfs).
>
> > What does your /proc/sys/vm/dirty_ratio say?
>
> 10

With 2G of RAM this shouldn't be a lot. And it definitely shouldn't
cause a problem with an SSD.

> > How fast is your storage?
>
> Was 5400 HDD, today I installed SSD.
>
> > Also, is this 32b or 64b system?
>
> The kernel is x86_64 (or sometimes 32-bit); userspace is 32-bit -- a
> full x86_64 setup is simply not usable on 2 GiB:

Which is unexpected on its own. We have many systems with comparable
and even much less memory running just fine. You haven't posted any
numbers yet, so it is still not clear where the bottleneck is on your
system.

> you can run just one program at a time, like in the MS-DOS era. :)
> (I'd give x32 a try, but alas, it's not really ready yet.)
>
> >> * How can I analyze this? slabtop doesn't even show 100 MiB of slab.
> >
> > Snapshotting /proc/meminfo and /proc/vmstat every second or two while
> > your load is bad might tell us more.
> >
> >> * Why is this possible?
> >
> > That is hard to tell without some numbers. But it might be possible
> > that you are seeing the same issue as reported and fixed here:
> > http://marc.info/?l=linux-kernel&m=139060103406327&w=2
>
> No, there's no such amount of dirty data.

OK, then I would check whether this is fs-related. You said that you've
tried XFS or something else, with similar results?

> > Especially when you are using tmpfs (e.g. as a backing storage for /tmp)
>
> I use it, yeah, but it occupies ridiculously little space, ~ 1--2 MiB.
>
> *** Okay, so I said I decided to try an SSD. The issue stays
> absolutely the same and can be seen even more clearly: when swappiness
> is 0, btrfs-endio is heating up the processor, constantly taking almost
> all CPU resources (the storage is fast, the CPU is saturated), but when
> I set it higher, thus allowing swapping, it helps -- ~ 250 MiB got
> swapped out (quickly -- SSD rules) and the system became responsive
> again. As before, it didn't try to reduce the cache at all. I never saw
> the cache even at 250 MiB, always higher (~ 25 % of RAM). So it's
> actually better to use swappiness = 100 in these circumstances.

Hmm, so swapping is fast enough while the page cache backed by the
storage is slow. I guess both the swap partition and the fs are backed
by the same storage, right?
Do you have sufficient free space on the filesystem?

> I think the problem should be easily reproducible -- the kernel
> allows you to limit available RAM. ;)
>
> P. S. The only theory I have left is the "Intel Corporation
> Mobile GM965/GL960 Integrated Graphics Controller" with the i915 kernel
> module. I don't know much about it, but it should have bitten off a
> good part of system RAM, right?

How much memory? I vaguely remember that i915 had very aggressive
reclaiming logic which led to some stalls during reclaim. I can't seem
to find any reference right now.

Btw. Are you using vanilla kernel?

> Since this is Ubuntu, compiz runs by
> default, and pmap -d `pgrep compiz` shows lots of similar lines:

It would be good to reduce the problem space by disabling compiz.

> ...
> e0344000 20 rw-s- 0000000102e33000 000:00005 card0
> e0479000 56 rw-s- 0000000102bf4000 000:00005 card0
> e0487000 48 rw-s- 0000000102be8000 000:00005 card0
> e0493000 56 rw-s- 0000000102bda000 000:00005 card0
> e04a1000 56 rw-s- 0000000102bcc000 000:00005 card0
> e04af000 48 rw-s- 0000000102bc0000 000:00005 card0
> e04bb000 56 rw-s- 0000000102bb2000 000:00005 card0
> e04c9000 48 rw-s- 0000000102d64000 000:00005 card0
> e04d5000 192 rw-s- 0000000102ce5000 000:00005 card0
> e0505000 80 rw-s- 0000000102de7000 000:00005 card0
> e0519000 20 rw-s- 0000000102ccc000 000:00005 card0
> e051e000 160 rw-s- 0000000102ca4000 000:00005 card0
> e0546000 20 rw-s- 0000000102c9f000 000:00005 card0
> e054b000 48 rw-s- 0000000102c93000 000:00005 card0
> e0557000 20 rw-s- 0000000102c8e000 000:00005 card0
> e055c000 20 rw-s- 0000000102c89000 000:00005 card0
> ...
>
> I have a suspicion... (I also dislike the sizes of those mappings)

The mappings do not seem to be too big (the biggest one is 160 kB)...

> ... that a valuable amount of that "cached memory" can be related to
> this i915. How can I check it?...

I am not sure I understand what you are asking about.
--
Michal Hocko
SUSE Labs