Hi,
commit 56e49d - "vmscan: evict use-once pages first" changed the behavior
of memory management quite a bit, which should be fine.
But while tracking down a performance regression I was on the wrong path
for a while, suspecting this patch to be the cause of the regression.
Fortunately that was not the case, but I got some interesting data which
I couldn't explain completely, and I thought it might be worth clarifying
it publicly in case someone else looks at similar data again :-)
It is all about the increased amount of "Buffers" accounted as active
while losing the same portion from "Cached" accounted as inactive in
/proc/meminfo.
I understand that with the patch applied there will be some more
pressure on file pages until the balance of active/inactive file pages
is reached.
But I don't get how this prefers buffers over cache pages (I assume
dropping inactive before active pages has always been the case, so that
can't be the only difference between buffers and cache).
The scenario I'm running is a low-memory system (256M total) that does
sequential I/O with parallel iozone processes: one process per disk,
each process reading a 2 GB file. The issue occurs independently of the
type of disks I use. The file system is ext2.
While bisecting, even 4 parallel reads of 2 GB files in /tmp were enough
to see a different amount of buffers in /proc/meminfo.
Looking at the data I got from /proc/meminfo (only significant changes):
                    before         with 56e49d (large devs)
MemTotal:        250136 kB         250136 kB
MemFree:           6760 kB           6608 kB
Buffers:           2324 kB          34960 kB    +32636
Cached:           84296 kB          45860 kB    -38436
SwapCached:         392 kB           1416 kB
Active:            6292 kB          38388 kB    +32096
Inactive:         89360 kB          51232 kB    -38128
Active(anon):      4004 kB           3496 kB
Inactive(anon):    8824 kB           9164 kB
Active(file):      2288 kB          34892 kB    +32604
Inactive(file):   80536 kB          42068 kB    -38468
Slab:            106624 kB         112364 kB     +5740
SReclaimable:      5856 kB          11860 kB     +6004
[...]
From slabinfo I know that the slab increase is just a secondary effect,
due to more structures needed to organize the buffers (e.g. buffer_head).
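
In case someone wants to reproduce the comparison, a small userspace
helper along the following lines prints the relevant counters so that
before/after runs are easy to diff. This is only a quick sketch of mine,
not part of iozone or the kernel, and the buffer_head line assumes the
usual /proc/slabinfo column order (name, active objs, total objs,
object size):

/*
 * meminfo-snap.c: print the /proc/meminfo fields discussed above plus
 * the buffer_head slab usage, so before/after runs can be diffed.
 * Build: gcc -Wall -o meminfo-snap meminfo-snap.c
 */
#include <stdio.h>
#include <string.h>

static void dump_meminfo(void)
{
	static const char *keys[] = {
		"MemTotal:", "MemFree:", "Buffers:", "Cached:",
		"Active(file):", "Inactive(file):", "Slab:", "SReclaimable:",
	};
	char line[256];
	size_t i;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
			if (!strncmp(line, keys[i], strlen(keys[i])))
				fputs(line, stdout);
	fclose(f);
}

static void dump_buffer_head(void)
{
	char line[512], name[64];
	unsigned long active, total, objsize;
	FILE *f = fopen("/proc/slabinfo", "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "%63s %lu %lu %lu",
			   name, &active, &total, &objsize) == 4 &&
		    !strcmp(name, "buffer_head"))
			printf("buffer_head: %lu/%lu objects, ~%lu kB\n",
			       active, total, total * objsize / 1024);
	fclose(f);
}

int main(void)
{
	dump_meminfo();
	dump_buffer_head();	/* /proc/slabinfo usually needs root */
	return 0;
}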
I would understand if file-associated memory now shrank in favor of
non-file memory after this patch.
But I can't really see in the code where buffers are favored over
cached pages (it very probably makes sense to do so, as they might
contain e.g. the inode data about the files in the cache).
I think an explanation of how that works might be useful for more people
than just me, so comments are welcome.
Kind regards,
Christian
--
Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
On 12/07/2009 09:36 AM, Christian Ehrhardt wrote:
> Hi,
> commit 56e49d - "vmscan: evict use-once pages first" changed the behavior
> of memory management quite a bit, which should be fine.
> But while tracking down a performance regression I was on the wrong path
> for a while, suspecting this patch to be the cause of the regression.
> Fortunately that was not the case, but I got some interesting data which
> I couldn't explain completely, and I thought it might be worth clarifying
> it publicly in case someone else looks at similar data again :-)
>
> It is all about the increased amount of "Buffers" accounted as active
> while losing the same portion from "Cached" accounted as inactive in
> /proc/meminfo.
> I understand that with the patch applied there will be some more
> pressure on file pages until the balance of active/inactive file pages
> is reached.
> But I don't get how this prefers buffers over cache pages (I assume
> dropping inactive before active pages has always been the case, so that
> can't be the only difference between buffers and cache).
Well, "Buffers" is the same kind of memory as "Cached", with
the only difference being that "Cached" is associated with
files, while "Buffers" is associated with a block device.
This means that "Buffers" is more likely to contain filesystem
metadata, while "Cached" is more likely to contain file data.
Not putting pressure on the active file list while there is a large
number of inactive file pages means that pages which were accessed
more than once are better protected from pages that were only
accessed once.
My guess is that "Buffers" is larger because the VM now caches
more (frequently used) filesystem metadata, at the expense of
caching less (used once) file data.
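
In code terms the change boils down to something like this - a simplified
sketch of the mm/vmscan.c logic from 56e49d, paraphrased from memory
rather than copied from the tree:

/*
 * Simplified sketch of the mm/vmscan.c change in 56e49d, paraphrased
 * from memory - not the verbatim kernel source.
 */
static int inactive_file_is_low(struct zone *zone)
{
	/* "low" only once the inactive file list has shrunk below
	 * the active file list */
	return zone_page_state(zone, NR_ACTIVE_FILE) >
	       zone_page_state(zone, NR_INACTIVE_FILE);
}

static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
				 struct zone *zone, struct scan_control *sc,
				 int priority)
{
	int file = is_file_lru(lru);

	if (is_active_lru(lru)) {
		/*
		 * Deactivate pages from the active file list only while
		 * it is larger than the inactive file list.  While plenty
		 * of use-once pages sit on the inactive list, the active
		 * (often metadata) pages are left alone.
		 */
		if (file ? inactive_file_is_low(zone)
			 : inactive_anon_is_low(zone, sc))
			shrink_active_list(nr_to_scan, zone, sc,
					   priority, file);
		return 0;
	}

	return shrink_inactive_list(nr_to_scan, zone, sc, priority, file);
}

As long as the inactive file list is at least as large as the active one,
the active file pages - which in your case are mostly block device pages
holding metadata - are simply not scanned.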
> The scenario I'm running is a low-memory system (256M total) that does
> sequential I/O with parallel iozone processes.
This indeed sounds like the kind of workload that would only
access the file data very infrequently, while accessing the
filesystem metadata all the time.
> But I can't really see in the code where buffers are favored over
> cached pages (it very probably makes sense to do so, as they might
> contain e.g. the inode data about the files in the cache).
You are right that the code does not favor Buffers or Cached
over the other; it treats both kinds of pages the same.
I believe that you are just seeing the effect of code that
better protects the frequently accessed metadata from the
infrequently accessed data.
--
All rights reversed.
> You are right that the code does not favor Buffers or Cached
> over the other; it treats both kinds of pages the same.
>
> I believe that you are just seeing the effect of code that
> better protects the frequently accessed metadata from the
> infrequently accessed data.
Let me explain the same thing in other words: if the active list holds
a lot of unimportant pages, the patch ends up guarding those unimportant
pages. That might hurt a streaming I/O benchmark score a bit, because
such a workload doesn't have any pages that should be protected; in
other words, it only reduces the memory available for cache.
The patch's intention is to improve real workloads (i.e. mixed
streaming/random I/O workloads), not to improve benchmark scores. So I'm
interested in how much your benchmark score decreases.
KOSAKI Motohiro wrote:
> Let me explain the same thing in other words: if the active list holds
> a lot of unimportant pages, the patch ends up guarding those unimportant
> pages. That might hurt a streaming I/O benchmark score a bit, because
> such a workload doesn't have any pages that should be protected; in
> other words, it only reduces the memory available for cache.
>
> The patch's intention is to improve real workloads (i.e. mixed
> streaming/random I/O workloads), not to improve benchmark scores. So I'm
> interested in how much your benchmark score decreases.
>
As mentioned initially, it doesn't have any benchmark score impact at all
(neither positive nor negative). I expect it might be beneficial for
scores in e.g. reread scenarios.
I was just wondering about the preference of buffers vs. cached pages,
which, as I stated and Rik confirmed, is metadata and therefore wise to
keep around compared to less-used data.
Btw, thanks for the explanation, Rik - the file/blockdev association was
exactly what I was missing in my thoughts.
While my question was more intended to ask where in the code this
differentiation is made, I'm perfectly fine with having it just work,
knowing that the file/blockdev association is the key.
--
Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization
On 12/08/2009 10:54 AM, Christian Ehrhardt wrote:
> Btw, thanks for the explanation, Rik - the file/blockdev association was
> exactly what I was missing in my thoughts.
> While my question was more intended to ask where in the code this
> differentiation is made, I'm perfectly fine with having it just work,
> knowing that the file/blockdev association is the key.
Actually, the file/blockdev association is just a coincidence,
due to the way your benchmark works.
The key is "page touched once" vs "page touched multiple times".
In e.g. a database workload, I would expect much more file data to be
on the active list - specifically the file data corresponding to the
database indexes.
--
All rights reversed.