2009-06-22 21:25:46

by Alok Kataria

[permalink] [raw]
Subject: [PATCH] Hugepages should be accounted as unevictable pages.

Looking at the output of /proc/meminfo, a user might be misled into thinking
that there are zero unevictable pages, though in reality there can be
hugepages, which are inherently unevictable.

Though hugepages are not handled by the unevictable LRU framework, they are
in fact unevictable in nature, and the global statistics counter should reflect that.

For instance, I have allocated 20 huge pages on my system, and meminfo shows this:

Unevictable: 0 kB
Mlocked: 0 kB
HugePages_Total: 20
HugePages_Free: 20
HugePages_Rsvd: 0
HugePages_Surp: 0

After the patch:

Unevictable: 81920 kB
Mlocked: 0 kB
HugePages_Total: 20
HugePages_Free: 20
HugePages_Rsvd: 0
HugePages_Surp: 0
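
As a consistency check on these numbers (an editorial sketch, not part of the patch): the new Unevictable value should equal HugePages_Total times the huge page size. 81920 kB / 20 = 4096 kB per huge page, which fits 4 MB huge pages such as 32-bit x86 uses without PAE; the Hugepagesize line is not shown above, so that page size is an assumption. With 2 MB huge pages the same 20 pages would read 40960 kB. The helper name below is hypothetical.

```c
#include <assert.h>

/* Editorial sketch: kB that the patched Unevictable counter should report
 * for nr_huge_pages huge pages of hugepagesize_kb kB each.
 * hugepage_unevictable_kb() is a hypothetical helper, not a kernel function. */
static unsigned long hugepage_unevictable_kb(unsigned long nr_huge_pages,
                                             unsigned long hugepagesize_kb)
{
    assert(hugepagesize_kb > 0);
    return nr_huge_pages * hugepagesize_kb;
}
```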

Signed-off-by: Alok N Kataria <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mel Gorman <[email protected]>

Index: linux-2.6/Documentation/vm/unevictable-lru.txt
===================================================================
--- linux-2.6.orig/Documentation/vm/unevictable-lru.txt 2009-06-22 11:49:27.000000000 -0700
+++ linux-2.6/Documentation/vm/unevictable-lru.txt 2009-06-22 13:57:32.000000000 -0700
@@ -71,6 +71,12 @@ The unevictable list addresses the follo

(*) Those mapped into VM_LOCKED [mlock()ed] VMAs.

+ (*) Hugetlb pages are also unevictable. Hugepages are already implemented in
+ such a way that they don't reside on the LRU and hence are not iterated
+ over during vmscan, so there is no need to move these pages across
+ different LRUs. We just account these pages as unevictable so that the
+ statistics are correct.
+
The infrastructure may also be able to handle other conditions that make pages
unevictable, either by definition or by circumstance, in the future.

Index: linux-2.6/mm/hugetlb.c
===================================================================
--- linux-2.6.orig/mm/hugetlb.c 2009-06-22 11:49:57.000000000 -0700
+++ linux-2.6/mm/hugetlb.c 2009-06-22 14:04:05.000000000 -0700
@@ -533,6 +533,8 @@ static void update_and_free_page(struct
1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
1 << PG_private | 1<< PG_writeback);
}
+ mod_zone_page_state(page_zone(page), NR_LRU_BASE + LRU_UNEVICTABLE,
+ -(pages_per_huge_page(h)));
set_compound_page_dtor(page, NULL);
set_page_refcounted(page);
arch_release_hugepage(page);
@@ -584,6 +586,8 @@ static void prep_new_huge_page(struct hs
spin_lock(&hugetlb_lock);
h->nr_huge_pages++;
h->nr_huge_pages_node[nid]++;
+ mod_zone_page_state(page_zone(page), NR_LRU_BASE + LRU_UNEVICTABLE,
+ pages_per_huge_page(h));
spin_unlock(&hugetlb_lock);
put_page(page); /* free it into the hugepage allocator */
}
@@ -749,6 +753,9 @@ static struct page *alloc_buddy_huge_pag
*/
h->nr_huge_pages_node[nid]++;
h->surplus_huge_pages_node[nid]++;
+ mod_zone_page_state(page_zone(page),
+ NR_LRU_BASE + LRU_UNEVICTABLE,
+ pages_per_huge_page(h));
__count_vm_event(HTLB_BUDDY_PGALLOC);
} else {
h->nr_huge_pages--;
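
The hunks above are meant to keep one invariant: the zone's unevictable counter rises by pages_per_huge_page(h) whenever a huge page is created and falls by the same amount when update_and_free_page() hands it back to the buddy allocator. A user-space model of that bookkeeping follows; it assumes a single zone and a single huge page size, and every struct and function name in it is a hypothetical stand-in rather than a kernel API.

```c
#include <assert.h>

/* Hypothetical model of one zone's counter (NR_LRU_BASE + LRU_UNEVICTABLE). */
struct zone_model {
    long nr_unevictable;
};

/* Hypothetical model of struct hstate: order 9 means 512 base pages (2 MB on 4 KB pages). */
struct hstate_model {
    unsigned int order;
    unsigned long nr_huge_pages;
};

static long pages_per_huge_page_model(const struct hstate_model *h)
{
    return 1L << h->order;
}

/* Mirrors the patched prep_new_huge_page() / alloc_buddy_huge_page(). */
static void model_add_huge_page(struct zone_model *z, struct hstate_model *h)
{
    h->nr_huge_pages++;
    z->nr_unevictable += pages_per_huge_page_model(h);
}

/* Mirrors the patched update_and_free_page(): undo exactly what was added. */
static void model_free_huge_page(struct zone_model *z, struct hstate_model *h)
{
    assert(h->nr_huge_pages > 0);
    h->nr_huge_pages--;
    z->nr_unevictable -= pages_per_huge_page_model(h);
}
```

Charging and uncharging in the same functions that already adjust nr_huge_pages makes it straightforward to check that the two counters move in lockstep.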


2009-06-23 03:25:55

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

> Looking at the output of /proc/meminfo, a user might be misled into thinking
> that there are zero unevictable pages, though in reality there can be
> hugepages, which are inherently unevictable.
>
> Though hugepages are not handled by the unevictable LRU framework, they are
> in fact unevictable in nature, and the global statistics counter should reflect that.
>
> For instance, I have allocated 20 huge pages on my system, and meminfo shows this:
>
> Unevictable: 0 kB
> Mlocked: 0 kB
> HugePages_Total: 20
> HugePages_Free: 20
> HugePages_Rsvd: 0
> HugePages_Surp: 0
>
> After the patch:
>
> Unevictable: 81920 kB
> Mlocked: 0 kB
> HugePages_Total: 20
> HugePages_Free: 20
> HugePages_Rsvd: 0
> HugePages_Surp: 0

First, we should clarify the spec of Unevictable.
Currently, the Unevictable field means the number of pages on the unevictable LRU,
and hugepages are never inserted into any LRU.

I think this patch will change this rule.


2009-06-23 04:46:56

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.


On Mon, 2009-06-22 at 20:25 -0700, KOSAKI Motohiro wrote:
> > Looking at the output of /proc/meminfo, a user might be misled into thinking
> > that there are zero unevictable pages, though in reality there can be
> > hugepages, which are inherently unevictable.
> >
> > Though hugepages are not handled by the unevictable LRU framework, they are
> > in fact unevictable in nature, and the global statistics counter should reflect that.
> >
> > For instance, I have allocated 20 huge pages on my system, and meminfo shows this:
> >
> > Unevictable: 0 kB
> > Mlocked: 0 kB
> > HugePages_Total: 20
> > HugePages_Free: 20
> > HugePages_Rsvd: 0
> > HugePages_Surp: 0
> >
> > After the patch:
> >
> > Unevictable: 81920 kB
> > Mlocked: 0 kB
> > HugePages_Total: 20
> > HugePages_Free: 20
> > HugePages_Rsvd: 0
> > HugePages_Surp: 0
>
> First, we should clarify the spec of Unevictable.
> Currently, the Unevictable field means the number of pages on the unevictable LRU,
> and hugepages are never inserted into any LRU.
>
> I think this patch will change this rule.

I agree, and that's why I added a comment to the documentation file to
that effect. If you think it's not explicit or doesn't explain what it's
supposed to, we can add something more there.

IMO, the proc output should give the total number of unevictable pages
in the system, and since hugepages are in fact also unevictable, I
don't see a reason why they shouldn't be accounted accordingly.
What do you think?

Thanks,
Alok
>
>
>

2009-06-23 05:05:57

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

> > > Unevictable: 0 kB
> > > Mlocked: 0 kB
> > > HugePages_Total: 20
> > > HugePages_Free: 20
> > > HugePages_Rsvd: 0
> > > HugePages_Surp: 0
> > >
> > > After the patch:
> > >
> > > Unevictable: 81920 kB
> > > Mlocked: 0 kB
> > > HugePages_Total: 20
> > > HugePages_Free: 20
> > > HugePages_Rsvd: 0
> > > HugePages_Surp: 0
> >
> > First, we should clarify the spec of Unevictable.
> > Currently, the Unevictable field means the number of pages on the unevictable LRU,
> > and hugepages are never inserted into any LRU.
> >
> > I think this patch will change this rule.
>
> I agree, and that's why I added a comment to the documentation file to
> that effect. If you think it's not explicit or doesn't explain what it's
> supposed to, we can add something more there.
>
> IMO, the proc output should give the total number of unevictable pages
> in the system, and since hugepages are in fact also unevictable, I
> don't see a reason why they shouldn't be accounted accordingly.
> What do you think?

ummm...

I'm not sure whether this definition of unevictable is a good idea. Currently,
hugepages are not the only unaccounted memory; various kinds of kernel memory
are not accounted either.

One drawback is that zone_page_state(UNEVICTABLE) would no longer mean the number of unevictable-LRU pages.
For example, what would be wrong with the following patch?

fs/proc/meminfo.c meminfo_proc_show()
----------------------------
- K(pages[LRU_UNEVICTABLE]),
+ K(pages[LRU_UNEVICTABLE]) + hstate->nr_huge_pages,
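
As written, this hunk adds a count of huge pages to a value that K() has already converted from base pages to kB, so the two terms are in different units. A display-time variant would scale nr_huge_pages by pages_per_huge_page() first and convert the sum once. A user-space sketch, assuming 4 KB base pages and a K() macro in the style of the one in fs/proc/meminfo.c; the helper name is hypothetical and only one huge page size is considered:

```c
#include <assert.h>

#define PAGE_SHIFT_MODEL 12                      /* assumption: 4 KB base pages */
#define K(x) ((x) << (PAGE_SHIFT_MODEL - 10))   /* base pages -> kB */

/* Hypothetical helper: Unevictable as shown if huge pages were added at display time.
 * lru_unevictable: base pages on the unevictable LRU
 * nr_huge_pages / pages_per_huge_page: as tracked per hstate */
static unsigned long unevictable_kb_with_hugepages(unsigned long lru_unevictable,
                                                   unsigned long nr_huge_pages,
                                                   unsigned long pages_per_huge_page)
{
    assert(pages_per_huge_page > 0);
    /* Convert huge pages to base pages before K(), so both terms share units. */
    return K(lru_unevictable + nr_huge_pages * pages_per_huge_page);
}
```

Doing the arithmetic at display time leaves the zone counter meaning exactly "pages on the unevictable LRU", which is the property being defended here.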


Plus, I didn't find any practical benefit in this patch. Do you have one,
or do you only want a more natural definition?

I don't have any strong reason to oppose it, but I also don't have any strong
reason to agree.


Lee, what do you think?



2009-06-23 05:13:31

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

On Tue, 23 Jun 2009 14:05:47 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:
> I'm not sure whether this definition of unevictable is a good idea. Currently,
> hugepages are not the only unaccounted memory; various kinds of kernel memory
> are not accounted either.
>
> One drawback is that zone_page_state(UNEVICTABLE) would no longer mean the number of unevictable-LRU pages.
> For example, what would be wrong with the following patch?
>
> fs/proc/meminfo.c meminfo_proc_show()
> ----------------------------
> - K(pages[LRU_UNEVICTABLE]),
> + K(pages[LRU_UNEVICTABLE]) + hstate->nr_huge_pages,
>
>
> Plus, I didn't find any practical benefit in this patch. Do you have one,
> or do you only want a more natural definition?
>
> I don't have any strong reason to oppose it, but I also don't have any strong
> reason to agree.
>
I think "don't include hugepages" is sane. Hugepages are something _special_, for now.

Thanks,
-Kame

2009-06-23 05:54:18

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.


On Mon, 2009-06-22 at 22:11 -0700, KAMEZAWA Hiroyuki wrote:
> On Tue, 23 Jun 2009 14:05:47 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
> > I'm not sure whether this definition of unevictable is a good idea. Currently,
> > hugepages are not the only unaccounted memory; various kinds of kernel memory
> > are not accounted either.
> >
> > One drawback is that zone_page_state(UNEVICTABLE) would no longer mean the number of unevictable-LRU pages.
Kosaki-san,
I don't see why it is important to have the count of pages on the
unevictable LRU.
Instead, zone_page_state(UNEVICTABLE) now correctly tells how many of
the pages in this zone are actually unevictable.

> > For example, what would be wrong with the following patch?
> >
> > fs/proc/meminfo.c meminfo_proc_show()
> > ----------------------------
> > - K(pages[LRU_UNEVICTABLE]),
> > + K(pages[LRU_UNEVICTABLE]) + hstate->nr_huge_pages,
> >
> >
> > Plus, I didn't find any practical benefit in this patch. Do you have one,
> > or do you only want a more natural definition?

Both. While working on a module, I noticed that there is no direct
way to get any information regarding the total number of unreclaimable
(unevictable) pages in the system. While reading through the kernel
sources, I came across this unevictable LRU framework and thought that
it should actually provide the total number of unevictable pages in
the system, irrespective of where they reside.

So there is both a need and, IMO, this should be the natural
definition of unevictable pages.

> >
> > I don't have any strong reason to oppose it, but I also don't have any strong
> > reason to agree.
> >
> I think "don't include hugepages" is sane. Hugepages are something _special_, for now.
>
Kamezawa-san,

I agree that hugepages are special in the sense that they are
implemented specially and don't actually reside on the LRU like other
locked memory. But both of these memory types (mlocked and
hugepages) are actually unevictable and can't be reclaimed, so I
don't see why the accounting should not reflect that.

Thanks,
Alok

> Thanks,
> -Kame
>

2009-06-23 06:08:14

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

On Mon, 22 Jun 2009 22:54:01 -0700
Alok Kataria <[email protected]> wrote:

> > >
> > > I don't have any strong reason to oppose it, but I also don't have any strong
> > > reason to agree.
> > >
> > I think "don't include hugepages" is sane. Hugepages are something _special_, for now.
> >
> Kamezawa-san,
>
> I agree that hugepages are special in the sense that they are
> implemented specially and don't actually reside on the LRU like other
> locked memory. But both of these memory types (mlocked and
> hugepages) are actually unevictable and can't be reclaimed, so I
> don't see why the accounting should not reflect that.
>

I'd argue we should rename "Unevictable" to "Mlocked" or "Pinned" rather than
take nr_hugepages into account. As I understand it, "Unevictable" in meminfo means
- pages which are evictable by nature (because they are on the LRU) but which a user has pinned -

How about renaming "Unevictable" to "Pinned" or "Locked"?
(Mlocked + locked shmem + ramfs?)

We have other "unevictable" pages besides hugepages anyway:
- page tables
- some slab
- kernel pages
- anon pages on a swapless system
etc...

BTW, I use the following calculation for a quick check when I want all "Unevictable" pages:

Unevictable = Total - (Active+Inactive) + (50-70%? of slab)

This number is the non-reclaimable memory.

Thanks,
-Kame


> Thanks,
> Alok
>
> > Thanks,
> > -Kame
> >
>
>

2009-06-23 12:26:17

by Lee Schermerhorn

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

On Tue, 2009-06-23 at 14:05 +0900, KOSAKI Motohiro wrote:
> > > > Unevictable: 0 kB
> > > > Mlocked: 0 kB
> > > > HugePages_Total: 20
> > > > HugePages_Free: 20
> > > > HugePages_Rsvd: 0
> > > > HugePages_Surp: 0
> > > >
> > > > After the patch:
> > > >
> > > > Unevictable: 81920 kB
> > > > Mlocked: 0 kB
> > > > HugePages_Total: 20
> > > > HugePages_Free: 20
> > > > HugePages_Rsvd: 0
> > > > HugePages_Surp: 0
> > >
> > > First, we should clarify the spec of Unevictable.
> > > Currently, the Unevictable field means the number of pages on the unevictable LRU,
> > > and hugepages are never inserted into any LRU.
> > >
> > > I think this patch will change this rule.
> >
> > I agree, and that's why I added a comment to the documentation file to
> > that effect. If you think it's not explicit or doesn't explain what it's
> > supposed to, we can add something more there.
> >
> > IMO, the proc output should give the total number of unevictable pages
> > in the system, and since hugepages are in fact also unevictable, I
> > don't see a reason why they shouldn't be accounted accordingly.
> > What do you think?
>
> ummm...
>
> I'm not sure whether this definition of unevictable is a good idea. Currently,
> hugepages are not the only unaccounted memory; various kinds of kernel memory
> are not accounted either.
>
> One drawback is that zone_page_state(UNEVICTABLE) would no longer mean the number of unevictable-LRU pages.
> For example, what would be wrong with the following patch?
>
> fs/proc/meminfo.c meminfo_proc_show()
> ----------------------------
> - K(pages[LRU_UNEVICTABLE]),
> + K(pages[LRU_UNEVICTABLE]) + hstate->nr_huge_pages,
>
>
> Plus, I didn't find any practical benefit in this patch. Do you have one,
> or do you only want a more natural definition?
>
> I don't have any strong reason to oppose it, but I also don't have any strong
> reason to agree.
>
>
> Lee, what do you think?
>

Alok asked me about this off-list. Like you, I have no strong feelings
either way. Before this patch, yes, the Unevictable meminfo item does
correspond to the number of pages on the unevictable LRU. However, I
don't know that this is all that useful to an administrator. And, I
don't think we depend on this count anywhere in the code. So, perhaps
having the system "do the math" to add the unevictable huge pages to
this item is more useful. Then again, as you point out, there is a lot
of kernel memory that is also unevictable that would not be accounted
here.

Lee

2009-06-23 19:29:00

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.


On Mon, 2009-06-22 at 23:06 -0700, KAMEZAWA Hiroyuki wrote:
> On Mon, 22 Jun 2009 22:54:01 -0700
> Alok Kataria <[email protected]> wrote:
>
> > > >
> > > > I don't have any strong reason to oppose it, but I also don't have any strong
> > > > reason to agree.
> > > >
> > > I think "don't include hugepages" is sane. Hugepages are something _special_, for now.
> > >
> > Kamezawa-san,
> >
> > I agree that hugepages are special in the sense that they are
> > implemented specially and don't actually reside on the LRU like other
> > locked memory. But both of these memory types (mlocked and
> > hugepages) are actually unevictable and can't be reclaimed, so I
> > don't see why the accounting should not reflect that.
> >
>
> I'd argue we should rename "Unevictable" to "Mlocked" or "Pinned" rather than
> take nr_hugepages into account. As I understand it, "Unevictable" in meminfo means
> - pages which are evictable by nature (because they are on the LRU) but which a user has pinned -
>
> How about renaming "Unevictable" to "Pinned" or "Locked"?
> (Mlocked + locked shmem + ramfs?)
>

As Lee also pointed out, I don't see why this number of pages on the
unevictable LRU is important for the user.
IMO, it doesn't give any useful information, other than misleading us into
believing that only these pages are unevictable.

Is there something else that I am missing here?

> We have other "unevictable" pages besides hugepages anyway:
> - page tables
> - some slab
> - kernel pages
> - anon pages on a swapless system
> etc...

I agree there are these other pages which are unevictable, but they are
pages used by the kernel itself, and they will always be locked/utilized
by the kernel.
The unevictable pages (hugepages, mlocked and others), on the other
hand, are pages which the user explicitly asked to be locked/pinned.

So I think these other unevictable pages that you mentioned are
different in a way.

>
> BTW, I use the following calculation for a quick check when I want all "Unevictable" pages:
>
> Unevictable = Total - (Active+Inactive) + (50-70%? of slab)
>
> This number is the non-reclaimable memory.

I don't see how this would get the correct value either; mlocked pages and
hugepages are not accounted in either the Active or the Inactive counts.


Thanks,
Alok

>
> Thanks,
> -Kame
>
>
> > Thanks,
> > Alok
> >
> > > Thanks,
> > > -Kame
> > >
> >
> >
>

2009-06-23 20:30:34

by Lee Schermerhorn

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

On Tue, 2009-06-23 at 12:28 -0700, Alok Kataria wrote:
> On Mon, 2009-06-22 at 23:06 -0700, KAMEZAWA Hiroyuki wrote:
> > On Mon, 22 Jun 2009 22:54:01 -0700
> > Alok Kataria <[email protected]> wrote:
> >
> > > > >
> > > > > I don't have any strong reason to oppose it, but I also don't have any strong
> > > > > reason to agree.
> > > > >
> > > > I think "don't include hugepages" is sane. Hugepages are something _special_, for now.
> > > >
> > > Kamezawa-san,
> > >
> > > I agree that hugepages are special in the sense that they are
> > > implemented specially and don't actually reside on the LRU like other
> > > locked memory. But both of these memory types (mlocked and
> > > hugepages) are actually unevictable and can't be reclaimed, so I
> > > don't see why the accounting should not reflect that.
> > >
> >
> > I'd argue we should rename "Unevictable" to "Mlocked" or "Pinned" rather than
> > take nr_hugepages into account. As I understand it, "Unevictable" in meminfo means
> > - pages which are evictable by nature (because they are on the LRU) but which a user has pinned -
> >
> > How about renaming "Unevictable" to "Pinned" or "Locked"?
> > (Mlocked + locked shmem + ramfs?)
> >
>
> As Lee also pointed out, I don't see why this number of pages on the
> unevictable LRU is important for the user.
> IMO, it doesn't give any useful information, other than misleading us into
> believing that only these pages are unevictable.

Ah. I meant to respond to Kame-san's mail this am, but got distracted.

Please note that "Unevictable" includes more than mlocked pages. It
also includes SHM_LOCKED pages and ramfs pages. So if one were to
rename it, I'd prefer "Pinned" to "Mlocked".

Also, it occurs to me that because of lazy culling of unevictable pages
of any type, the "Unevictable" stat does not necessarily correspond to
the actual number of unevictable pages of any order. Some unevictable
pages will not be noticed until vmscan actually tries to reclaim them.
So, even discounting kernel/sl*b pages, and even with Alok's patch,
"Unevictable" may not show all unevictable memory. Under memory
pressure, it should be close, tho', modulo kernel/sl*b pages.

>
> Is there something else that I am missing here?

Probably just a matter of preference. As the code currently stands, a
user/admin would need to add <hugepage-size-in-KB> * nr_hugepages to the
"Unevictable" to get non-kernel unevictable memory. With your patch, a
developer wanting to know the amount of memory on the unevictable LRU
[for whatever reason], would need to do the math. Putting on my
"Fraternal Order of the Friends of Users" hat, I can see the benefit of
your approach. But, still no strong feelings either way as long as the
meanings of the fields are documented somewhere and sufficient
information exists for the needs of users and developers.
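
The "do the math" step can be carried out from userspace by reading /proc/meminfo. A minimal sketch, assuming a Hugepagesize line is present (the excerpts in this thread omit it) and that only the default huge page size is in use, since HugePages_Total counts just that size; both helper names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the number after "Key:" in meminfo-formatted text, or -1 if absent. */
static long meminfo_field(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Require an exact key match followed by ':' at the start of a line. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long value;
            if (sscanf(line + klen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Unevictable (kB) + HugePages_Total * Hugepagesize (kB); -1 if a field is missing. */
static long user_unevictable_kb(const char *meminfo)
{
    long unevictable = meminfo_field(meminfo, "Unevictable");
    long total = meminfo_field(meminfo, "HugePages_Total");
    long size_kb = meminfo_field(meminfo, "Hugepagesize");

    if (unevictable < 0 || total < 0 || size_kb < 0)
        return -1;
    return unevictable + total * size_kb;
}
```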

>
> > We have other "unevictable" pages besides hugepages anyway:
> > - page tables
> > - some slab
> > - kernel pages
> > - anon pages on a swapless system
> > etc...
>
> I agree there are these other pages which are unevictable, but they are
> pages used by the kernel itself, and they will always be locked/utilized
> by the kernel.
> The unevictable pages (hugepages, mlocked and others), on the other
> hand, are pages which the user explicitly asked to be locked/pinned.
>
> So I think these other unevictable pages that you mentioned are
> different in a way.
>
> >
> > BTW, I use the following calculation for a quick check when I want all "Unevictable" pages:
> >
> > Unevictable = Total - (Active+Inactive) + (50-70%? of slab)
> >
> > This number is the non-reclaimable memory.
>
> I don't see how this would get the correct value either; mlocked pages and
> hugepages are not accounted in either the Active or the Inactive counts.
>
>
> Thanks,
> Alok
>
> >
> > Thanks,
> > -Kame
> >
> >
> > > Thanks,
> > > Alok
> > >
> > > > Thanks,
> > > > -Kame
> > > >
> > >
> > >
> >
>

2009-06-23 21:25:28

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

Alok Kataria wrote:

> Both. While working on a module, I noticed that there is no direct
> way to get any information regarding the total number of unreclaimable
> (unevictable) pages in the system. While reading through the kernel
> sources, I came across this unevictable LRU framework and thought that
> it should actually provide the total number of unevictable pages in
> the system, irrespective of where they reside.

The unevictable count tells you how many _userspace_
pages are not evictable.

There are countless accounted and unaccounted kernel
allocations that show up (or not) in other fields in
/proc/meminfo.

I can see something reasonable on both sides of this
particular debate. However, even with this patch the
"unevictable" statistic does not reflect the total
number of unevictable pages in a
zone, so I am not sure how it helps you achieve your
goal.

--
All rights reversed.

2009-06-23 21:42:27

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.


On Tue, 2009-06-23 at 14:24 -0700, Rik van Riel wrote:
> Alok Kataria wrote:
>
> > Both. While working on a module, I noticed that there is no direct
> > way to get any information regarding the total number of unreclaimable
> > (unevictable) pages in the system. While reading through the kernel
> > sources, I came across this unevictable LRU framework and thought that
> > it should actually provide the total number of unevictable pages in
> > the system, irrespective of where they reside.
>
> The unevictable count tells you how many _userspace_
> pages are not evictable.
>
> There are countless accounted and unaccounted kernel
> allocations that show up (or not) in other fields in
> /proc/meminfo.
>

> I can see something reasonable on both sides of this
> particular debate. However, even with this patch the
> "unevictable" statistic does not reflect the total
> number of unevictable pages in a
> zone, so I am not sure how it helps you achieve your
> goal.
>

Yes, but most of the other memory (page tables and others) which is
unevictable is actually static in nature. IOW, the amount of this other
kind of unevictable kernel memory can actually be estimated from the
amount of physical memory on the system.

One thing that I forgot to mention earlier is that I just need a way to
provide a hint about the total locked memory on the system, and it
doesn't need to be the exact number at that point in time.

Lee, for this reason lazy culling of unevictable pages is fine too.

Hugepages, like mlocked pages, are special because the user can
specify how much memory to reserve for this purpose. So that
needs to be taken into consideration, i.e. it cannot be calculated in some
other way.

Thanks,
Alok

2009-06-23 21:56:26

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

Alok Kataria wrote:
> On Tue, 2009-06-23 at 14:24 -0700, Rik van Riel wrote:

>> I can see something reasonable on both sides of this
>> particular debate. However, even with this patch the
>> "unevictable" statistic does not reflect the total
>> number of unevictable pages in a
>> zone, so I am not sure how it helps you achieve your
>> goal.
>
> Yes, but most of the other memory (page tables and others) which is
> unevictable is actually static in nature. IOW, the amount of this other
> kind of unevictable kernel memory can actually be estimated from the
> amount of physical memory on the system.

That would be a fair argument, if it were true.

Things like page tables and dentry/inode caches vary
according to the use case and are allocated as needed.
They are in no way "static in nature".

--
All rights reversed.

2009-06-23 22:06:58

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.


On Tue, 2009-06-23 at 14:55 -0700, Rik van Riel wrote:
> Alok Kataria wrote:
> > On Tue, 2009-06-23 at 14:24 -0700, Rik van Riel wrote:
>
> >> I can see something reasonable on both sides of this
> >> particular debate. However, even with this patch the
> >> "unevictable" statistic does not reflect the total
> >> number of unevictable pages in a
> >> zone, so I am not sure how it helps you achieve your
> >> goal.
> >
> > Yes, but most of the other memory (page tables and others) which is
> > unevictable is actually static in nature. IOW, the amount of this other
> > kind of unevictable kernel memory can actually be estimated from the
> > amount of physical memory on the system.
>
> That would be a fair argument, if it were true.
>
> Things like page tables and dentry/inode caches vary
> according to the use case and are allocated as needed.
> They are in no way "static in nature".
>

Maybe static was the wrong word to use here.
What I meant was that you could always calculate the *maximum* amount of
memory that is going to be used by page tables and can also determine the
% of memory that will be used by slab caches.
That way you should be able to tell statically that no more than 'X'
amount of memory is going to be locked here.
I'd again like to stress that "X" is not the exact amount that is
locked here but the amount that can be.

OTOH, for hugepages and mlocked pages you need to read the exact counts,
as these can change according to user selection.

Thanks,
Alok

2009-06-23 22:16:00

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

On Tue, 2009-06-23 at 14:42 -0700, Alok Kataria wrote:
> One thing that I forgot to mention earlier is that I just need a way to
> provide a hint about the total locked memory on the system, and it
> doesn't need to be the exact number at that point in time.
>
> Lee, for this reason lazy culling of unevictable pages is fine too.
>
> Hugepages, like mlocked pages, are special because the user can
> specify how much memory to reserve for this purpose. So that
> needs to be taken into consideration, i.e. it cannot be calculated in some
> other way.

Could you just teach the thing to which you are hinting that it also
needs to go look in sysfs for huge page counts? Or, is there a
requirement that it come out of a single meminfo field?
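
For a userspace consumer, this needs no new exports: kernels that support multiple huge page sizes expose one directory per size, /sys/kernel/mm/hugepages/hugepages-<size>kB, each containing an nr_hugepages file. A sketch that sums them, assuming that layout (the helper names are hypothetical); since Alok's consumer is a kernel module, for which reading sysfs is not a natural fit, this only addresses the userspace side of the question:

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Parse "hugepages-2048kB" -> 2048; returns -1 for any other name. */
static long hugepage_dir_size_kb(const char *name)
{
    long kb;
    int end = 0;

    /* %n records how many characters matched, so trailing junk is rejected. */
    if (sscanf(name, "hugepages-%ldkB%n", &kb, &end) != 1 || end == 0 || name[end] != '\0')
        return -1;
    return kb;
}

/* Sum nr_hugepages * size over every hugepages-<size>kB directory under root
 * (normally /sys/kernel/mm/hugepages). Returns total kB, or -1 if root can't be opened. */
static long sysfs_hugepages_kb(const char *root)
{
    DIR *dir;
    struct dirent *de;
    long total_kb = 0;

    assert(root != NULL);
    dir = opendir(root);
    if (!dir)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        long size_kb = hugepage_dir_size_kb(de->d_name);
        char path[4096];
        FILE *f;
        long nr;

        if (size_kb < 0)
            continue;            /* skips ".", ".." and unrelated entries */
        snprintf(path, sizeof(path), "%s/%s/nr_hugepages", root, de->d_name);
        f = fopen(path, "r");
        if (!f)
            continue;
        if (fscanf(f, "%ld", &nr) == 1)
            total_kb += nr * size_kb;
        fclose(f);
    }
    closedir(dir);
    return total_kb;
}
```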

-- Dave

2009-06-23 22:20:04

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

On Tue, 2009-06-23 at 15:06 -0700, Alok Kataria wrote:
> What I meant was that you could always calculate the *maximum* amount of
> memory that is going to be used by page tables and can also determine the
> % of memory that will be used by slab caches.

I'm not really sure what you mean by this. I actually just wrote a
little userspace program the other day that fills virtually all of RAM
with page tables. There's no real upper bound on them, unless you're
restricting the amount of address space that userspace processes
can map. In the same way, the right usage pattern can put virtually
all the RAM in the system into a single slab or a set of slabs. It's
probably evictable most of the time, but sometimes large amounts can be
pinned.
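
Back-of-the-envelope arithmetic shows why page tables have no fixed bound relative to physical memory (an editorial sketch, not the program described above). Assuming 4 KB base pages and 8-byte PTEs, as on x86-64, one 4 KB page-table page maps 512 * 4 KB = 2 MB, and last-level tables for an ordinary shared mapping are per process. So 1000 processes that each fault in the same 1 GB shared segment need about 2000 MB of page tables for 1 GB of data. The helper name is hypothetical and upper-level tables are ignored.

```c
#include <assert.h>

/* Hypothetical helper: approximate last-level page-table bytes when nr_procs
 * processes each map and fault in mapped_bytes_per_proc bytes.
 * Assumes 4 KB pages, 8-byte PTEs, and no page-table sharing between processes. */
static unsigned long long pte_bytes(unsigned long long mapped_bytes_per_proc,
                                    unsigned long nr_procs)
{
    const unsigned long long page_size = 4096;
    const unsigned long long ptes_per_page = page_size / 8;                   /* 512 */
    const unsigned long long bytes_per_pte_page = ptes_per_page * page_size;  /* 2 MB */
    /* Round up: a partially used page-table page still costs a whole page. */
    unsigned long long tables =
        (mapped_bytes_per_proc + bytes_per_pte_page - 1) / bytes_per_pte_page;

    return tables * page_size * nr_procs;
}
```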

> That way you should be able to tell statically that no more than 'X'
> amount of memory is going to be locked here.
> I'd again like to stress that "X" is not the exact amount that is
> locked here but the amount that can be.
>
> OTOH, for hugepages and mlocked pages you need to read the exact counts,
> as these can change according to user selection.

I'm a bit lost. Could we take a step back here and talk about what
you're trying to do in the first place?

-- Dave

2009-06-23 22:23:48

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.


On Tue, 2009-06-23 at 15:15 -0700, Dave Hansen wrote:
> On Tue, 2009-06-23 at 14:42 -0700, Alok Kataria wrote:
> > One thing that I forgot to mention earlier is that I just need a way to
> > provide a hint about the total locked memory on the system, and it
> > doesn't need to be the exact number at that point in time.
> >
> > Lee, for this reason lazy culling of unevictable pages is fine too.
> >
> > Hugepages, like mlocked pages, are special because the user can
> > specify how much memory to reserve for this purpose. So that
> > needs to be taken into consideration, i.e. it cannot be calculated in some
> > other way.
>
> Could you just teach the thing to which you are hinting that it also
> needs to go look in sysfs for huge page counts?

:) Yeah, I could do that too... the point is that it's a module and the
function to get the hugepage count is not exported right now. I could
very well add it as an exported symbol and use it from there, but
someone may not want symbols to be exported unnecessarily
if there is no in-tree modular user of that symbol.

Other than that, it also doesn't quite sound right that I have to query
the kernel for different variables when Unevictable should give me all of
the user-specified locked usage.

Thanks,
Alok

> Or, is there a
> requirement that it come out of a single meminfo field?
>
> -- Dave
>

2009-06-23 22:56:27

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

Alok Kataria wrote:
> On Tue, 2009-06-23 at 14:55 -0700, Rik van Riel wrote:
>> Alok Kataria wrote:
>>> On Tue, 2009-06-23 at 14:24 -0700, Rik van Riel wrote:

>> Things like page tables and dentry/inode caches vary
>> according to the use case and are allocated as needed.
>> They are in no way "static in nature".
>
> Maybe static was the wrong word to use here.
> What I meant was that you could always calculate the *maximum* amount of
> memory that is going to be used by page tables and can also determine the
> % of memory that will be used by slab caches.

My point is that you cannot do that.

We have seen systems with 30% of physical memory in
page tables, as well as systems with a similar amount
of memory in the slab cache.

Yes, these were running legitimate workloads.

--
All rights reversed.

2009-06-23 23:28:34

by Alok Kataria

[permalink] [raw]
Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.


On Tue, 2009-06-23 at 15:55 -0700, Rik van Riel wrote:
> Alok Kataria wrote:
> > On Tue, 2009-06-23 at 14:55 -0700, Rik van Riel wrote:
> >> Alok Kataria wrote:
> >>> On Tue, 2009-06-23 at 14:24 -0700, Rik van Riel wrote:
>
> >> Things like page tables and dentry/inode caches vary
> >> according to the use case and are allocated as needed.
> >> They are in no way "static in nature".
> >
> > Maybe static was the wrong word to use here.
> > What I meant was that you could always calculate the *maximum* amount of
> > memory that is going to be used by page tables and can also determine the
> > % of memory that will be used by slab caches.
>
> My point is that you cannot do that.
>
> We have seen systems with 30% of physical memory in
> page tables,

I see, for some reason I thought that the user process's page tables
should be swappable, but that doesn't look like what we do.
Though, that count should be available by aggregating the total ACTIVE
and INACTIVE counts, right?

Now, regarding the patch that I posted: I need a way to get the hugepages
count, and there are two ways of doing this:
1. Export the hugetlb_total_pages function for module usage.
2. Use NR_UNEVICTABLE to reflect the hugepages count too.

For some reason I think (2) is the correct way to go. NR_UNEVICTABLE
should mean all the locked memory that the user requested to be locked.

I don't see a reason why NR_UNEVICTABLE should only mean the number of
pages on the UNEVICTABLE_LRU.

Thanks,
Alok


> as well as systems with a similar amount
> of memory in the slab cache.
>
> Yes, these were running legitimate workloads.
>

2009-06-23 23:41:41

by Dave Hansen

Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

On Tue, 2009-06-23 at 15:23 -0700, Alok Kataria wrote:
> > Could you just teach the thing to which you are hinting that it also
> > needs to go look in sysfs for huge page counts?
>
> :) yeah I could do that too... the point is that it's a module and the
> function to get the hugepages count is not exported right now. I could
> very well add this as an exported symbol and use it from there, but
> there can be someone who doesn't want symbols to be unnecessarily
> exported if there is no in-tree modular usage of that symbol.

Hmmm. So what is the module doing? The ol', "try to get as much memory
as I possibly can" game? :)

It sounds like you can get access to the vm statistics from existing
exported symbols, but the stats don't give you quite the info that you
need. So, you're trying to change things that you *can* get access to.

We do export all this stuff to userspace. We export all of the huge
page sizes and how many pages are reserved, used, and allocated in each,
plus the contentious Unevictable. Could you just do this calculation in
userspace and pass it into the module with a modparam or sysfs file?

-- Dave
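Dave's suggestion above, doing the calculation in userspace, can be sketched as a small C helper. It assumes the sysfs layout in which each huge page size has a directory /sys/kernel/mm/hugepages/hugepages-<size>kB containing an nr_hugepages file, and sums nr_hugepages multiplied by the size over every directory. The function names are illustrative, not part of any existing tool.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Parse a "hugepages-<N>kB" directory name into N; returns 0 if the
 * name does not have that form. */
static unsigned long hugepage_dir_size_kb(const char *name)
{
	unsigned long kb;
	char suffix[3];

	if (sscanf(name, "hugepages-%lu%2s", &kb, suffix) == 2 &&
	    strcmp(suffix, "kB") == 0)
		return kb;
	return 0;
}

/* Sum nr_hugepages * size over every huge page size found under 'root'
 * (normally /sys/kernel/mm/hugepages). Returns the total in kB, or -1
 * if the directory cannot be opened. */
static long total_hugepage_kb(const char *root)
{
	DIR *dir = opendir(root);
	struct dirent *de;
	long total = 0;

	if (dir == NULL)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		unsigned long size_kb = hugepage_dir_size_kb(de->d_name);
		unsigned long nr;
		char path[512];
		FILE *f;

		if (size_kb == 0)
			continue;	/* ".", "..", or an unrelated entry */
		snprintf(path, sizeof(path), "%s/%s/nr_hugepages",
			 root, de->d_name);
		f = fopen(path, "r");
		if (f == NULL)
			continue;
		if (fscanf(f, "%lu", &nr) == 1)
			total += (long)(nr * size_kb);
		fclose(f);
	}
	closedir(dir);
	return total;
}
```

The resulting kB value could then be handed to the module through a module parameter or sysfs file, as Dave proposes; the parameter name would be whatever the module itself defines.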

2009-06-23 23:48:26

by Dave Hansen

Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

On Tue, 2009-06-23 at 16:28 -0700, Alok Kataria wrote:
> Now regarding the patch that I posted, I need a way to get the
> hugepages count, there are 2 ways of doing this.
> 1. exporting hugetlb_total_pages function for module usage.

Unfortunately, that won't even be enough. That only accounts for the
default hstate. There may be several hstates if the system supports
multiple large page sizes. It's all exported in sysfs, but I don't see
any simple way (other than sys_open()) for a module to get at it.

-- Dave
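Dave's point about multiple hstates can be illustrated with a userspace model of what an in-kernel total would need to do: walk every hstate (in the spirit of the kernel's for_each_hstate() iterator) rather than only the default one. struct model_hstate is a hypothetical stand-in keeping just the two fields needed here, and the example pools (ten 2 MB pages and one 1 GB page, i.e. orders 9 and 18 with 4 kB base pages, as on x86-64) are assumed values for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct hstate: only the fields used here. */
struct model_hstate {
	unsigned int order;          /* one huge page = 2^order base pages */
	unsigned long nr_huge_pages; /* pages in this size's pool */
};

/* Assumed example: ten 2 MB pages (order 9), one 1 GB page (order 18). */
static const struct model_hstate example_hstates[] = {
	{ 9, 10 },
	{ 18, 1 },
};

/* Total base pages held by the huge page pools of all sizes. */
static unsigned long total_huge_base_pages(const struct model_hstate *hs,
					   size_t nr_hstates)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < nr_hstates; i++)
		total += hs[i].nr_huge_pages << hs[i].order;
	return total;
}
```

Summing only the first (default) entry gives 5120 base pages, while walking both gives 267264; the 1 GB pool is invisible to a default-hstate-only count, which is the gap Dave describes.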

2009-06-24 00:08:39

by Alok Kataria

Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.


On Tue, 2009-06-23 at 16:41 -0700, Dave Hansen wrote:
> On Tue, 2009-06-23 at 15:23 -0700, Alok Kataria wrote:
> > > Could you just teach the thing to which you are hinting that it also
> > > needs to go look in sysfs for huge page counts?
> >
> > :) yeah I could do that too... the point is that it's a module and the
> > function to get the hugepages count is not exported right now. I could
> > very well add this as an exported symbol and use it from there, but
> > there can be someone who doesn't want symbols to be unnecessarily
> > exported if there is no in-tree modular usage of that symbol.
>
> Hmmm. So what is the module doing? The ol', "try to get as much memory
> as I possibly can" game? :)
>
> It sounds like you can get access to the vm statistics from existing
> exported symbols, but the stats don't give you quite the info that you
> need.

> So, you're trying to change things that you *can* get access to.

:) Not entirely, I thought that UNEVICTABLE by definition should
consider hugepages too.

>
> We do export all this stuff to userspace. We export all of the huge
> page sizes and how many pages are reserved, used, and allocated in each,
> plus the contentious Unevictable. Could you just do this calculation in
> userspace and pass it into the module with a modparam or sysfs file?

Hmm... let's see, I will look at moving that to userspace or something.

But irrespective of my need, we must clear the confusion around what
unevictable should actually mean.

I am biased towards getting hugepages accounted in that :), but what do
others think?

Lee, I will let you make the decision on that. If the current semantics
look okay, it would be great if you could update the unevictable_lru
documentation to say that this is just the number of pages on the
unevictable_lru.

Thanks,
Alok
>
> -- Dave
>

2009-06-29 09:58:40

by Mel Gorman

Subject: Re: [PATCH] Hugepages should be accounted as unevictable pages.

On Mon, Jun 22, 2009 at 02:25:41PM -0700, Alok Kataria wrote:
> Looking at the output of /proc/meminfo, a user might get confused in thinking
> that there are zero unevictable pages, though, in reality there can be
> hugepages which are inherently unevictable.
>

I think the problem may be with the meaning different people are
getting from "Unevictable".

For those of us who saw the Unevictable patches going by, it means the
number of pages that could potentially be on the LRU lists but are not,
because they are unevictable due to some action taken by the program
(mlock(), for example). This does not include hugepages, because we know
they cannot be reclaimed by any action other than directly freeing them.

The meaning you want is that Unevictable represents a count of all pages
that are unpageable, such as pagetable pages, mlocked pages, hugepages,
etc.

> Though hugepages are not handled by the unevictable lru framework, they are
> in fact unevictable in nature and global statistics counter should reflect that.
>

I somewhat disagree, as the full count of unreclaimable pages can be
aggregated by looking at various statistics such as page table counts,
slab pages, locked pages, etc.

> For instance, I have allocated 20 huge pages on my system, meminfo shows this
>
> Unevictable: 0 kB
> Mlocked: 0 kB
> HugePages_Total: 20
> HugePages_Free: 20
> HugePages_Rsvd: 0
> HugePages_Surp: 0
>

Note that HugePages_Total here is for the default hugepage size. If
there are other hugepage sizes, you need to go to sysfs for them. If you are
in the kernel, you need to walk through the hstates and aggregate the counters.

> After the patch:
>
> Unevictable: 81920 kB
> Mlocked: 0 kB
> HugePages_Total: 20
> HugePages_Free: 20
> HugePages_Rsvd: 0
> HugePages_Surp: 0
>

I'm not keen on this, to be honest, but that's because I know Unevictable to
mean pages that potentially could be on the LRU but are not. I think that
people will want to consider that separately from hugepage usage.

What I would be OK with is changing the output of meminfo slightly:
output "Pinned", which is a total count of pages that cannot be reclaimed at
the moment, and give subheadings for it such as Unevictable LRU (potentially
broken down further into what makes up the unevictable LRU count),
pagetables, hugepages, etc.

At minimum, your changelog needs to state why you need this information and
how it's going to be used. It's not clear why you cannot just add Unevictable
+ pagetables + slab + HugePages_Total to get an approximation of the amount
of pinned memory, for example.
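Mel's approximation can be made concrete with a small userspace parser over /proc/meminfo text. One detail the sum needs: HugePages_Total is a page count rather than kB, so it has to be multiplied by Hugepagesize before it is added to the kB fields. This is a sketch assuming the standard meminfo field names; the function names are hypothetical. Slab also includes reclaimable slab (SReclaimable), so the result overestimates, and only the default huge page size is covered.

```c
#include <stdio.h>
#include <string.h>

/* Find one "Key: value" field in a /proc/meminfo-style buffer.
 * Returns 0 and stores the value on success, -1 if the key is absent. */
static int meminfo_field(const char *text, const char *key,
			 unsigned long *out)
{
	const char *line = text;

	while (line != NULL && *line != '\0') {
		char name[64];
		unsigned long val;

		if (sscanf(line, "%63[^:\n]: %lu", name, &val) == 2 &&
		    strcmp(name, key) == 0) {
			*out = val;
			return 0;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line != NULL)
			line++;
	}
	return -1;
}

/* Unevictable + PageTables + Slab + HugePages_Total * Hugepagesize, in kB.
 * Returns -1 if any of the fields is missing. */
static long approx_pinned_kb(const char *meminfo)
{
	unsigned long unevictable, pagetables, slab, hp_total, hp_size_kb;

	if (meminfo_field(meminfo, "Unevictable", &unevictable) ||
	    meminfo_field(meminfo, "PageTables", &pagetables) ||
	    meminfo_field(meminfo, "Slab", &slab) ||
	    meminfo_field(meminfo, "HugePages_Total", &hp_total) ||
	    meminfo_field(meminfo, "Hugepagesize", &hp_size_kb))
		return -1;
	return (long)(unevictable + pagetables + slab + hp_total * hp_size_kb);
}
```

With the 20 huge pages of 4096 kB from the changelog, the hugepage term alone contributes 81920 kB, the same figure the patch makes Unevictable report.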

> Signed-off-by: Alok N Kataria <[email protected]>
> Cc: Lee Schermerhorn <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Mel Gorman <[email protected]>
>
> Index: linux-2.6/Documentation/vm/unevictable-lru.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/vm/unevictable-lru.txt 2009-06-22 11:49:27.000000000 -0700
> +++ linux-2.6/Documentation/vm/unevictable-lru.txt 2009-06-22 13:57:32.000000000 -0700
> @@ -71,6 +71,12 @@ The unevictable list addresses the follo
>
> (*) Those mapped into VM_LOCKED [mlock()ed] VMAs.
>
> + (*) Hugetlb pages are also unevictable. Hugepages are already implemented in
> + a way that these pages don't reside on the LRU and hence are not iterated
> + over during the vmscan. So there is no need to move around these pages
> + across different LRU's. We just account these pages as unevictable for
> + correct statistics.
> +
> The infrastructure may also be able to handle other conditions that make pages
> unevictable, either by definition or by circumstance, in the future.
>
> Index: linux-2.6/mm/hugetlb.c
> ===================================================================
> --- linux-2.6.orig/mm/hugetlb.c 2009-06-22 11:49:57.000000000 -0700
> +++ linux-2.6/mm/hugetlb.c 2009-06-22 14:04:05.000000000 -0700
> @@ -533,6 +533,8 @@ static void update_and_free_page(struct
> 1 << PG_dirty | 1 << PG_active | 1 << PG_reserved |
> 1 << PG_private | 1<< PG_writeback);
> }
> + mod_zone_page_state(page_zone(page), NR_LRU_BASE + LRU_UNEVICTABLE,
> + -(pages_per_huge_page(h)));
> set_compound_page_dtor(page, NULL);
> set_page_refcounted(page);
> arch_release_hugepage(page);
> @@ -584,6 +586,8 @@ static void prep_new_huge_page(struct hs
> spin_lock(&hugetlb_lock);
> h->nr_huge_pages++;
> h->nr_huge_pages_node[nid]++;
> + mod_zone_page_state(page_zone(page), NR_LRU_BASE + LRU_UNEVICTABLE,
> + pages_per_huge_page(h));
> spin_unlock(&hugetlb_lock);
> put_page(page); /* free it into the hugepage allocator */
> }
> @@ -749,6 +753,9 @@ static struct page *alloc_buddy_huge_pag
> */
> h->nr_huge_pages_node[nid]++;
> h->surplus_huge_pages_node[nid]++;
> + mod_zone_page_state(page_zone(page),
> + NR_LRU_BASE + LRU_UNEVICTABLE,
> + pages_per_huge_page(h));
> __count_vm_event(HTLB_BUDDY_PGALLOC);
> } else {
> h->nr_huge_pages--;
>
>
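The accounting the patch adds can be modelled in userspace to check the numbers in the changelog. This is a sketch, not kernel code: model_hstate and the model_* helpers are hypothetical stand-ins for struct hstate, pages_per_huge_page() and the hooks in prep_new_huge_page() and update_and_free_page(). The huge page order of 10 (4096 kB huge pages on 4 kB base pages) is an assumption inferred from the reported 81920 kB for 20 huge pages.

```c
/* Hypothetical stand-in for struct hstate. */
struct model_hstate {
	unsigned int order;          /* one huge page = 2^order base pages */
	unsigned long nr_huge_pages; /* pool size, like h->nr_huge_pages */
};

/* Assumed: order 10 with 4 kB base pages = 4096 kB huge pages. */
static struct model_hstate model_h = { 10, 0 };

/* Stand-in for the zone's NR_UNEVICTABLE counter, in base pages. */
static long model_unevictable;

static long model_pages_per_huge_page(const struct model_hstate *h)
{
	return 1L << h->order;
}

/* Like the prep_new_huge_page() hunk: add the huge page's base pages. */
static void model_prep_new_huge_page(struct model_hstate *h)
{
	h->nr_huge_pages++;
	model_unevictable += model_pages_per_huge_page(h);
}

/* Like the update_and_free_page() hunk: subtract them again. */
static void model_update_and_free_page(struct model_hstate *h)
{
	h->nr_huge_pages--;
	model_unevictable -= model_pages_per_huge_page(h);
}

/* Grow or shrink the pool to 'target' huge pages; returns the modelled
 * unevictable count in base pages. */
static long model_resize_pool(unsigned long target)
{
	while (model_h.nr_huge_pages < target)
		model_prep_new_huge_page(&model_h);
	while (model_h.nr_huge_pages > target)
		model_update_and_free_page(&model_h);
	return model_unevictable;
}

/* meminfo reports kB: base pages times the base page size in kB. */
static long model_pages_to_kb(long pages, long base_page_kb)
{
	return pages * base_page_kb;
}
```

Growing the pool to 20 huge pages gives 20 × 1024 = 20480 base pages, which at 4 kB each is 81920 kB, the "After the patch" figure. Shrinking the pool back to zero returns the counter to 0, which is why the decrement in the free path has to mirror the increments.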

--
Mel Gorman
Part-time PhD Student, University of Limerick
Linux Technology Center, IBM Dublin Software Lab