2013-04-01 08:01:13

by Zhouping Liu

Subject: THP: AnonHugePages in /proc/[pid]/smaps is correct or not?

Hi all,

I found that THP does not seem to account for anonymous hugepage mappings correctly.

1. When /sys/kernel/mm/transparent_hugepage/enabled is 'always', the
number of THPs reported is always one less than expected.

Testing code:
---- snip --------
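#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

unsigned long hugepagesize = (1UL << 21);

int main()
{
	void *addr;
	int i;

	printf("pid is %d\n", getpid());

	/* map and touch five separate 2MB anonymous regions */
	for (i = 0; i < 5; i++) {
		addr = mmap(NULL, hugepagesize, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);

		if (addr == MAP_FAILED) {
			perror("mmap");
			return -1;
		}

		memset(addr, i, hugepagesize);
	}

	/* leave time to inspect /proc/[pid]/smaps */
	sleep(50);

	return 0;
}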
------ snip ----------

/proc/[pid]/smaps shows that Anonymous is 10240 kB but AnonHugePages is 8192 kB, one THP less:
----- snip --------
7f59ccc01000-7f59cd601000 rw-p 00000000 00:00 0
Size: 10240 kB
Rss: 10240 kB
Pss: 10240 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 10240 kB
Referenced: 10240 kB
Anonymous: 10240 kB
AnonHugePages: 8192 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
VmFlags: rd wr mr mw me ac hg
------- snip ---------

2. When /sys/kernel/mm/transparent_hugepage/enabled is 'madvise', no
anonymous hugepages are reported at all:

Testing code:
-------- snip --------
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

unsigned long hugepagesize = (1UL << 21);

int main()
{
	void *addr;
	int i;

	printf("pid is %d\n", getpid());

	for (i = 0; i < 5; i++) {
		addr = mmap(NULL, hugepagesize, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);

		if (addr == MAP_FAILED) {
			perror("mmap");
			return -1;
		}

		/* ask for THP on this region before touching it */
		if (madvise(addr, hugepagesize, MADV_HUGEPAGE) == -1) {
			perror("madvise");
			return -1;
		}

		memset(addr, i, hugepagesize);
	}

	/* leave time to inspect /proc/[pid]/smaps */
	sleep(50);

	return 0;
}
--------- snip ----------

The result is that no AnonHugePages show up in /proc/[pid]/smaps:
-------------- snip -------
7f0b38cd0000-7f0b396d0000 rw-p 00000000 00:00 0
Size: 10240 kB
Rss: 10240 kB
Pss: 10240 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 10240 kB
Referenced: 10240 kB
Anonymous: 10240 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
VmFlags: rd wr mr mw me ac
----------- snip ----------

3. When I align the address to HUGEPAGESIZE using 'posix_memalign()' instead of mmap(),
THP performs well, and all anonymous huge pages are accounted for.

My questions are:
1. Is all of the above behaviour correct?
2. Why can't THP use a naturally aligned huge page (generated by mmap())?

--
Thanks,
Zhouping


2013-04-01 22:24:02

by David Rientjes

Subject: Re: THP: AnonHugePages in /proc/[pid]/smaps is correct or not?

On Mon, 1 Apr 2013, Zhouping Liu wrote:

> Hi all,
>
> I found THP can't correctly distinguish one anonymous hugepage map.
>
> 1. when /sys/kernel/mm/transparent_hugepage/enabled is 'always', the
> amount of THP always is one less.
>

It's not a problem with identifying an anonymous mapping as a hugepage,
setting thp enabled to "always" does not guarantee that they will always
be allocatable or that your mmap() will be 2MB aligned. Your sample code
is using mmap() instead of posix_memalign() so you'll probably only get
100% hugepages only 1/512th of the time.

> 2. when /sys/kernel/mm/transparent_hugepage/enabled is 'madvise', THP can't
> distinguish any one anonymous hugepage size:
>
> Testing code:
> -------- snip --------
> unsigned long hugepagesize = (1UL << 21);
>
> int main()
> {
> void *addr;
> int i;
>
> printf("pid is %d\n", getpid());
>
> for (i = 0; i < 5; i++) {
> addr = mmap(NULL, hugepagesize, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
>
> if (addr == MAP_FAILED) {
> perror("mmap");
> return -1;
> }
>
> if (madvise(addr, hugepagesize, MADV_HUGEPAGE) == -1) {
> perror("madvise");
> return -1;
> }
>
> memset(addr, i, hugepagesize);
> }
>
> sleep(50);
>
> return 0;
> }
> --------- snip ----------
>
> The result is that it can't find any AnonHugePages from /proc/[pid]/smaps :
> -------------- snip -------
> 7f0b38cd0000-7f0b396d0000 rw-p 00000000 00:00 0
> Size: 10240 kB
> Rss: 10240 kB
> Pss: 10240 kB
> Shared_Clean: 0 kB
> Shared_Dirty: 0 kB
> Private_Clean: 0 kB
> Private_Dirty: 10240 kB
> Referenced: 10240 kB
> Anonymous: 10240 kB
> AnonHugePages: 0 kB
> Swap: 0 kB
> KernelPageSize: 4 kB
> MMUPageSize: 4 kB
> Locked: 0 kB
> VmFlags: rd wr mr mw me ac

"hg" would be shown in VmFlags if your MADV_HUGEPAGE was successful, are
you sure this is the right vma?
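
A minimal sketch of how a test program could check these fields itself rather than reading smaps by hand. The helper names (smaps_field_kb, vmflags_has, anon_huge_kb_self) are made up for this illustration; the only assumption about the kernel interface is the "Key: <n> kB" and "VmFlags: ..." line layout shown in the output above.

```c
#include <stdio.h>
#include <string.h>

/* Parse a "Key: <n> kB" line from smaps; returns 1 and stores n on a match. */
static int smaps_field_kb(const char *line, const char *key, unsigned long *kb)
{
	size_t len = strlen(key);

	if (strncmp(line, key, len) != 0 || line[len] != ':')
		return 0;
	return sscanf(line + len + 1, "%lu", kb) == 1;
}

/* Return 1 if a "VmFlags:" line carries the given two-letter flag, e.g. "hg". */
static int vmflags_has(const char *line, const char *flag)
{
	const char *p;

	if (strncmp(line, "VmFlags:", 8) != 0)
		return 0;
	for (p = line + 8; *p; ) {
		while (*p == ' ')
			p++;
		if (strncmp(p, flag, 2) == 0 &&
		    (p[2] == ' ' || p[2] == '\n' || p[2] == '\0'))
			return 1;
		while (*p && *p != ' ')
			p++;
	}
	return 0;
}

/* Sum AnonHugePages over all vmas of the calling process; -1 on error. */
static long anon_huge_kb_self(void)
{
	char line[256];
	unsigned long kb, total = 0;
	FILE *f = fopen("/proc/self/smaps", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (smaps_field_kb(line, "AnonHugePages", &kb))
			total += kb;
	fclose(f);
	return (long)total;
}
```

Calling anon_huge_kb_self() right after the memset() loop, in place of the sleep(50), would print the same total that the smaps excerpts show, and vmflags_has(line, "hg") tells whether MADV_HUGEPAGE took effect on a given vma.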

2013-04-02 03:11:49

by Zhouping Liu

Subject: Re: THP: AnonHugePages in /proc/[pid]/smaps is correct or not?

On 04/02/2013 06:23 AM, David Rientjes wrote:
> On Mon, 1 Apr 2013, Zhouping Liu wrote:
>
>> Hi all,
>>
>> I found THP can't correctly distinguish one anonymous hugepage map.
>>
>> 1. when /sys/kernel/mm/transparent_hugepage/enabled is 'always', the
>> amount of THP always is one less.
>>
> It's not a problem with identifying an anonymous mapping as a hugepage,
> setting thp enabled to "always" does not guarantee that they will always
> be allocatable or that your mmap() will be 2MB aligned. Your sample code
> is using mmap() instead of posix_memalign() so you'll probably only get
> 100% hugepages only 1/512th of the time.

I don't clearly understand the last sentence, 'you'll probably only get
100% hugepages only 1/512th of the time.'
Could you please explain 'only 1/512th of the time' in more detail?

>
>> 2. when /sys/kernel/mm/transparent_hugepage/enabled is 'madvise', THP can't
>> distinguish any one anonymous hugepage size:
>>
>> Testing code:
>> -------- snip --------
>> unsigned long hugepagesize = (1UL << 21);
>>
>> int main()
>> {
>> void *addr;
>> int i;
>>
>> printf("pid is %d\n", getpid());
>>
>> for (i = 0; i < 5; i++) {
>> addr = mmap(NULL, hugepagesize, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
>>
>> if (addr == MAP_FAILED) {
>> perror("mmap");
>> return -1;
>> }
>>
>> if (madvise(addr, hugepagesize, MADV_HUGEPAGE) == -1) {
>> perror("madvise");
>> return -1;
>> }
>>
>> memset(addr, i, hugepagesize);
>> }
>>
>> sleep(50);
>>
>> return 0;
>> }
>> --------- snip ----------
>>
>> The result is that it can't find any AnonHugePages from /proc/[pid]/smaps :
>> -------------- snip -------
>> 7f0b38cd0000-7f0b396d0000 rw-p 00000000 00:00 0
>> Size: 10240 kB
>> Rss: 10240 kB
>> Pss: 10240 kB
>> Shared_Clean: 0 kB
>> Shared_Dirty: 0 kB
>> Private_Clean: 0 kB
>> Private_Dirty: 10240 kB
>> Referenced: 10240 kB
>> Anonymous: 10240 kB
>> AnonHugePages: 0 kB
>> Swap: 0 kB
>> KernelPageSize: 4 kB
>> MMUPageSize: 4 kB
>> Locked: 0 kB
>> VmFlags: rd wr mr mw me ac
> "hg" would be shown in VmFlags if your MADV_HUGEPAGE was successful, are
> you sure this is the right vma?

I think it's the same issue as above. According to the sample code,
it calls mmap() five times in total,
and each call maps 2MB of anonymous memory. Since the mappings may not be
2MB aligned, "AnonHugePages"
shows 0 kB and there is no "hg" in VmFlags.

So, again, if I understand correctly, THP should adjust naturally
aligned mappings, such as those generated by mmap()/malloc(),
and make them 'hugepagesize' aligned if the mapping or vma is
'hugepagesize' or larger, shouldn't it?

Thanks,
Zhouping

2013-04-02 03:38:33

by Lin Feng

Subject: Re: THP: AnonHugePages in /proc/[pid]/smaps is correct or not?

Hi Zhouping,

On 04/02/2013 11:09 AM, Zhouping Liu wrote:
> I don't understand clearly the last sentence 'you'll probably only get 100% hugepages only 1/512th of the time.'
> could you please explain more details about 'only 1/512th of the time'?

IIUC, the THP size is 2M, so one THP comprises 512 normal pages (4k each).
Since your test code does not 2M-align its mappings (it does not use
posix_memalign()), the start address of the mapped vma will be random, such as
2M*i+4k*0, 2M*i+4k*1, ..., 2M*i+4k*511; there are 512 possibilities.

The only case in which you get 100% THP is when the first mapping happens to
start at 2M*i; the subsequent mappings then benefit from this as well.
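
The arithmetic above can be sketched in a few lines of C. This is only an illustration of the alignment reasoning, not kernel code; HPAGE_SIZE, hpage_offset() and aligned_hpages() are names invented here, and the 2MB size is taken from the test program.

```c
#include <stdint.h>

#define HPAGE_SIZE (UINT64_C(1) << 21)	/* 2MB, as in the test program */

/* Offset of an address within its 2MB-aligned region (0 means aligned). */
static uint64_t hpage_offset(uint64_t addr)
{
	return addr & (HPAGE_SIZE - 1);
}

/* Number of whole, 2MB-aligned chunks that fit inside [start, end). */
static uint64_t aligned_hpages(uint64_t start, uint64_t end)
{
	uint64_t first = (start + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1); /* round up */
	uint64_t last = end & ~(HPAGE_SIZE - 1);                       /* round down */

	return last > first ? (last - first) / HPAGE_SIZE : 0;
}
```

For the vma in the first smaps excerpt, 7f59ccc01000-7f59cd601000, hpage_offset() of the start address is 0x1000 (one 4k page past a 2M boundary), and aligned_hpages() returns 4, i.e. 8192 kB. That is consistent with the reported AnonHugePages value, although it does not by itself prove that alignment is the only cause.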

-------- snip --------
>
> so, again, if I understand correctly, thp should tune the naturally aligned maps, such as generated by mmap()/malloc(),
> make such maps 'hugepagesize' aligned if the maps or vma is equal and greater than 'hugepagesize', doesn't it?

We may gain a performance improvement from this.

thanks,
linfeng

2013-04-02 03:44:38

by David Rientjes

Subject: Re: THP: AnonHugePages in /proc/[pid]/smaps is correct or not?

On Tue, 2 Apr 2013, Lin Feng wrote:

> > so, again, if I understand correctly, thp should tune the naturally aligned maps, such as generated by mmap()/malloc(),
> > make such maps 'hugepagesize' aligned if the maps or vma is equal and greater than 'hugepagesize', doesn't it?
>
> We may gain performance improving from this.
>

To attain the maximum number of hugepages, you would naturally want to
ensure that the mappings are done aligned to 2MB; for very large
allocations, missing one or two hugepages typically won't hurt performance
much. posix_memalign() is the best way of doing this which just wraps
mmap() for the needed alignment. More interesting is creating your own
custom malloc() that allocates in 2MB aligned chunks, if possible, and
uses 2MB aligned arenas for its own metadata. If you do that for
malloc(), then you'll only need to make code that does its own mmap()s to
use posix_memalign().
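
As a rough sketch of the posix_memalign() approach described above: the helper below allocates a 2MB-aligned region and adds an MADV_HUGEPAGE hint so that it also applies when "enabled" is set to "madvise". The function name alloc_thp_aligned() and the HPAGE_SIZE macro are made up for this example, the hard-coded 2MB size is an assumption carried over from the test program, and whether the kernel actually backs the region with huge pages still depends on availability.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* for madvise() and MADV_HUGEPAGE on glibc */
#endif
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (1UL << 21)	/* assumed THP size: 2MB */

/*
 * Allocate nr_hpages * 2MB, aligned to 2MB, and hint that THP is wanted.
 * Returns NULL on failure. Release the memory with free().
 */
static void *alloc_thp_aligned(size_t nr_hpages)
{
	void *addr = NULL;
	size_t len = nr_hpages * HPAGE_SIZE;

	if (posix_memalign(&addr, HPAGE_SIZE, len) != 0)
		return NULL;

#ifdef MADV_HUGEPAGE
	/* Advisory only; a failure here is not treated as fatal. */
	(void)madvise(addr, len, MADV_HUGEPAGE);
#endif

	memset(addr, 0, len);	/* touch the memory so it is actually faulted in */
	return addr;
}
```

Replacing the mmap() call in the test program with alloc_thp_aligned(1) gives every region a 2MB-aligned start, which is the precondition discussed in this thread.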

2013-04-02 03:51:16

by Zhouping Liu

Subject: Re: THP: AnonHugePages in /proc/[pid]/smaps is correct or not?

On 04/02/2013 11:40 AM, Lin Feng wrote:
> Hi Zhouping,
>
> On 04/02/2013 11:09 AM, Zhouping Liu wrote:
>> I don't understand clearly the last sentence 'you'll probably only get 100% hugepages only 1/512th of the time.'
>> could you please explain more details about 'only 1/512th of the time'?
> IIUC, thp size is 2M so it may be comprised of 512 normal page(size 4k).
> Since your test code is not 2M aligned(not using posix_memalign()) so
> the start address of the mapped vma will be random, such as
> 2M*i+4k*1, 2M*i+4k*2...2M*i+k4*511, there is 512 possibilities.
>
> The only chance you get thp happens when the first map just starts at 2M*i,
> and the consequent maps also benefit from this.

Feng, it's easy to understand now, thanks for your detailed explanation :)

Zhouping

2013-04-02 12:23:46

by Simon Jeons

Subject: Re: THP: AnonHugePages in /proc/[pid]/smaps is correct or not?

Hi David,
On 04/02/2013 06:23 AM, David Rientjes wrote:
> On Mon, 1 Apr 2013, Zhouping Liu wrote:
>
>> Hi all,
>>
>> I found THP can't correctly distinguish one anonymous hugepage map.
>>
>> 1. when /sys/kernel/mm/transparent_hugepage/enabled is 'always', the
>> amount of THP always is one less.
>>
> It's not a problem with identifying an anonymous mapping as a hugepage,
> setting thp enabled to "always" does not guarantee that they will always
> be allocatable or that your mmap() will be 2MB aligned. Your sample code

Both thp and hugetlb pages should be 2MB aligned, correct?

> is using mmap() instead of posix_memalign() so you'll probably only get
> 100% hugepages only 1/512th of the time.

2013-04-02 12:26:59

by Simon Jeons

Subject: Re: THP: AnonHugePages in /proc/[pid]/smaps is correct or not?

On 04/02/2013 06:23 AM, David Rientjes wrote:
> On Mon, 1 Apr 2013, Zhouping Liu wrote:
>
>> Hi all,
>>
>> I found THP can't correctly distinguish one anonymous hugepage map.
>>
>> 1. when /sys/kernel/mm/transparent_hugepage/enabled is 'always', the
>> amount of THP always is one less.
>>
> It's not a problem with identifying an anonymous mapping as a hugepage,
> setting thp enabled to "always" does not guarantee that they will always
> be allocatable or that your mmap() will be 2MB aligned. Your sample code

Btw, why do they need to be 2MB aligned? Is it related to the TLB?

>
> is using mmap() instead of posix_memalign() so you'll probably only get
> 100% hugepages only 1/512th of the time.

2013-04-02 18:09:05

by David Rientjes

Subject: Re: THP: AnonHugePages in /proc/[pid]/smaps is correct or not?

On Tue, 2 Apr 2013, Simon Jeons wrote:

> Both thp and hugetlb pages should be 2MB aligned, correct?
>

To answer this question and your followup reply at the same time: they
come from one level higher in the page table so they will naturally need
to be 2MB aligned.
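
To make the "one level higher" point concrete, here is a small sketch of the address arithmetic. The constants are assumptions for x86-64 with 4kB base pages and are redefined locally rather than taken from kernel headers; pte_index() and can_start_huge_page() are illustrative names, not kernel functions.

```c
#include <stdint.h>

/* Assumed x86-64 layout with 4kB base pages; not read from the kernel. */
#define PAGE_SHIFT	12			/* 4kB base page */
#define PTRS_PER_PTE	512			/* 9 index bits per table level */
#define PMD_SHIFT	(PAGE_SHIFT + 9)	/* 21: one PMD entry covers 2MB */
#define PMD_SIZE	(UINT64_C(1) << PMD_SHIFT)

/* Index into the last-level (PTE) table for a virtual address. */
static unsigned pte_index(uint64_t vaddr)
{
	return (vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}

/*
 * A huge page is mapped by a single PMD entry, which replaces a whole
 * PTE table. That only works if the address sits at the start of the
 * range one PMD entry covers, i.e. its low 21 bits are zero.
 */
static int can_start_huge_page(uint64_t vaddr)
{
	return (vaddr & (PMD_SIZE - 1)) == 0;
}
```

With these values PMD_SIZE equals 512 * 4kB = 2MB, which is where both the 2MB alignment requirement and the 512 possible start offsets mentioned earlier in the thread come from.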

2013-04-02 23:58:33

by Simon Jeons

Subject: Re: THP: AnonHugePages in /proc/[pid]/smaps is correct or not?

Hi David,
On 04/03/2013 02:09 AM, David Rientjes wrote:
> On Tue, 2 Apr 2013, Simon Jeons wrote:
>
>> Both thp and hugetlb pages should be 2MB aligned, correct?
>>
> To answer this question and your followup reply at the same time: they
> come from one level higher in the page table so they will naturally need
> to be 2MB aligned.

When I hack arch/x86/mm/hugetlbpage.c like this:
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index ae1aa71..87f34ee 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -354,14 +354,13 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 
 #endif /*HAVE_ARCH_HUGETLB_UNMAPPED_AREA*/
 
-#ifdef CONFIG_X86_64
 static __init int setup_hugepagesz(char *opt)
 {
 	unsigned long ps = memparse(opt, &opt);
 	if (ps == PMD_SIZE) {
 		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
-	} else if (ps == PUD_SIZE && cpu_has_gbpages) {
-		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
+	} else if (ps == PUD_SIZE) {
+		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT+4);
 	} else {
 		printk(KERN_ERR "hugepagesz: Unsupported page size %lu M\n",
 			ps >> 20);

I booted with hugepagesz=1G hugepages=10 and got 10 32MB huge pages.
What's the difference between these hacked pages and normal
huge pages?
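
The 32MB figure follows from the order passed in the patch. A minimal sketch of that arithmetic, assuming the argument to hugetlb_add_hstate() is a page order (log2 of the number of base pages) and assuming the usual x86-64 shift values, which are redefined locally here; hstate_size() is a name invented for this example.

```c
#include <stdint.h>

/* Assumed x86-64 values with 4kB base pages; not taken from kernel headers. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PUD_SHIFT	30

/* Size in bytes of a huge page of the given order (log2 of base pages). */
static uint64_t hstate_size(unsigned order)
{
	return UINT64_C(1) << (order + PAGE_SHIFT);
}
```

Here hstate_size(PMD_SHIFT - PAGE_SHIFT + 4) is 1 << 25, i.e. 32MB, while the unpatched orders give 2MB (PMD_SHIFT - PAGE_SHIFT) and 1GB (PUD_SHIFT - PAGE_SHIFT). Note that 32MB does not correspond to the range covered by a single entry at either of those page-table levels.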