2005-04-05 16:20:16

by Sen Horak

Subject: Hugetlb pages vs. small pages

Hi,

My config: Pentium M/1400MHz/1MB L2/64 entry D-TLB
Kernel: 2.6.7-vanilla

I ran a microbenchmark that "touches" every 4k, in a loop, in two
separate 4MB regions:

1) mapped from a hugetlbfs partition (MAP_FIXED)
2) mapped anonymously (MAP_ANONYMOUS)

Both seem to deliver nearly the same performance, with a slight edge
for MAP_ANONYMOUS.

From hugetlbfs: 3515154447 cycles
MAP_ANONYMOUS: 3461527548 cycles
[using rdtsc]

samples  %        symbol name
630      100.000  main

samples  %        symbol name
624      100.000  main
[using oprofile]

Can somebody explain why this is happening? Unfortunately, there is no
hardware performance counter that allows one to measure D-TLB misses
on the Pentium, so I can't measure that.

Regards,
Sen

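#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Not shown in the original post; assumed to be the 4MB region size. */
#define BUFSIZE (4*1024*1024)

/* Not shown in the original post; assumed to read the x86 time-stamp
   counter into a 64-bit variable. */
#define rdtsc(val) do { \
    unsigned int lo_, hi_; \
    __asm__ __volatile__("rdtsc" : "=a" (lo_), "=d" (hi_)); \
    (val) = ((unsigned long long) hi_ << 32) | lo_; \
} while (0)

static int fd = -1;
static void *addr = MAP_FAILED;

void *init_huge_page(int size) {
    fd = open("/mnt/superpage/x", O_CREAT|O_RDWR, 0755);
    if (fd < 0) {
        perror("Open failed");
        exit(EXIT_FAILURE);
    }
    /* map the hugetlbfs file at a fixed address */
    addr = mmap((void *) 0x80000000, size,
        (PROT_READ|PROT_WRITE), MAP_SHARED|MAP_FIXED, fd, 0);
    if (addr == MAP_FAILED) {
        perror("mmap failed");
        exit(EXIT_FAILURE);
    }
    printf("Address=%p\n", addr);
    return addr;
}

void exit_huge_page(int size) {
    if (addr != MAP_FAILED) munmap(addr, size);
    close(fd);
}


int main(int argc,char *argv[])
{
    int j,i;
    unsigned long long cnt1,cnt2;

    void *p=init_huge_page(BUFSIZE);

    setpriority(PRIO_PROCESS, 0, -20);
    rdtsc(cnt1);
    for(j=0;j<1000;j++) {
        /* the stride is 1024 bytes, i.e. four writes per 4k page */
        for(i=0;i<(4*1024*1024);i+=1024) {
            ((char *)p)[i]='H';
        }
    }
    rdtsc(cnt2);
    /* elapsed cycles, in units of 10^9 */
    printf("%0.2f\n",(cnt2-cnt1)/1000000000.0);
    exit_huge_page(BUFSIZE);
    return 0;
}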