2006-01-18 19:49:22

by Don Dupuis

Subject: Can't mlock hugetlb in 2.6.15

I have an app that mlocks hugepages. The same app works just fine in 2.6.14.
This app has 128MB or more of shared memory that is using hugepages via
mmap. When I try this, I get the error "can't allocate memory". Is this a
kernel bug or is this not supported anymore? I want to guarantee that
this memory doesn't get swapped out to a swap device. I made the same
modifications to include/linux/resource.h that were in 2.6.14, which
set MLOCK_LIMIT to 2GB.

Thanks

Don


2006-01-21 07:53:03

by Andrew Morton

Subject: Re: Can't mlock hugetlb in 2.6.15

Don Dupuis <[email protected]> wrote:
>
> I have an app that mlocks hugepages. The same app works just fine in 2.6.14.
> This app has 128MB or more of shared memory that is using hugepages via
> mmap. When I try this, I get the error "can't allocate memory". Is this a
> kernel bug or is this not supported anymore? I want to guarantee that
> this memory doesn't get swapped out to a swap device.

hugetlb areas are not pageable and it's very unlikely that they will become
so in the foreseeable future. So you don't need to do this.

That being said, we shouldn't have broken your application.

I guess a suitable back-compatibility fix would be to check for a hugetlb
vma early on and return "success" for that vma section without actually
doing anything.

But we need to understand why this happened.
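The suggested back-compatibility behaviour can be modelled in user space. This is a hypothetical sketch, not kernel code: struct vma_model and mlock_model() are invented for illustration. It reports success for a hugetlb area without doing anything (those pages are never paged out anyway) and applies the usual locked-memory limit to everything else, failing with ENOMEM as mlock() does when RLIMIT_MEMLOCK would be exceeded.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: only the fields this sketch needs.
 * This is NOT the kernel's struct vm_area_struct. */
struct vma_model {
    unsigned long start, end;   /* [start, end) in bytes */
    bool hugetlb;               /* true for a hugetlbfs-backed area */
    bool locked;                /* set once the area has been "locked" */
};

/* Model of the proposed behaviour for an mlock() request covering n areas:
 * hugetlb areas count as success without being touched, the remaining
 * areas are locked only if their total size fits within 'limit' bytes.
 * Returns 0 on success or -ENOMEM. */
static int mlock_model(struct vma_model *vmas, size_t n, unsigned long limit)
{
    unsigned long wanted = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (vmas[i].hugetlb)
            continue;                   /* "success" without doing anything */
        wanted += vmas[i].end - vmas[i].start;
    }
    if (wanted > limit)
        return -ENOMEM;

    for (i = 0; i < n; i++)
        if (!vmas[i].hugetlb)
            vmas[i].locked = true;
    return 0;
}
```

The early "continue" is the whole proposal: a hugetlb area never contributes to the locked-memory accounting, so a large hugepage mapping can no longer make the request fail.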

> I made the same
> modifications to include/linux/resource.h that were in 2.6.14, which
> set MLOCK_LIMIT to 2GB.
>

That's rather naughty of you ;) You're supposed to use
setrlimit(RLIMIT_MEMLOCK) in a parent process for this...
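A minimal sketch of that approach, assuming a small launcher starts the application. run_with_memlock() is a hypothetical helper name. It sets RLIMIT_MEMLOCK with setrlimit() in the forked child just before execv(), so the application inherits the limit and include/linux/resource.h stays untouched. Lowering a limit is always allowed; raising the hard limit requires CAP_SYS_RESOURCE (for example, a launcher running as root).

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run 'path' with RLIMIT_MEMLOCK set to 'bytes' (soft and hard).
 * The limit is set in the child after fork() and before execv(), so the
 * launcher keeps its own limit while the program, and anything it forks,
 * inherits the new one. Returns the child's exit status, or -1 if fork(),
 * waitpid() fails or the child did not exit normally. */
static int run_with_memlock(const char *path, char *const argv[], rlim_t bytes)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit rl = { .rlim_cur = bytes, .rlim_max = bytes };

        if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
            perror("setrlimit(RLIMIT_MEMLOCK)");
            _exit(127);
        }
        execv(path, argv);
        perror("execv");                /* only reached if exec failed */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the application in this thread, a root-owned launcher could call something like run_with_memlock("/pivot3/bin/sducstart", argv, (rlim_t)2 << 30) to get the same 2GB limit as the header edit.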

2006-01-21 14:52:13

by Nick Piggin

Subject: Re: Can't mlock hugetlb in 2.6.15

Andrew Morton wrote:
> Don Dupuis <[email protected]> wrote:
>
>>I have an app that mlocks hugepages. The same app works just fine in 2.6.14.
>>This app has 128MB or more of shared memory that is using hugepages via
>>mmap. When I try this, I get the error "can't allocate memory". Is this a
>>kernel bug or is this not supported anymore? I want to guarantee that
>>this memory doesn't get swapped out to a swap device.
>
>
> hugetlb areas are not pageable and it's very unlikely that they will become
> so in the foreseeable future. So you don't need to do this.
>
> That being said, we shouldn't have broken your application.
>

Yep, and it does not sound unreasonable to have mlock succeed on hugepage
areas (though I'm not reading any standardese). And you wouldn't expect
mlockall to fail if an app is using hugepages either.

I don't have an idea off the top of my head though. Don, an strace log of
the failing sequence of syscalls could be helpful.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

2006-01-23 02:32:09

by Don Dupuis

Subject: Re: Can't mlock hugetlb in 2.6.15

On 1/21/06, Nick Piggin <[email protected]> wrote:
> I don't have an idea off the top of my head though. Don, an strace log of
> the failing sequence of syscalls could be helpful.
>
This first program sets everything up. The directory /pivot3/mem is
mounted on a hugetlbfs filesystem. Here is the strace output of
sducstart:


execve("/pivot3/bin/sducstart", ["/pivot3/bin/sducstart"], [/* 17 vars */]) = 0
uname({sys="Linux", node="DB-FVVQK61", ...}) = 0
brk(0) = 0x804b000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=10819, ...}) = 0
old_mmap(NULL, 10819, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f45000
close(3) = 0
open("/lib/tls/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@G\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=105916, ...}) = 0
old_mmap(NULL, 70128, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f33000
old_mmap(0xb7f41000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0xb7f41000
old_mmap(0xb7f43000, 4592, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f43000
close(3) = 0
open("/lib/tls/librt.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340 \0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=49096, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7f32000
old_mmap(NULL, 81912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f1e000
old_mmap(0xb7f26000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0xb7f26000
old_mmap(0xb7f28000, 40952, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f28000
close(3) = 0
open("/usr/lib/libaio.so.1.0.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0$\4\0\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=2764, ...}) = 0
old_mmap(NULL, 6120, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f1c000
old_mmap(0xb7f1d000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0xb7f1d000
close(3) = 0
open("/usr/lib/libncurses.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240\341"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=878185, ...}) = 0
old_mmap(NULL, 264076, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7edb000
old_mmap(0xb7f13000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x38000) = 0xb7f13000
old_mmap(0xb7f1b000, 1932, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f1b000
close(3) = 0
open("/usr/lib/liblwres.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 #\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=59620, ...}) = 0
old_mmap(NULL, 62556, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7ecb000
old_mmap(0xb7eda000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe000) = 0xb7eda000
close(3) = 0
open("/lib/libnsl.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p:\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=94216, ...}) = 0
old_mmap(NULL, 88288, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7eb5000
old_mmap(0xb7ec7000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11000) = 0xb7ec7000
old_mmap(0xb7ec9000, 6368, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7ec9000
close(3) = 0
open("/lib/libuuid.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 \n\0\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=8232, ...}) = 0
old_mmap(NULL, 11132, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7eb2000
old_mmap(0xb7eb4000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0xb7eb4000
close(3) = 0
open("/usr/local/lib/libdbxml-2.1.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360N\7"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=21882033, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7eb1000
old_mmap(NULL, 1548048, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7d37000
old_mmap(0xb7ea9000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x172000) = 0xb7ea9000
close(3) = 0
open("/usr/local/lib/libdb_cxx-4.3.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\4~\1\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=922992, ...}) = 0
old_mmap(NULL, 827964, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7c6c000
old_mmap(0xb7d34000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xc7000) = 0xb7d34000
close(3) = 0
open("/usr/local/lib/libpathan.so.3", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0L!\n\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=3402296, ...}) = 0
old_mmap(NULL, 2614400, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb79ed000
old_mmap(0xb7bcc000, 651264, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1de000) = 0xb7bcc000
old_mmap(0xb7c6b000, 1152, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7c6b000
close(3) = 0
open("/usr/local/lib/libxerces-c.so.26", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0H\325\16"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=4115828, ...}) = 0
old_mmap(NULL, 3328916, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb76c0000
old_mmap(0xb79bd000, 196608, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2fc000) = 0xb79bd000
close(3) = 0
open("/usr/local/lib/libxquery-1.1.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\330\373"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=4217955, ...}) = 0
old_mmap(NULL, 716312, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7611000
old_mmap(0xb76bd000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xab000) = 0xb76bd000
close(3) = 0
open("/lib/libcrypt.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\7\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=26940, ...}) = 0
old_mmap(NULL, 184636, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb75e3000
old_mmap(0xb75e8000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0xb75e8000
old_mmap(0xb75ea000, 155964, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb75ea000
close(3) = 0
open("/lib/tls/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260K\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1488740, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb75e2000
old_mmap(NULL, 1195116, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb74be000
old_mmap(0xb75dc000, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11d000) = 0xb75dc000
old_mmap(0xb75e0000, 7276, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb75e0000
close(3) = 0
open("/usr/lib/libstdc++.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0\276\3"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=739700, ...}) = 0
old_mmap(NULL, 759124, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7404000
old_mmap(0xb74b4000, 20480, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb0000) = 0xb74b4000
old_mmap(0xb74b9000, 17748, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb74b9000
close(3) = 0
open("/lib/tls/libm.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0003\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=212692, ...}) = 0
old_mmap(NULL, 139424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb73e1000
old_mmap(0xb7402000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0xb7402000
close(3) = 0
open("/lib/libgcc_s.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\f\25\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=29404, ...}) = 0
old_mmap(NULL, 32216, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb73d9000
old_mmap(0xb73e0000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0xb73e0000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb73d8000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb73d7000
mprotect(0xb7402000, 4096, PROT_READ) = 0
mprotect(0xb75dc000, 8192, PROT_READ) = 0
mprotect(0xb75e8000, 4096, PROT_READ) = 0
mprotect(0xb7ec7000, 4096, PROT_READ) = 0
mprotect(0xb7f26000, 4096, PROT_READ) = 0
mprotect(0xb7f41000, 4096, PROT_READ) = 0
set_thread_area({entry_number:-1 -> 6, base_addr:0xb73d7080,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb7f45000, 10819) = 0
set_tid_address(0xb73d70c8) = 10775
rt_sigaction(SIGRTMIN, {0xb7f373a0, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
_sysctl({{CTL_KERN, KERN_VERSION}, 2, 0xbfc59ec8, 35, (nil), 0}) = 0
brk(0) = 0x804b000
brk(0x806c000) = 0x806c000
brk(0x808d000) = 0x808d000
brk(0x80b2000) = 0x80b2000
pipe([3, 4]) = 0
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0xb73d70c8) = 10776
close(4) = 0
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7f47000
read(3, "420\n", 4096) = 4
--- SIGCHLD (Child exited) @ 0 (0) ---
close(3) = 0
waitpid(10776, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 10776
munmap(0xb7f47000, 4096) = 0
pipe([3, 4]) = 0
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0xb73d70c8) = 10777
close(4) = 0
fstat64(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7f47000
read(3, "4096\n", 4096) = 5
close(3) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(10777, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 10777
munmap(0xb7f47000, 4096) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(4, 64), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B115200 opost isig icanon
echo ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7f47000
write(1, "SDUC_InitGetAvailableHugePages: "..., 73) = 73
open("/pivot3/mem/sduc", O_RDWR|O_CREAT, 0666) = 3
mmap2(NULL, 1761607680, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED,
3, 0) = 0x4e000000
write(1, "SDUC_CreateShareMemObject: creat"..., 51) = 51
munmap(0x4e000000, 1761607680) = 0
close(3) = 0
statfs("/dev/shm/", {f_type=0x1021994, f_bsize=4096, f_blocks=259412,
f_bfree=259412, f_bavail=259412, f_files=223977, f_ffree=223976,
f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
futex(0xb7f27258, FUTEX_WAKE, 2147483647) = 0
open("/dev/shm/__PROFILER_POSIX_SHAREDMEM_OBJECT__",
O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW, 0666) = 3
fcntl64(3, F_GETFD) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
ftruncate(3, 131072) = 0
mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xb73b7000
close(3) = 0
munmap(0xb7f47000, 4096) = 0
exit_group(0) = ?
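The numbers in this trace agree with each other: the two helper processes print 420 and 4096, and the mmap2() length is 1761607680 = 420 * 4096 * 1024 bytes, i.e. 420 hugepages of 4 MB. Presumably the helpers read the free hugepage count and Hugepagesize (in kB) from /proc/meminfo; that is an assumption, since the trace does not show what the children ran. A sketch of the same computation in C over text in /proc/meminfo format, using the hypothetical helpers meminfo_value() and hugepage_pool_bytes():

```c
#include <stdlib.h>
#include <string.h>

/* Return the numeric value for 'key' (e.g. "HugePages_Free" or
 * "Hugepagesize") in text formatted like /proc/meminfo, or -1 if the key
 * is absent. Hugepagesize is reported in kB. */
static long meminfo_value(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return strtol(line + klen + 1, NULL, 10); /* skips spaces */
        line = strchr(line, '\n');
        if (line)
            line++;                     /* start of the next line */
    }
    return -1;
}

/* Bytes needed to map every free hugepage: count * size in kB * 1024.
 * Returns 0 if either field is missing. */
static unsigned long long hugepage_pool_bytes(const char *text)
{
    long nfree = meminfo_value(text, "HugePages_Free");
    long kb = meminfo_value(text, "Hugepagesize");

    if (nfree < 0 || kb < 0)
        return 0;
    return (unsigned long long)nfree * (unsigned long long)kb * 1024ULL;
}
```

Reading /proc/meminfo into a buffer and passing it to hugepage_pool_bytes() yields a length suitable for mmap() on the hugetlbfs file.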


This is the strace output of sductest, a test program that accesses
the shared memory that was set up by sducstart:

execve("/pivot3/bin/SDUCTest", ["/pivot3/bin/SDUCTest"], [/* 17 vars */]) = 0
uname({sys="Linux", node="DB-FVVQK61", ...}) = 0
brk(0) = 0x804f000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=10819, ...}) = 0
old_mmap(NULL, 10819, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f6e000
close(3) = 0
open("/lib/tls/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@G\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=105916, ...}) = 0
old_mmap(NULL, 70128, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f5c000
old_mmap(0xb7f6a000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0xb7f6a000
old_mmap(0xb7f6c000, 4592, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f6c000
close(3) = 0
open("/lib/tls/librt.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340 \0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=49096, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7f5b000
old_mmap(NULL, 81912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f47000
old_mmap(0xb7f4f000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0xb7f4f000
old_mmap(0xb7f51000, 40952, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f51000
close(3) = 0
open("/usr/lib/libaio.so.1.0.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0$\4\0\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=2764, ...}) = 0
old_mmap(NULL, 6120, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f45000
old_mmap(0xb7f46000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0xb7f46000
close(3) = 0
open("/usr/lib/libncurses.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240\341"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=878185, ...}) = 0
old_mmap(NULL, 264076, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f04000
old_mmap(0xb7f3c000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x38000) = 0xb7f3c000
old_mmap(0xb7f44000, 1932, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f44000
close(3) = 0
open("/usr/lib/liblwres.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 #\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=59620, ...}) = 0
old_mmap(NULL, 62556, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7ef4000
old_mmap(0xb7f03000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe000) = 0xb7f03000
close(3) = 0
open("/lib/libnsl.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p:\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=94216, ...}) = 0
old_mmap(NULL, 88288, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7ede000
old_mmap(0xb7ef0000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11000) = 0xb7ef0000
old_mmap(0xb7ef2000, 6368, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7ef2000
close(3) = 0
open("/lib/libuuid.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 \n\0\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=8232, ...}) = 0
old_mmap(NULL, 11132, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7edb000
old_mmap(0xb7edd000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0xb7edd000
close(3) = 0
open("/usr/local/lib/libdbxml-2.1.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360N\7"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=21882033, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7eda000
old_mmap(NULL, 1548048, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7d60000
old_mmap(0xb7ed2000, 32768, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x172000) = 0xb7ed2000
close(3) = 0
open("/usr/local/lib/libdb_cxx-4.3.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\4~\1\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=922992, ...}) = 0
old_mmap(NULL, 827964, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7c95000
old_mmap(0xb7d5d000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xc7000) = 0xb7d5d000
close(3) = 0
open("/usr/local/lib/libpathan.so.3", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0L!\n\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=3402296, ...}) = 0
old_mmap(NULL, 2614400, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7a16000
old_mmap(0xb7bf5000, 651264, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1de000) = 0xb7bf5000
old_mmap(0xb7c94000, 1152, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7c94000
close(3) = 0
open("/usr/local/lib/libxerces-c.so.26", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0H\325\16"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=4115828, ...}) = 0
old_mmap(NULL, 3328916, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb76e9000
old_mmap(0xb79e6000, 196608, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2fc000) = 0xb79e6000
close(3) = 0
open("/usr/local/lib/libxquery-1.1.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\330\373"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=4217955, ...}) = 0
old_mmap(NULL, 716312, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb763a000
old_mmap(0xb76e6000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xab000) = 0xb76e6000
close(3) = 0
open("/lib/libcrypt.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\7\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=26940, ...}) = 0
old_mmap(NULL, 184636, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb760c000
old_mmap(0xb7611000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0xb7611000
old_mmap(0xb7613000, 155964, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7613000
close(3) = 0
open("/lib/tls/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260K\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1488740, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb760b000
old_mmap(NULL, 1195116, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb74e7000
old_mmap(0xb7605000, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11d000) = 0xb7605000
old_mmap(0xb7609000, 7276, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7609000
close(3) = 0
open("/usr/lib/libstdc++.so.5", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0\276\3"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=739700, ...}) = 0
old_mmap(NULL, 759124, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb742d000
old_mmap(0xb74dd000, 20480, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb0000) = 0xb74dd000
old_mmap(0xb74e2000, 17748, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb74e2000
close(3) = 0
open("/lib/tls/libm.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0003\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=212692, ...}) = 0
old_mmap(NULL, 139424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb740a000
old_mmap(0xb742b000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0xb742b000
close(3) = 0
open("/lib/libgcc_s.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\f\25\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=29404, ...}) = 0
old_mmap(NULL, 32216, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7402000
old_mmap(0xb7409000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0xb7409000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7401000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7400000
mprotect(0xb742b000, 4096, PROT_READ) = 0
mprotect(0xb7605000, 8192, PROT_READ) = 0
mprotect(0xb7611000, 4096, PROT_READ) = 0
mprotect(0xb7ef0000, 4096, PROT_READ) = 0
mprotect(0xb7f4f000, 4096, PROT_READ) = 0
mprotect(0xb7f6a000, 4096, PROT_READ) = 0
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7400080,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb7f6e000, 10819) = 0
set_tid_address(0xb74000c8) = 10780
rt_sigaction(SIGRTMIN, {0xb7f603a0, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
_sysctl({{CTL_KERN, KERN_VERSION}, 2, 0xbf885598, 35, (nil), 0}) = 0
brk(0) = 0x804f000
brk(0x8070000) = 0x8070000
brk(0x8091000) = 0x8091000
brk(0x80b6000) = 0x80b6000
open("/pivot3/mem/sduc", O_RDWR) = 3
mmap2(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED, 3,
0) = -1 ENOMEM (Cannot allocate memory)
close(3) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(4, 64), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B115200 opost isig icanon
echo ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb6fff000
write(1, "SDUC_MapShareMemObject SDUC shar"..., 57) = 57
unlink("/pivot3/mem/sduc") = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
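Condensed, the failing sequence is: map the whole hugetlbfs file with MAP_LOCKED (which faults every hugepage into the file's page cache), unmap it, then map a 4 MB prefix of the same file again. A single-process sketch of that sequence; replay_remap() is a hypothetical helper. On hugetlbfs the lengths must be multiples of the hugepage size, and the expectation that the second mmap() fails with ENOMEM applies only to an unpatched 2.6.15 kernel.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map 'path' with first_len bytes, unmap it, then map second_len bytes of
 * the same file again, passing 'flags' (e.g. MAP_LOCKED) both times.
 * Returns 0 if both mappings succeed, 1 if open() or the first mmap()
 * fails, 2 if the second mmap() fails. */
static int replay_remap(const char *path, size_t first_len,
                        size_t second_len, int flags)
{
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    void *p;
    int rc = 0;

    if (fd < 0) {
        perror("open");
        return 1;
    }
    p = mmap(NULL, first_len, PROT_READ | PROT_WRITE, MAP_SHARED | flags, fd, 0);
    if (p == MAP_FAILED) {
        perror("first mmap");
        rc = 1;
        goto out;
    }
    munmap(p, first_len);

    p = mmap(NULL, second_len, PROT_READ | PROT_WRITE, MAP_SHARED | flags, fd, 0);
    if (p == MAP_FAILED) {
        perror("second mmap");          /* unpatched 2.6.15: ENOMEM here */
        rc = 2;
        goto out;
    }
    munmap(p, second_len);
out:
    close(fd);
    return rc;
}
```

replay_remap("/pivot3/mem/sduc", 1761607680, 4194304, MAP_LOCKED) mirrors the two traces above; on an ordinary file both mappings simply succeed.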

Thanks

Don

2006-01-23 19:51:14

by Hugh Dickins

Subject: Re: Can't mlock hugetlb in 2.6.15

On Sun, 22 Jan 2006, Don Dupuis wrote:
> On 1/21/06, Nick Piggin <[email protected]> wrote:
> > Andrew Morton wrote:
> > > Don Dupuis <[email protected]> wrote:
> > >>I have an app that mlocks hugepages. The same app works just fine in 2.6.14.
> > > That being said, we shouldn't have broken your application.
> > Don, an strace log of the failing sequence of syscalls could be helpful.
>
> sducstart:
> open("/pivot3/mem/sduc", O_RDWR|O_CREAT, 0666) = 3
> mmap2(NULL, 1761607680, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED,
> 3, 0) = 0x4e000000
>
> This is the strace output of sductest, a test program that accesses
> the shared memory that was set up by sducstart:
> open("/pivot3/mem/sduc", O_RDWR) = 3
> mmap2(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_LOCKED, 3,
> 0) = -1 ENOMEM (Cannot allocate memory)

Thanks a lot for the strace, that indeed helped to track it down.

This has nothing to do with mlock or MAP_LOCKED - which by the way do
make more sense in 2.6.15, since they provide a way of prefaulting the
hugepage area like in earlier releases (now hugepages are being faulted
in on demand, though never paged out, as Andrew said).
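Since hugepages are now faulted in on demand, an application that wants the whole area populated up front can either mlock() it (MAP_LOCKED at mmap() time has the same prefaulting effect) or touch one byte in each hugepage. A sketch of both options; prefault_region() is a hypothetical helper. For a hugetlbfs mapping, 'step' would be the hugepage size, presumably 4 MB on the reporter's machine judging by the 4096 (kB) printed in the trace.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Prefault [addr, addr + len). With use_mlock, mlock() faults the pages in
 * and is subject to RLIMIT_MEMLOCK; otherwise one byte is read every 'step'
 * bytes, which faults in each page of that size.
 * Returns 0 after a successful mlock(), the number of pages touched when
 * touching, or -1 on error. */
static long prefault_region(void *addr, size_t len, size_t step, int use_mlock)
{
    volatile const unsigned char *p = addr;
    long touched = 0;
    size_t off;

    if (use_mlock)
        return mlock(addr, len) == 0 ? 0 : -1;
    if (step == 0)
        return -1;
    for (off = 0; off < len; off += step) {
        (void)p[off];                   /* volatile read forces the access */
        touched++;
    }
    return touched;
}
```

The read-only touch avoids modifying shared data; mlock() is the simpler choice when the locked-memory limit allows it.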

Please try the patch below, and let us know if it works for you - thanks.
Looks like we'll need this in 2.6.16-rc-git and 2.6.15-stable.


2.6.15's hugepage faulting introduced huge_pages_needed accounting into
hugetlbfs: to count how many pages are already in cache, for spot check
on how far a new mapping may be allowed to extend the file. But it's
muddled: each hugepage found covers HPAGE_SIZE, not PAGE_SIZE. Once
pages were already in cache, it would overshoot, wrap its hugepages
count backwards, and so fail a harmless repeat mapping with -ENOMEM.
Fixes the problem found by Don Dupuis.

Signed-off-by: Hugh Dickins <[email protected]>
---

fs/hugetlbfs/inode.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

--- 2.6.15/fs/hugetlbfs/inode.c 2006-01-03 03:21:10.000000000 +0000
+++ linux/fs/hugetlbfs/inode.c 2006-01-23 18:39:47.000000000 +0000
@@ -71,8 +71,8 @@ huge_pages_needed(struct address_space *
unsigned long start = vma->vm_start;
unsigned long end = vma->vm_end;
unsigned long hugepages = (end - start) >> HPAGE_SHIFT;
- pgoff_t next = vma->vm_pgoff;
- pgoff_t endpg = next + ((end - start) >> PAGE_SHIFT);
+ pgoff_t next = vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT);
+ pgoff_t endpg = next + hugepages;

pagevec_init(&pvec, 0);
while (next < endpg) {
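To see the unit mix-up concretely, here is a simplified user-space model of huge_pages_needed(), not the kernel function itself: model_huge_pages_needed() is a hypothetical name, it assumes 4 KB base pages and 4 MB hugepages (as on the reporter's i386 box), and it returns a page count rather than bytes. hugetlbfs indexes its page cache per hugepage, while vm_pgoff is in PAGE_SIZE units; the old code compared the two directly.

```c
#include <stddef.h>

#define PAGE_SHIFT  12  /* assumed: 4 KB base pages (i386) */
#define HPAGE_SHIFT 22  /* assumed: 4 MB hugepages (i386 without PAE) */

/*
 * Simplified model of 2.6.15's huge_pages_needed(), not the kernel code.
 * cached[] holds the sorted page-cache indices of hugepages already in the
 * file (one entry per hugepage, so HPAGE_SIZE units). vm_pgoff is the
 * mapping's file offset in PAGE_SIZE units, as in a real VMA.
 * Returns how many more hugepages the mapping needs.
 * fixed == 0 mimics the old code, fixed == 1 the patched code.
 */
static unsigned long model_huge_pages_needed(const unsigned long *cached,
                                             size_t ncached, unsigned long len,
                                             unsigned long vm_pgoff, int fixed)
{
    unsigned long hugepages = len >> HPAGE_SHIFT;
    unsigned long next, endpg;
    size_t i;

    if (fixed) {
        /* Convert the offset to hugepage units before searching. */
        next = vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT);
        endpg = next + hugepages;
    } else {
        /* Old code: PAGE_SIZE units compared against hugepage indices. */
        next = vm_pgoff;
        endpg = next + (len >> PAGE_SHIFT);
    }

    for (i = 0; i < ncached; i++) {
        if (cached[i] < next)
            continue;
        if (cached[i] >= endpg)
            break;
        hugepages--;  /* unsigned: wraps around if it over-counts */
    }
    return hugepages;
}
```

With all 420 hugepages of the file cached (indices 0 to 419), a 4 MB mapping at offset 0 needs 0 more pages with the fix. The old code searches indices 0 to 1023, finds all 420 pages and wraps 1 - 420 around to a huge unsigned value, which the mmap path then rejects with -ENOMEM. At a nonzero offset the error runs the other way: for vm_pgoff = 1024 (the second hugepage) the old code starts searching at index 1024, misses the cached page and reports 1 instead of 0.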

2006-01-23 20:53:30

by Don Dupuis

Subject: Re: Can't mlock hugetlb in 2.6.15

On 1/23/06, Hugh Dickins <[email protected]> wrote:
> Please try the patch below, and let us know if it works for you - thanks.
>

This patch fixed my problem.

Thanks very much

Don

2006-01-23 21:34:54

by Adam Litke

Subject: Re: Can't mlock hugetlb in 2.6.15

Aye.

On Mon, 2006-01-23 at 19:51 +0000, Hugh Dickins wrote:
> Thanks a lot for the strace, that indeed helped to track it down.
>
> This has nothing to do with mlock or MAP_LOCKED - which by the way do
> make more sense in 2.6.15, since they provide a way of prefaulting the
> hugepage area like in earlier releases (now hugepages are being faulted
> in on demand, though never paged out, as Andrew said).
>
> Please try the patch below, and let us know if it works for you - thanks.
> Looks like we'll need this in 2.6.16-rc-git and 2.6.15-stable.
>
>
> 2.6.15's hugepage faulting introduced huge_pages_needed accounting into
> hugetlbfs: to count how many pages are already in cache, for spot check
> on how far a new mapping may be allowed to extend the file. But it's
> muddled: each hugepage found covers HPAGE_SIZE, not PAGE_SIZE. Once
> pages were already in cache, it would overshoot, wrap its hugepages
> count backwards, and so fail a harmless repeat mapping with -ENOMEM.
> Fixes the problem found by Don Dupuis.
>
> Signed-off-by: Hugh Dickins <[email protected]>
Acked-By: Adam Litke <[email protected]>
> ---
>
> fs/hugetlbfs/inode.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> --- 2.6.15/fs/hugetlbfs/inode.c 2006-01-03 03:21:10.000000000 +0000
> +++ linux/fs/hugetlbfs/inode.c 2006-01-23 18:39:47.000000000 +0000
> @@ -71,8 +71,8 @@ huge_pages_needed(struct address_space *
> unsigned long start = vma->vm_start;
> unsigned long end = vma->vm_end;
> unsigned long hugepages = (end - start) >> HPAGE_SHIFT;
> - pgoff_t next = vma->vm_pgoff;
> - pgoff_t endpg = next + ((end - start) >> PAGE_SHIFT);
> + pgoff_t next = vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT);
> + pgoff_t endpg = next + hugepages;
>
> pagevec_init(&pvec, 0);
> while (next < endpg) {
>
>
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

2006-01-23 23:52:57

by William Lee Irwin III

Subject: Re: Can't mlock hugetlb in 2.6.15

On Mon, Jan 23, 2006 at 07:51:51PM +0000, Hugh Dickins wrote:
> This has nothing to do with mlock or MAP_LOCKED - which by the way do
> make more sense in 2.6.15, since they provide a way of prefaulting the
> hugepage area like in earlier releases (now hugepages are being faulted
> in on demand, though never paged out, as Andrew said).
> Please try the patch below, and let us know if it works for you - thanks.
> Looks like we'll need this in 2.6.16-rc-git and 2.6.15-stable.
> 2.6.15's hugepage faulting introduced huge_pages_needed accounting into
> hugetlbfs: to count how many pages are already in cache, for spot check
> on how far a new mapping may be allowed to extend the file. But it's
> muddled: each hugepage found covers HPAGE_SIZE, not PAGE_SIZE. Once
> pages were already in cache, it would overshoot, wrap its hugepages
> count backwards, and so fail a harmless repeat mapping with -ENOMEM.
> Fixes the problem found by Don Dupuis.
> Signed-off-by: Hugh Dickins <[email protected]>

Acked-by: William Irwin <[email protected]>

A unit conversion error, as usual. It's hard to understand why
such a natural decision as using only one radix tree entry per
hugepage is so difficult to cope with. If only my eyes had been sharp
enough to catch it on its way in.

Excellent detective work as always. Thanks again, Hugh.


-- wli