2015-05-08 19:34:05

by Eric B Munson

Subject: [PATCH 0/3] Allow user to request memory to be locked on page fault

mlock() allows a user to control page out of program memory, but this
comes at the cost of faulting in the entire mapping when it is
allocated. For large mappings where the entire area is not necessary
this is not ideal.

This series introduces new flags for mmap() and mlockall() that allow a
user to specify that the covered area should not be paged out, but only
after the memory has been used the first time.

The performance cost of these patches is minimal on the two benchmarks
I have tested (stream and kernbench).

Avg throughput in MB/s from stream using 1000000 element arrays
Test      4.1-rc2      4.1-rc2+lock-on-fault
Copy:     10,979.08    10,917.34
Scale:    11,094.45    11,023.01
Add:      12,487.29    12,388.65
Triad:    12,505.77    12,418.78

Kernbench optimal load
                    4.1-rc2     4.1-rc2+lock-on-fault
Elapsed Time        71.046      71.324
User Time           62.117      62.352
System Time         8.926       8.969
Context Switches    14531.9     14542.5
Sleeps              14935.9     14939

Eric B Munson (3):
Add flag to request pages are locked after page fault
Add mlockall flag for locking pages on fault
Add tests for lock on fault

arch/alpha/include/uapi/asm/mman.h | 2 +
arch/mips/include/uapi/asm/mman.h | 2 +
arch/parisc/include/uapi/asm/mman.h | 2 +
arch/powerpc/include/uapi/asm/mman.h | 2 +
arch/sparc/include/uapi/asm/mman.h | 2 +
arch/tile/include/uapi/asm/mman.h | 2 +
arch/xtensa/include/uapi/asm/mman.h | 2 +
include/linux/mm.h | 1 +
include/linux/mman.h | 3 +-
include/uapi/asm-generic/mman.h | 2 +
mm/mlock.c | 13 ++-
mm/mmap.c | 4 +-
mm/swap.c | 3 +-
tools/testing/selftests/vm/Makefile | 8 +-
tools/testing/selftests/vm/lock-on-fault.c | 145 ++++++++++++++++++++++++++++
tools/testing/selftests/vm/on-fault-limit.c | 47 +++++++++
tools/testing/selftests/vm/run_vmtests | 23 +++++
17 files changed, 254 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/vm/lock-on-fault.c
create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

--
1.9.1


2015-05-08 19:34:18

by Eric B Munson

Subject: [PATCH 1/3] Add flag to request pages are locked after page fault

The cost of faulting in all memory to be locked can be very high when
working with large mappings. If only portions of the mapping will be
used this can incur a high penalty for locking. This patch introduces
the ability to request that pages are not pre-faulted, but are placed on
the unevictable LRU when they are finally faulted in.

To keep accounting checks out of the page fault path, users are billed
for the entire mapping lock as if MAP_LOCKED was used.
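
The up-front accounting described here can be modelled in user space.
The following is a minimal sketch, not the kernel code: the function
name and its parameters are hypothetical, and it only mirrors the
arithmetic visible in the mlock_future_check() hunk of this patch. The
capability check that the kernel also applies is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the accounting: the full length of
 * the mapping is charged when the mapping is created, so nothing has
 * to be checked later in the page fault path.
 */
bool would_exceed_memlock(size_t len, size_t locked_pages,
			  size_t limit_bytes, unsigned int page_shift)
{
	size_t locked;
	size_t limit;

	assert(page_shift < sizeof(size_t) * 8);

	locked = (len >> page_shift) + locked_pages;	/* pages charged after this mapping */
	limit = limit_bytes >> page_shift;		/* RLIMIT_MEMLOCK expressed in pages */

	return locked > limit;
}
```

With 4 KiB pages (page_shift of 12), a two-page mapping against a
one-page limit is refused even if neither page is ever touched, which
is the trade-off the paragraph above accepts.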

Signed-off-by: Eric B Munson <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
arch/alpha/include/uapi/asm/mman.h | 1 +
arch/mips/include/uapi/asm/mman.h | 1 +
arch/parisc/include/uapi/asm/mman.h | 1 +
arch/powerpc/include/uapi/asm/mman.h | 1 +
arch/sparc/include/uapi/asm/mman.h | 1 +
arch/tile/include/uapi/asm/mman.h | 1 +
arch/xtensa/include/uapi/asm/mman.h | 1 +
include/linux/mm.h | 1 +
include/linux/mman.h | 3 ++-
include/uapi/asm-generic/mman.h | 1 +
mm/mmap.c | 4 ++--
mm/swap.c | 3 ++-
12 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..15e96e1 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -30,6 +30,7 @@
#define MAP_NONBLOCK 0x40000 /* do not block on IO */
#define MAP_STACK 0x80000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x100000 /* create a huge page mapping */
+#define MAP_LOCKONFAULT 0x200000 /* Lock pages after they are faulted in, do not prefault */

#define MS_ASYNC 1 /* sync memory asynchronously */
#define MS_SYNC 2 /* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index cfcb876..47846a5 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -48,6 +48,7 @@
#define MAP_NONBLOCK 0x20000 /* do not block on IO */
#define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x80000 /* create a huge page mapping */
+#define MAP_LOCKONFAULT 0x100000 /* Lock pages after they are faulted in, do not prefault */

/*
* Flags for msync
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 294d251..1514cd7 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -24,6 +24,7 @@
#define MAP_NONBLOCK 0x20000 /* do not block on IO */
#define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x80000 /* create a huge page mapping */
+#define MAP_LOCKONFAULT 0x100000 /* Lock pages after they are faulted in, do not prefault */

#define MS_SYNC 1 /* synchronous memory sync */
#define MS_ASYNC 2 /* sync memory asynchronously */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..fce74fe 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -27,5 +27,6 @@
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+#define MAP_LOCKONFAULT 0x80000 /* Lock pages after they are faulted in, do not prefault */

#endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 0b14df3..12425d8 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+#define MAP_LOCKONFAULT 0x80000 /* Lock pages after they are faulted in, do not prefault */


#endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index 81b8fc3..ec04eaf 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -29,6 +29,7 @@
#define MAP_DENYWRITE 0x0800 /* ETXTBSY */
#define MAP_EXECUTABLE 0x1000 /* mark it as an executable */
#define MAP_HUGETLB 0x4000 /* create a huge page mapping */
+#define MAP_LOCKONFAULT 0x8000 /* Lock pages after they are faulted in, do not prefault */


/*
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 201aec0..42d43cc 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -55,6 +55,7 @@
#define MAP_NONBLOCK 0x20000 /* do not block on IO */
#define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x80000 /* create a huge page mapping */
+#define MAP_LOCKONFAULT 0x100000 /* Lock pages after they are faulted in, do not prefault */
#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
# define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be
* uninitialized */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0755b9f..3e31457 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -126,6 +126,7 @@ extern unsigned int kobjsize(const void *objp);
#define VM_PFNMAP 0x00000400 /* Page-ranges managed without "struct page", just pure PFN */
#define VM_DENYWRITE 0x00000800 /* ETXTBSY on write attempts.. */

+#define VM_LOCKONFAULT 0x00001000 /* Lock the pages covered when they are faulted in */
#define VM_LOCKED 0x00002000
#define VM_IO 0x00004000 /* Memory mapped I/O or similar */

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 16373c8..437264b 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -86,7 +86,8 @@ calc_vm_flag_bits(unsigned long flags)
{
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_DENYWRITE, VM_DENYWRITE ) |
- _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED );
+ _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
+ _calc_vm_trans(flags, MAP_LOCKONFAULT,VM_LOCKONFAULT);
}

unsigned long vm_commit_limit(void);
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index e9fe6fd..fc4e586 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -12,6 +12,7 @@
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+#define MAP_LOCKONFAULT 0x80000 /* Lock pages after they are faulted in, do not prefault */

/* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */

diff --git a/mm/mmap.c b/mm/mmap.c
index bb50cac..ba1a6bf 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1233,7 +1233,7 @@ static inline int mlock_future_check(struct mm_struct *mm,
unsigned long locked, lock_limit;

/* mlock MCL_FUTURE? */
- if (flags & VM_LOCKED) {
+ if (flags & (VM_LOCKED | VM_LOCKONFAULT)) {
locked = len >> PAGE_SHIFT;
locked += mm->locked_vm;
lock_limit = rlimit(RLIMIT_MEMLOCK);
@@ -1301,7 +1301,7 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;

- if (flags & MAP_LOCKED)
+ if (flags & (MAP_LOCKED | MAP_LOCKONFAULT))
if (!can_do_mlock())
return -EPERM;

diff --git a/mm/swap.c b/mm/swap.c
index a7251a8..07c905e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -711,7 +711,8 @@ void lru_cache_add_active_or_unevictable(struct page *page,
{
VM_BUG_ON_PAGE(PageLRU(page), page);

- if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) {
+ if (likely((vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT)) == 0) ||
+ (vma->vm_flags & VM_SPECIAL)) {
SetPageActive(page);
lru_cache_add(page);
return;
--
1.9.1

2015-05-08 19:34:22

by Eric B Munson

Subject: [PATCH 2/3] Add mlockall flag for locking pages on fault

Building on the previous patch, extend mlockall() to give a process a
way to specify that pages should be locked when they are faulted in, but
that pre-faulting is not needed.
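
The flag combinations that mlockall() accepts after this patch can be
summarised with a small user-space sketch. It assumes the asm-generic
values from this series (other architectures use different numbers),
and the SKETCH_ names and the function name are hypothetical stand-ins
so that nothing here clashes with the system headers. It mirrors only
the validation at the top of the syscall as posted.

```c
#include <assert.h>
#include <errno.h>

/* ASSUMPTION: asm-generic values from this series. */
#define SKETCH_MCL_CURRENT	1
#define SKETCH_MCL_FUTURE	2
#define SKETCH_MCL_ON_FAULT	4

static_assert((SKETCH_MCL_CURRENT & SKETCH_MCL_FUTURE) == 0 &&
	      (SKETCH_MCL_FUTURE & SKETCH_MCL_ON_FAULT) == 0 &&
	      (SKETCH_MCL_CURRENT & SKETCH_MCL_ON_FAULT) == 0,
	      "flags must be distinct bits");

/* Returns 0 if the flags would pass the check, -EINVAL otherwise. */
int sketch_check_mlockall_flags(int flags)
{
	const int known = SKETCH_MCL_CURRENT | SKETCH_MCL_FUTURE |
			  SKETCH_MCL_ON_FAULT;

	/* No flags at all, or any unknown bit, is rejected. */
	if (!flags || (flags & ~known))
		return -EINVAL;

	/* As posted, MCL_FUTURE and MCL_ON_FAULT cannot be combined. */
	if ((flags & SKETCH_MCL_FUTURE) && (flags & SKETCH_MCL_ON_FAULT))
		return -EINVAL;

	return 0;
}
```

So MCL_ON_FAULT alone and MCL_CURRENT | MCL_ON_FAULT pass the check,
while MCL_FUTURE | MCL_ON_FAULT is refused.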

Signed-off-by: Eric B Munson <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
arch/alpha/include/uapi/asm/mman.h | 1 +
arch/mips/include/uapi/asm/mman.h | 1 +
arch/parisc/include/uapi/asm/mman.h | 1 +
arch/powerpc/include/uapi/asm/mman.h | 1 +
arch/sparc/include/uapi/asm/mman.h | 1 +
arch/tile/include/uapi/asm/mman.h | 1 +
arch/xtensa/include/uapi/asm/mman.h | 1 +
include/uapi/asm-generic/mman.h | 1 +
mm/mlock.c | 13 +++++++++----
9 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 15e96e1..3120dfb 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -38,6 +38,7 @@

#define MCL_CURRENT 8192 /* lock all currently mapped pages */
#define MCL_FUTURE 16384 /* lock all additions to address space */
+#define MCL_ON_FAULT 32768 /* lock all pages that are faulted in */

#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 47846a5..82aec3c 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -62,6 +62,7 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ON_FAULT 4 /* lock all pages that are faulted in */

#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 1514cd7..f4601f3 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -32,6 +32,7 @@

#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ON_FAULT 4 /* lock all pages that are faulted in */

#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index fce74fe..0a28efc 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@

#define MCL_CURRENT 0x2000 /* lock all currently mapped pages */
#define MCL_FUTURE 0x4000 /* lock all additions to address space */
+#define MCL_ON_FAULT 0x80000 /* lock all pages that are faulted in */

#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 12425d8..119be80 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -17,6 +17,7 @@

#define MCL_CURRENT 0x2000 /* lock all currently mapped pages */
#define MCL_FUTURE 0x4000 /* lock all additions to address space */
+#define MCL_ON_FAULT 0x80000 /* lock all pages that are faulted in */

#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index ec04eaf..66ea935 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -37,6 +37,7 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ON_FAULT 4 /* lock all pages that are faulted in */


#endif /* _ASM_TILE_MMAN_H */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 42d43cc..9abcc29 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -75,6 +75,7 @@
*/
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ON_FAULT 4 /* lock all pages that are faulted in */

#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index fc4e586..6ac7a7b 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -18,5 +18,6 @@

#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
+#define MCL_ON_FAULT 4 /* lock all pages that are faulted in */

#endif /* __ASM_GENERIC_MMAN_H */
diff --git a/mm/mlock.c b/mm/mlock.c
index 6fd2cf1..1406835 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -579,7 +579,7 @@ static int do_mlock(unsigned long start, size_t len, int on)

/* Here we know that vma->vm_start <= nstart < vma->vm_end. */

- newflags = vma->vm_flags & ~VM_LOCKED;
+ newflags = vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT);
if (on)
newflags |= VM_LOCKED;

@@ -662,13 +662,17 @@ static int do_mlockall(int flags)
current->mm->def_flags |= VM_LOCKED;
else
current->mm->def_flags &= ~VM_LOCKED;
- if (flags == MCL_FUTURE)
+ if (flags & MCL_ON_FAULT)
+ current->mm->def_flags |= VM_LOCKONFAULT;
+ else
+ current->mm->def_flags &= ~VM_LOCKONFAULT;
+ if (flags == MCL_FUTURE || flags == MCL_ON_FAULT)
goto out;

for (vma = current->mm->mmap; vma ; vma = prev->vm_next) {
vm_flags_t newflags;

- newflags = vma->vm_flags & ~VM_LOCKED;
+ newflags = vma->vm_flags & ~(VM_LOCKED | VM_LOCKONFAULT);
if (flags & MCL_CURRENT)
newflags |= VM_LOCKED;

@@ -685,7 +689,8 @@ SYSCALL_DEFINE1(mlockall, int, flags)
unsigned long lock_limit;
int ret = -EINVAL;

- if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE)))
+ if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE | MCL_ON_FAULT)) ||
+ ((flags & MCL_FUTURE) && (flags & MCL_ON_FAULT)))
goto out;

ret = -EPERM;
--
1.9.1

2015-05-08 19:34:25

by Eric B Munson

Subject: [PATCH 3/3] Add tests for lock on fault

Test the mmap() flag, the mlockall() flag, and ensure that mlock limits
are respected. Note that the limit test needs to be run as a normal user.
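
The tests decide whether a page is present and unevictable by reading
/proc/self/pagemap and /proc/kpageflags. The decoding they rely on can
be sketched as below. The helper and macro names are hypothetical; the
bit positions are the ones used by the test in this patch (present in
bit 63, PFN in bits 0-54, unevictable in bit 18 of kpageflags).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PM_PRESENT	(UINT64_C(1) << 63)		/* page is present in RAM */
#define SKETCH_PM_PFN_MASK	((UINT64_C(1) << 55) - 1)	/* bits 0-54: page frame number */
#define SKETCH_KPF_UNEVICTABLE	(UINT64_C(1) << 18)		/* unevictable bit in kpageflags */

/* Byte offset of the 64-bit pagemap entry that describes address addr. */
uint64_t pagemap_offset(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0);
	return addr / page_size * sizeof(uint64_t);
}

/* True if the pagemap entry says the page is present. */
bool pagemap_present(uint64_t entry)
{
	return (entry & SKETCH_PM_PRESENT) != 0;
}

/* Page frame number from a pagemap entry; index into /proc/kpageflags. */
uint64_t pagemap_pfn(uint64_t entry)
{
	return entry & SKETCH_PM_PFN_MASK;
}

/* True if the kpageflags word marks the page as unevictable. */
bool kpageflags_unevictable(uint64_t flags)
{
	return (flags & SKETCH_KPF_UNEVICTABLE) != 0;
}
```

For a two-page lock-on-fault mapping where only the first page was
written, the test expects the first entry to be present and
unevictable and the second entry not to be present.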

Signed-off-by: Eric B Munson <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
tools/testing/selftests/vm/Makefile | 8 +-
tools/testing/selftests/vm/lock-on-fault.c | 145 ++++++++++++++++++++++++++++
tools/testing/selftests/vm/on-fault-limit.c | 47 +++++++++
tools/testing/selftests/vm/run_vmtests | 23 +++++
4 files changed, 222 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/vm/lock-on-fault.c
create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index a5ce953..32f3d20 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -1,7 +1,13 @@
# Makefile for vm selftests

CFLAGS = -Wall
-BINARIES = hugepage-mmap hugepage-shm map_hugetlb thuge-gen hugetlbfstest
+BINARIES = hugepage-mmap
+BINARIES += hugepage-shm
+BINARIES += hugetlbfstest
+BINARIES += lock-on-fault
+BINARIES += map_hugetlb
+BINARIES += on-fault-limit
+BINARIES += thuge-gen
BINARIES += transhuge-stress

all: $(BINARIES)
diff --git a/tools/testing/selftests/vm/lock-on-fault.c b/tools/testing/selftests/vm/lock-on-fault.c
new file mode 100644
index 0000000..e6a9688
--- /dev/null
+++ b/tools/testing/selftests/vm/lock-on-fault.c
@@ -0,0 +1,145 @@
+#include <sys/mman.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+
+#ifndef MCL_ON_FAULT
+#define MCL_ON_FAULT 4
+#endif
+
+#define PRESENT_BIT 0x8000000000000000
+#define PFN_MASK 0x007FFFFFFFFFFFFF
+#define UNEVICTABLE_BIT (1UL << 18)
+
+static int check_pageflags(void *map)
+{
+ FILE *file;
+ unsigned long pfn1;
+ unsigned long pfn2;
+ unsigned long offset1;
+ unsigned long offset2;
+ int ret = 1;
+
+ file = fopen("/proc/self/pagemap", "r");
+ if (!file) {
+ perror("fopen");
+ return ret;
+ }
+ offset1 = (unsigned long)map / getpagesize() * sizeof(unsigned long);
+ offset2 = ((unsigned long)map + getpagesize()) / getpagesize() * sizeof(unsigned long);
+ if (fseek(file, offset1, SEEK_SET)) {
+ perror("fseek");
+ goto out;
+ }
+
+ if (fread(&pfn1, sizeof(unsigned long), 1, file) != 1) {
+ perror("fread");
+ goto out;
+ }
+
+ if (fseek(file, offset2, SEEK_SET)) {
+ perror("fseek");
+ goto out;
+ }
+
+ if (fread(&pfn2, sizeof(unsigned long), 1, file) != 1) {
+ perror("fread");
+ goto out;
+ }
+
+ /* pfn2 should not be present */
+ if (pfn2 & PRESENT_BIT) {
+ printf("page map says 0x%lx\n", pfn2);
+ printf("present is 0x%lx\n", PRESENT_BIT);
+ goto out;
+ }
+
+ /* pfn1 should be present */
+ if ((pfn1 & PRESENT_BIT) == 0) {
+ printf("page map says 0x%lx\n", pfn1);
+ printf("present is 0x%lx\n", PRESENT_BIT);
+ goto out;
+ }
+
+ pfn1 &= PFN_MASK;
+ fclose(file);
+ file = fopen("/proc/kpageflags", "r");
+ if (!file) {
+ perror("fopen");
+ /* the caller unmaps; unmapping here would unmap twice */
+ return ret;
+ }
+
+ if (fseek(file, pfn1 * sizeof(unsigned long), SEEK_SET)) {
+ perror("fseek");
+ goto out;
+ }
+
+ if (fread(&pfn2, sizeof(unsigned long), 1, file) != 1) {
+ perror("fread");
+ goto out;
+ }
+
+ /* pfn2 now contains the entry from kpageflags for the first page, the
+ * unevictable bit should be set */
+ if ((pfn2 & UNEVICTABLE_BIT) == 0) {
+ printf("kpageflags says 0x%lx\n", pfn2);
+ printf("unevictable is 0x%lx\n", UNEVICTABLE_BIT);
+ goto out;
+ }
+
+ ret = 0;
+
+out:
+ fclose(file);
+ return ret;
+}
+
+static int test_mmap(int flags)
+{
+ int ret = 1;
+ void *map;
+
+ map = mmap(NULL, 2 * getpagesize(), PROT_READ | PROT_WRITE, flags, -1, 0);
+ if (map == MAP_FAILED) {
+ perror("mmap()");
+ return ret;
+ }
+
+ /* Write something into the first page to ensure it is present */
+ *(char *)map = 1;
+
+ ret = check_pageflags(map);
+
+ munmap(map, 2 * getpagesize());
+ return ret;
+}
+
+static int test_mlockall(void)
+{
+ int ret = 1;
+
+ if (mlockall(MCL_ON_FAULT)) {
+ perror("mlockall");
+ return ret;
+ }
+
+ ret = test_mmap(MAP_PRIVATE | MAP_ANONYMOUS);
+ munlockall();
+ return ret;
+}
+
+#ifndef MAP_LOCKONFAULT
+#define MAP_LOCKONFAULT (MAP_HUGETLB << 1)
+#endif
+
+int main(int argc, char **argv)
+{
+ int ret = 0;
+
+ ret += test_mmap(MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKONFAULT);
+ ret += test_mlockall();
+ return ret;
+}
diff --git a/tools/testing/selftests/vm/on-fault-limit.c b/tools/testing/selftests/vm/on-fault-limit.c
new file mode 100644
index 0000000..bd70078
--- /dev/null
+++ b/tools/testing/selftests/vm/on-fault-limit.c
@@ -0,0 +1,47 @@
+#include <sys/mman.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+
+#ifndef MCL_ON_FAULT
+#define MCL_ON_FAULT 4
+#endif
+
+static int test_limit(void)
+{
+ int ret = 1;
+ struct rlimit lims;
+ void *map;
+
+ if (getrlimit(RLIMIT_MEMLOCK, &lims)) {
+ perror("getrlimit");
+ return ret;
+ }
+
+ if (mlockall(MCL_ON_FAULT)) {
+ perror("mlockall");
+ return ret;
+ }
+
+ map = mmap(NULL, 2 * lims.rlim_max, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
+ if (map != MAP_FAILED) {
+ printf("mmap should have failed, but didn't\n");
+ munmap(map, 2 * lims.rlim_max);
+ } else {
+ ret = 0;
+ }
+
+ munlockall();
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ int ret = 0;
+
+ ret += test_limit();
+ return ret;
+}
diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests
index c87b681..c1aecce 100755
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -90,4 +90,27 @@ fi
umount $mnt
rm -rf $mnt
echo $nr_hugepgs > /proc/sys/vm/nr_hugepages
+
+echo "--------------------"
+echo "running lock-on-fault"
+echo "--------------------"
+./lock-on-fault
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+ exitcode=1
+else
+ echo "[PASS]"
+fi
+
+echo "--------------------"
+echo "running on-fault-limit"
+echo "--------------------"
+sudo -u nobody ./on-fault-limit
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+ exitcode=1
+else
+ echo "[PASS]"
+fi
+
exit $exitcode
--
1.9.1

2015-05-08 19:42:08

by Andrew Morton

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:

> mlock() allows a user to control page out of program memory, but this
> comes at the cost of faulting in the entire mapping when it is
> allocated. For large mappings where the entire area is not necessary
> this is not ideal.
>
> This series introduces new flags for mmap() and mlockall() that allow a
> user to specify that the covered area should not be paged out, but only
> after the memory has been used the first time.

Please tell us much much more about the value of these changes: the use
cases, the behavioural improvements and performance results which the
patchset brings to those use cases, etc.

2015-05-08 20:06:17

by Eric B Munson

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Fri, 08 May 2015, Andrew Morton wrote:

> On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:
>
> > mlock() allows a user to control page out of program memory, but this
> > comes at the cost of faulting in the entire mapping when it is
> > allocated. For large mappings where the entire area is not necessary
> > this is not ideal.
> >
> > This series introduces new flags for mmap() and mlockall() that allow a
> > user to specify that the covered area should not be paged out, but only
> > after the memory has been used the first time.
>
> Please tell us much much more about the value of these changes: the use
> cases, the behavioural improvements and performance results which the
> patchset brings to those use cases, etc.
>

The primary use case is mmapping large files read-only. The process
knows that some of the data is necessary, but it is unlikely that the
entire file will be needed. The developer only wants to pay the cost to
read the data in once. Unfortunately, the developer must choose between
allowing the kernel to page in the memory as needed and guaranteeing
that the data will only be read from disk once. The first option runs
the risk of having the memory reclaimed if the system is under memory
pressure; the second forces the memory usage and startup delay of
faulting in the entire file.

I am working on getting startup times with and without this change for
an application; I will post them as soon as I have them.

Eric



2015-05-08 20:15:28

by Andrew Morton

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Fri, 8 May 2015 16:06:10 -0400 Eric B Munson <[email protected]> wrote:

> On Fri, 08 May 2015, Andrew Morton wrote:
>
> > On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:
> >
> > > mlock() allows a user to control page out of program memory, but this
> > > comes at the cost of faulting in the entire mapping when it is
> > > allocated. For large mappings where the entire area is not necessary
> > > this is not ideal.
> > >
> > > This series introduces new flags for mmap() and mlockall() that allow a
> > > user to specify that the covered area should not be paged out, but only
> > > after the memory has been used the first time.
> >
> > Please tell us much much more about the value of these changes: the use
> > cases, the behavioural improvements and performance results which the
> > patchset brings to those use cases, etc.
> >
>
> The primary use case is for mmaping large files read only. The process
> knows that some of the data is necessary, but it is unlikely that the
> entire file will be needed. The developer only wants to pay the cost to
> read the data in once. Unfortunately, the developer must choose between
> allowing the kernel to page in the memory as needed and guaranteeing
> that the data will only be read from disk once. The first option runs
> the risk of having the memory reclaimed if the system is under memory
> pressure, the second forces the memory usage and startup delay when
> faulting in the entire file.

Why can't the application mmap only those parts of the file which it
wants and mlock those?

> I am working on getting startup times with and without this change for
> an application, I will post them as soon as I have them.

2015-05-11 14:36:26

by Eric B Munson

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Fri, 08 May 2015, Andrew Morton wrote:

> On Fri, 8 May 2015 16:06:10 -0400 Eric B Munson <[email protected]> wrote:
>
> > On Fri, 08 May 2015, Andrew Morton wrote:
> >
> > > On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:
> > >
> > > > mlock() allows a user to control page out of program memory, but this
> > > > comes at the cost of faulting in the entire mapping when it is
> > > > allocated. For large mappings where the entire area is not necessary
> > > > this is not ideal.
> > > >
> > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > user to specify that the covered area should not be paged out, but only
> > > > after the memory has been used the first time.
> > >
> > > Please tell us much much more about the value of these changes: the use
> > > cases, the behavioural improvements and performance results which the
> > > patchset brings to those use cases, etc.
> > >
> >
> > The primary use case is for mmaping large files read only. The process
> > knows that some of the data is necessary, but it is unlikely that the
> > entire file will be needed. The developer only wants to pay the cost to
> > read the data in once. Unfortunately, the developer must choose between
> > allowing the kernel to page in the memory as needed and guaranteeing
> > that the data will only be read from disk once. The first option runs
> > the risk of having the memory reclaimed if the system is under memory
> > pressure, the second forces the memory usage and startup delay when
> > faulting in the entire file.
>
> Why can't the application mmap only those parts of the file which it
> wants and mlock those?

There are a number of problems with this approach. The first is that it
presumes the program will know which portions are needed ahead of time.
In many cases this is simply not true. The second problem is the number
of syscalls required. With my patches, a single mmap() or mlockall()
call is needed to set up the required locking. Without them, a separate
mmap() call must be made for each piece of data that is needed. This also
opens up problems for data that is arranged assuming it is contiguous in
memory. With the single mmap() call, the user gets a contiguous VMA
without having to know about it. mmap() with MAP_FIXED could address
the problem, but this introduces a new failure mode of your map
colliding with another that was placed by the kernel.

Another use case for the LOCKONFAULT flag is the security use of
mlock(). An application may handle data that cannot be written to swap
while the exact size is unknown until run time (all we have at build
time is the maximum size the buffer can be). The LOCKONFAULT flag
allows the developer to create the buffer and guarantee that the
contents are never written to swap without ever consuming more memory
than is actually needed.
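
A minimal sketch of that buffer allocation follows. It assumes a kernel
carrying this series: MAP_LOCKONFAULT is the flag proposed here, the
0x80000 value is the asm-generic one from patch 1, and neither is part
of released headers. On a kernel without these patches the same bit may
be unassigned or mean something else, so the call may succeed without
locking anything. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* ASSUMPTION: asm-generic value proposed in this series. */
#ifndef MAP_LOCKONFAULT
#define MAP_LOCKONFAULT 0x80000
#endif

/*
 * Reserve max_len bytes of address space. On a patched kernel, pages
 * become unevictable only once they are first touched, so an unused
 * tail of the buffer never consumes locked memory.
 * Returns NULL on failure, for example when RLIMIT_MEMLOCK is too low
 * (the whole length is charged against the limit up front).
 */
void *alloc_lock_on_fault(size_t max_len)
{
	void *buf;

	assert(max_len > 0);

	buf = mmap(NULL, max_len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKONFAULT, -1, 0);

	return buf == MAP_FAILED ? NULL : buf;
}
```

The caller sizes the request to the build-time maximum, writes only as
much as the run-time data needs, and releases the buffer with munmap().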

>
> > I am working on getting startup times with and without this change for
> > an application, I will post them as soon as I have them.
>



2015-05-11 18:06:38

by Eric B Munson

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Fri, 08 May 2015, Andrew Morton wrote:

> On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:
>
> > mlock() allows a user to control page out of program memory, but this
> > comes at the cost of faulting in the entire mapping when it is
> > allocated. For large mappings where the entire area is not necessary
> > this is not ideal.
> >
> > This series introduces new flags for mmap() and mlockall() that allow a
> > user to specify that the covered area should not be paged out, but only
> > after the memory has been used the first time.
>
> Please tell us much much more about the value of these changes: the use
> cases, the behavioural improvements and performance results which the
> patchset brings to those use cases, etc.
>

To illustrate the proposed use case I wrote a quick program that mmaps
a 5GB file which is filled with random data and accesses 150,000 pages
from that mapping. Setup and processing were timed separately to
illustrate the differences between the three tested approaches. The
setup portion is simply the call to mmap, the processing is the
accessing of the various locations in that mapping. The following
values are in milliseconds and are the averages of 20 runs each with a
call to echo 3 > /proc/sys/vm/drop_caches between each run.

The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
Startup average: 9476.506
Processing average: 3.573

The second mapping was simply MAP_PRIVATE but each page was passed to
mlock() before being read:
Startup average: 0.051
Processing average: 721.859

The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
Startup average: 0.084
Processing average: 42.125
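
The test program itself was not posted, so the following is only a sketch of how such a measurement could be structured. The names run_times, elapsed_ms and time_mapping are hypothetical. The mmap() flags are passed in by the caller because MAP_LOCKONFAULT is the flag proposed by this series and is not defined in standard headers; with a patched kernel and headers it would be passed the same way as MAP_LOCKED.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Timings for one run, in milliseconds. */
struct run_times {
    double setup_ms;      /* time spent inside mmap() */
    double processing_ms; /* time spent touching pages */
    uint64_t checksum;    /* keeps the reads from being optimised away */
};

static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1e3 + (b->tv_nsec - a->tv_nsec) / 1e6;
}

/*
 * Map `len` bytes of `fd` read-only with `flags`, then read one byte from
 * every `stride`-th page. If `lock_each` is nonzero, mlock() each page just
 * before reading it (the second variant described above).
 * Returns 0 on success, -1 if mmap() or mlock() fails.
 */
int time_mapping(int fd, size_t len, int flags, size_t stride,
                 int lock_each, struct run_times *out)
{
    long page = sysconf(_SC_PAGESIZE);
    struct timespec t0, t1, t2;

    assert(page > 0 && stride > 0);

    /* Setup: only the mmap() call is timed. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    unsigned char *map = mmap(NULL, len, PROT_READ, flags, fd, 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (map == MAP_FAILED)
        return -1;

    /* Processing: touch the selected pages, optionally locking each first. */
    uint64_t sum = 0;
    int rc = 0;
    for (size_t off = 0; off < len; off += stride * (size_t)page) {
        if (lock_each && mlock(map + off, (size_t)page) != 0) {
            rc = -1;
            break;
        }
        sum += map[off];
    }
    clock_gettime(CLOCK_MONOTONIC, &t2);

    out->setup_ms = elapsed_ms(&t0, &t1);
    out->processing_ms = elapsed_ms(&t1, &t2);
    out->checksum = sum;
    munmap(map, len);
    return rc;
}
```

The three cases would then correspond to calling time_mapping() with MAP_PRIVATE | MAP_LOCKED, with MAP_PRIVATE and lock_each set, and with MAP_PRIVATE plus the proposed flag, dropping caches between runs.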




2015-05-11 19:12:08

by Andrew Morton

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Mon, 11 May 2015 10:36:18 -0400 Eric B Munson <[email protected]> wrote:

> On Fri, 08 May 2015, Andrew Morton wrote:
> ...
>
> >
> > Why can't the application mmap only those parts of the file which it
> > wants and mlock those?
>
> There are a number of problems with this approach. The first is it
> presumes the program will know what portions are needed a head of time.
> In many cases this is simply not true. The second problem is the number
> of syscalls required. With my patches, a single mmap() or mlockall()
> call is needed to setup the required locking. Without it, a separate
> mmap call must be made for each piece of data that is needed. This also
> opens up problems for data that is arranged assuming it is contiguous in
> memory. With the single mmap call, the user gets a contiguous VMA
> without having to know about it. mmap() with MAP_FIXED could address
> the problem, but this introduces a new failure mode of your map
> colliding with another that was placed by the kernel.
>
> Another use case for the LOCKONFAULT flag is the security use of
> mlock(). If an application will be using data that cannot be written
> to swap, but the exact size is unknown until run time (all we have a
> build time is the maximum size the buffer can be). The LOCKONFAULT flag
> allows the developer to create the buffer and guarantee that the
> contents are never written to swap without ever consuming more memory
> than is actually needed.

What application(s) or class of applications are we talking about here?

IOW, how generally applicable is this? It sounds rather specialized.

2015-05-11 21:05:42

by Eric B Munson

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Mon, 11 May 2015, Andrew Morton wrote:

> On Mon, 11 May 2015 10:36:18 -0400 Eric B Munson <[email protected]> wrote:
>
> > On Fri, 08 May 2015, Andrew Morton wrote:
> > ...
> >
> > >
> > > Why can't the application mmap only those parts of the file which it
> > > wants and mlock those?
> >
> > There are a number of problems with this approach. The first is it
> > presumes the program will know what portions are needed a head of time.
> > In many cases this is simply not true. The second problem is the number
> > of syscalls required. With my patches, a single mmap() or mlockall()
> > call is needed to setup the required locking. Without it, a separate
> > mmap call must be made for each piece of data that is needed. This also
> > opens up problems for data that is arranged assuming it is contiguous in
> > memory. With the single mmap call, the user gets a contiguous VMA
> > without having to know about it. mmap() with MAP_FIXED could address
> > the problem, but this introduces a new failure mode of your map
> > colliding with another that was placed by the kernel.
> >
> > Another use case for the LOCKONFAULT flag is the security use of
> > mlock(). If an application will be using data that cannot be written
> > to swap, but the exact size is unknown until run time (all we have a
> > build time is the maximum size the buffer can be). The LOCKONFAULT flag
> > allows the developer to create the buffer and guarantee that the
> > contents are never written to swap without ever consuming more memory
> > than is actually needed.
>
> What application(s) or class of applications are we talking about here?
>
> IOW, how generally applicable is this? It sounds rather specialized.
>

For the example of a large file, this is the usage pattern for a large
statistical language model (probably applies to other statistical or
graphical models as well). For the security example, any application
transacting in data that cannot be swapped out (credit card data,
medical records, etc).



2015-05-13 13:58:13

by Michal Hocko

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Fri 08-05-15 16:06:10, Eric B Munson wrote:
> On Fri, 08 May 2015, Andrew Morton wrote:
>
> > On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:
> >
> > > mlock() allows a user to control page out of program memory, but this
> > > comes at the cost of faulting in the entire mapping when it is
> > > allocated. For large mappings where the entire area is not necessary
> > > this is not ideal.
> > >
> > > This series introduces new flags for mmap() and mlockall() that allow a
> > > user to specify that the covered are should not be paged out, but only
> > > after the memory has been used the first time.
> >
> > Please tell us much much more about the value of these changes: the use
> > cases, the behavioural improvements and performance results which the
> > patchset brings to those use cases, etc.
> >
>
> The primary use case is for mmaping large files read only. The process
> knows that some of the data is necessary, but it is unlikely that the
> entire file will be needed. The developer only wants to pay the cost to
> read the data in once. Unfortunately developer must choose between
> allowing the kernel to page in the memory as needed and guaranteeing
> that the data will only be read from disk once. The first option runs
> the risk of having the memory reclaimed if the system is under memory
> pressure, the second forces the memory usage and startup delay when
> faulting in the entire file.

Is there any reason you cannot do this from the userspace? Start by
mmap(PROT_NONE) and do mmap(MAP_FIXED|MAP_LOCKED|MAP_READ|other_flags_you_need)
from the SIGSEGV handler?
You can generate a lot of vmas that way but you can mitigate that to a
certain level by mapping larger than PAGE_SIZE chunks in the fault
handler. Would that work in your usecase?
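
A minimal sketch of this idea in C, under some assumptions: it uses mprotect() plus mlock() on the faulting page rather than a fresh mmap() with MAP_FIXED | MAP_LOCKED, it handles one page per fault, and it does not check that the faulting address belongs to a mapping the program set up this way. The names lock_on_segv, install_lock_on_segv and map_prot_none are hypothetical. Note also that mprotect() and mlock() are not on the POSIX list of async-signal-safe functions, so this is illustrative rather than something to copy as is.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count; /* faults handled so far */
static long page_size;

/* On a fault, make the faulting page readable and try to lock it. */
static void lock_on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);

    /* If the page cannot be made readable, retrying would fault forever. */
    if (mprotect((void *)addr, (size_t)page_size, PROT_READ) != 0)
        _exit(1);
    /* Locking may fail (e.g. RLIMIT_MEMLOCK); the page is still usable. */
    (void)mlock((void *)addr, (size_t)page_size);
    fault_count++;
}

/* Install the SIGSEGV handler; returns 0 on success. */
int install_lock_on_segv(void)
{
    struct sigaction sa = {0};

    page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    sa.sa_sigaction = lock_on_segv;
    sa.sa_flags = SA_SIGINFO; /* needed to receive si_addr */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Map `len` bytes with no access rights; each page opens up on first touch. */
void *map_prot_none(int fd, size_t len, int flags)
{
    return mmap(NULL, len, PROT_NONE, flags, fd, 0);
}
```

After install_lock_on_segv(), the first read of any page in a map_prot_none() region takes one SIGSEGV, and later reads of the same page proceed without a fault.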
--
Michal Hocko
SUSE Labs

2015-05-13 14:14:48

by Eric B Munson

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Wed, 13 May 2015, Michal Hocko wrote:

> On Fri 08-05-15 16:06:10, Eric B Munson wrote:
> > On Fri, 08 May 2015, Andrew Morton wrote:
> >
> > > On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:
> > >
> > > > mlock() allows a user to control page out of program memory, but this
> > > > comes at the cost of faulting in the entire mapping when it is
> > > > allocated. For large mappings where the entire area is not necessary
> > > > this is not ideal.
> > > >
> > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > user to specify that the covered are should not be paged out, but only
> > > > after the memory has been used the first time.
> > >
> > > Please tell us much much more about the value of these changes: the use
> > > cases, the behavioural improvements and performance results which the
> > > patchset brings to those use cases, etc.
> > >
> >
> > The primary use case is for mmaping large files read only. The process
> > knows that some of the data is necessary, but it is unlikely that the
> > entire file will be needed. The developer only wants to pay the cost to
> > read the data in once. Unfortunately developer must choose between
> > allowing the kernel to page in the memory as needed and guaranteeing
> > that the data will only be read from disk once. The first option runs
> > the risk of having the memory reclaimed if the system is under memory
> > pressure, the second forces the memory usage and startup delay when
> > faulting in the entire file.
>
> Is there any reason you cannot do this from the userspace? Start by
> mmap(PROT_NONE) and do mmap(MAP_FIXED|MAP_LOCKED|MAP_READ|other_flags_you_need)
> from the SIGSEGV handler?
> You can generate a lot of vmas that way but you can mitigate that to a
> certain level by mapping larger than PAGE_SIZE chunks in the fault
> handler. Would that work in your usecase?

This might work for the use cases I have laid out (I am not sure about
the anonymous mmap one, but I will try it). I am concerned about how
much memory management policy these suggestions push into userspace.
I am also concerned about the number of system calls required to do the
same thing. This will require a new call to mmap() for every new page
accessed in the file (or for every file_size/map_size in the multiple
page chunk). The simple case of calling mlock() every time the file
was accessed was significantly slower than the LOCKONFAULT flag.
Your suggestion will be better in that it avoids the extra mlock call
for pages already locked, but there are still significantly more system
calls. I will add this to the program I have been using to measure
execution times and see how it compares to the other options.

Eric



2015-05-13 15:00:42

by Eric B Munson

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Mon, 11 May 2015, Eric B Munson wrote:

> On Fri, 08 May 2015, Andrew Morton wrote:
>
> > On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:
> >
> > > mlock() allows a user to control page out of program memory, but this
> > > comes at the cost of faulting in the entire mapping when it is
> > > allocated. For large mappings where the entire area is not necessary
> > > this is not ideal.
> > >
> > > This series introduces new flags for mmap() and mlockall() that allow a
> > > user to specify that the covered are should not be paged out, but only
> > > after the memory has been used the first time.
> >
> > Please tell us much much more about the value of these changes: the use
> > cases, the behavioural improvements and performance results which the
> > patchset brings to those use cases, etc.
> >
>
> To illustrate the proposed use case I wrote a quick program that mmaps
> a 5GB file which is filled with random data and accesses 150,000 pages
> from that mapping. Setup and processing were timed separately to
> illustrate the differences between the three tested approaches. the
> setup portion is simply the call to mmap, the processing is the
> accessing of the various locations in that mapping. The following
> values are in milliseconds and are the averages of 20 runs each with a
> call to echo 3 > /proc/sys/vm/drop_caches between each run.
>
> The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
> Startup average: 9476.506
> Processing average: 3.573
>
> The second mapping was simply MAP_PRIVATE but each page was passed to
> mlock() before being read:
> Startup average: 0.051
> Processing average: 721.859
>
> The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
> Startup average: 0.084
> Processing average: 42.125
>

Michal's suggestion of changing protections and locking in a signal
handler was better than locking as needed, but it still required
significantly more work than the LOCKONFAULT case.

Startup average: 0.047
Processing average: 86.431



2015-05-14 08:08:21

by Michal Hocko

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Wed 13-05-15 11:00:36, Eric B Munson wrote:
> On Mon, 11 May 2015, Eric B Munson wrote:
>
> > On Fri, 08 May 2015, Andrew Morton wrote:
> >
> > > On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:
> > >
> > > > mlock() allows a user to control page out of program memory, but this
> > > > comes at the cost of faulting in the entire mapping when it is
> > > > allocated. For large mappings where the entire area is not necessary
> > > > this is not ideal.
> > > >
> > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > user to specify that the covered are should not be paged out, but only
> > > > after the memory has been used the first time.
> > >
> > > Please tell us much much more about the value of these changes: the use
> > > cases, the behavioural improvements and performance results which the
> > > patchset brings to those use cases, etc.
> > >
> >
> > To illustrate the proposed use case I wrote a quick program that mmaps
> > a 5GB file which is filled with random data and accesses 150,000 pages
> > from that mapping. Setup and processing were timed separately to
> > illustrate the differences between the three tested approaches. the
> > setup portion is simply the call to mmap, the processing is the
> > accessing of the various locations in that mapping. The following
> > values are in milliseconds and are the averages of 20 runs each with a
> > call to echo 3 > /proc/sys/vm/drop_caches between each run.
> >
> > The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
> > Startup average: 9476.506
> > Processing average: 3.573
> >
> > The second mapping was simply MAP_PRIVATE but each page was passed to
> > mlock() before being read:
> > Startup average: 0.051
> > Processing average: 721.859
> >
> > The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
> > Startup average: 0.084
> > Processing average: 42.125
> >
>
> Michal's suggestion of changing protections and locking in a signal
> handler was better than the locking as needed, but still significantly
> more work required than the LOCKONFAULT case.
>
> Startup average: 0.047
> Processing average: 86.431

Have you played with batching? Has it helped? Anyway it is to be
expected that the overhead will be higher than a single mmap call. The
question is whether you can live with it because adding a new semantic
to mlock sounds trickier and MAP_LOCKED is tricky enough already...

--
Michal Hocko
SUSE Labs

2015-05-14 13:58:41

by Eric B Munson

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Thu, 14 May 2015, Michal Hocko wrote:

> On Wed 13-05-15 11:00:36, Eric B Munson wrote:
> > On Mon, 11 May 2015, Eric B Munson wrote:
> >
> > > On Fri, 08 May 2015, Andrew Morton wrote:
> > >
> > > > On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:
> > > >
> > > > > mlock() allows a user to control page out of program memory, but this
> > > > > comes at the cost of faulting in the entire mapping when it is
> > > > > allocated. For large mappings where the entire area is not necessary
> > > > > this is not ideal.
> > > > >
> > > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > > user to specify that the covered are should not be paged out, but only
> > > > > after the memory has been used the first time.
> > > >
> > > > Please tell us much much more about the value of these changes: the use
> > > > cases, the behavioural improvements and performance results which the
> > > > patchset brings to those use cases, etc.
> > > >
> > >
> > > To illustrate the proposed use case I wrote a quick program that mmaps
> > > a 5GB file which is filled with random data and accesses 150,000 pages
> > > from that mapping. Setup and processing were timed separately to
> > > illustrate the differences between the three tested approaches. the
> > > setup portion is simply the call to mmap, the processing is the
> > > accessing of the various locations in that mapping. The following
> > > values are in milliseconds and are the averages of 20 runs each with a
> > > call to echo 3 > /proc/sys/vm/drop_caches between each run.
> > >
> > > The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
> > > Startup average: 9476.506
> > > Processing average: 3.573
> > >
> > > The second mapping was simply MAP_PRIVATE but each page was passed to
> > > mlock() before being read:
> > > Startup average: 0.051
> > > Processing average: 721.859
> > >
> > > The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
> > > Startup average: 0.084
> > > Processing average: 42.125
> > >
> >
> > Michal's suggestion of changing protections and locking in a signal
> > handler was better than the locking as needed, but still significantly
> > more work required than the LOCKONFAULT case.
> >
> > Startup average: 0.047
> > Processing average: 86.431
>
> Have you played with batching? Has it helped? Anyway it is to be
> expected that the overhead will be higher than a single mmap call. The
> question is whether you can live with it because adding a new semantic
> to mlock sounds trickier and MAP_LOCKED is tricky enough already...
>

The test code I have been using is a pathological test case that only
touches pages once and they are fairly far apart.

On its face batching sounds like a good idea, but I have a couple of
questions. In order to batch fault in pages the seg fault handler needs
to know about the mapping in question. Specifically it needs to know
where it ends so that it doesn't try to mprotect()/mlock() past the
end. So now the program has to start tracking its maps in some globally
accessible structure and this sounds more like implementing memory
management in userspace. How could this batching be implemented without
requiring the signal handler to know about the mapping that is being
accessed? Also, how much memory management policy is it reasonable to
expect user space to implement in these cases?

Eric



2015-05-15 15:35:55

by Eric B Munson

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Thu, 14 May 2015, Michal Hocko wrote:

> On Wed 13-05-15 11:00:36, Eric B Munson wrote:
> > On Mon, 11 May 2015, Eric B Munson wrote:
> >
> > > On Fri, 08 May 2015, Andrew Morton wrote:
> > >
> > > > On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:
> > > >
> > > > > mlock() allows a user to control page out of program memory, but this
> > > > > comes at the cost of faulting in the entire mapping when it is
> > > > > allocated. For large mappings where the entire area is not necessary
> > > > > this is not ideal.
> > > > >
> > > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > > user to specify that the covered are should not be paged out, but only
> > > > > after the memory has been used the first time.
> > > >
> > > > Please tell us much much more about the value of these changes: the use
> > > > cases, the behavioural improvements and performance results which the
> > > > patchset brings to those use cases, etc.
> > > >
> > >
> > > To illustrate the proposed use case I wrote a quick program that mmaps
> > > a 5GB file which is filled with random data and accesses 150,000 pages
> > > from that mapping. Setup and processing were timed separately to
> > > illustrate the differences between the three tested approaches. the
> > > setup portion is simply the call to mmap, the processing is the
> > > accessing of the various locations in that mapping. The following
> > > values are in milliseconds and are the averages of 20 runs each with a
> > > call to echo 3 > /proc/sys/vm/drop_caches between each run.
> > >
> > > The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
> > > Startup average: 9476.506
> > > Processing average: 3.573
> > >
> > > The second mapping was simply MAP_PRIVATE but each page was passed to
> > > mlock() before being read:
> > > Startup average: 0.051
> > > Processing average: 721.859
> > >
> > > The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
> > > Startup average: 0.084
> > > Processing average: 42.125
> > >
> >
> > Michal's suggestion of changing protections and locking in a signal
> > handler was better than the locking as needed, but still significantly
> > more work required than the LOCKONFAULT case.
> >
> > Startup average: 0.047
> > Processing average: 86.431
>
> Have you played with batching? Has it helped? Anyway it is to be
> expected that the overhead will be higher than a single mmap call. The
> question is whether you can live with it because adding a new semantic
> to mlock sounds trickier and MAP_LOCKED is tricky enough already...
>

I reworked the experiment to better cover the batching solution. The
same 5GB data file is used, however instead of 150,000 accesses at
regular intervals, the test program now does 15,000,000 accesses to
random pages in the mapping. The rest of the setup remains the same.

mmap with MAP_LOCKED:
Setup avg: 11821.193
Processing avg: 3404.286

mmap with mlock() before each access:
Setup avg: 0.054
Processing avg: 34263.201

mmap with PROT_NONE and signal handler and batch size of 1 page:
With the default value in max_map_count, this gets ENOMEM as I attempt
to change the permissions. After upping the sysctl significantly I get:
Setup avg: 0.050
Processing avg: 67690.625

mmap with PROT_NONE and signal handler and batch size of 8 pages:
Setup avg: 0.098
Processing avg: 37344.197

mmap with PROT_NONE and signal handler and batch size of 16 pages:
Setup avg: 0.0548
Processing avg: 29295.669

mmap with MAP_LOCKONFAULT:
Setup avg: 0.073
Processing avg: 18392.136

The signal handler in the batch cases faulted in memory in two steps to
avoid having to know the start and end of the faulting mapping. The
first step covers the page that caused the fault as we know that it will
be possible to lock. The second step speculatively tries to mlock and
mprotect the batch size - 1 pages that follow. There may be a clever
way to avoid this without having the program track each mapping to be
covered by this handler in a globally accessible structure, but I could
not find it.
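
The two-step approach described above could look roughly like the following. This is a sketch and not the handler used for the measurements; unprotect_batch is a hypothetical name, and it changes protections before locking, which may differ from the order in the test program. One caveat worth stating: because the second step does not know where the mapping ends, its mprotect() call can also land on whatever happens to be mapped immediately after it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Step 1: unprotect and lock the page containing `fault_addr`. This page is
 * known to be part of the mapping because it just faulted.
 * Step 2: speculatively do the same for the following `batch - 1` pages and
 * ignore failure, since the range may run past the end of the mapping.
 * Returns 0 if step 1 made the page readable, -1 otherwise.
 */
int unprotect_batch(void *fault_addr, size_t batch)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && batch > 0);
    size_t psz = (size_t)page;
    /* Round the faulting address down to the start of its page. */
    unsigned char *first =
        (unsigned char *)((uintptr_t)fault_addr & ~((uintptr_t)psz - 1));

    if (mprotect(first, psz, PROT_READ) != 0)
        return -1;
    (void)mlock(first, psz); /* may fail under RLIMIT_MEMLOCK */

    if (batch > 1) {
        unsigned char *rest = first + psz;
        size_t rest_len = (batch - 1) * psz;

        /* Speculative: errors here are expected near the mapping's end. */
        if (mprotect(rest, rest_len, PROT_READ) == 0)
            (void)mlock(rest, rest_len);
    }
    return 0;
}
```

A SIGSEGV handler installed with SA_SIGINFO would call unprotect_batch(info->si_addr, batch) and exit if it returns -1.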

These results show that if the developer knows that a majority of the
mapping will be used, it is better to try and fault it in at once,
otherwise MAP_LOCKONFAULT is significantly faster.

Eric



2015-05-19 20:30:23

by Eric B Munson

Subject: Re: [PATCH 0/3] Allow user to request memory to be locked on page fault

On Fri, 15 May 2015, Eric B Munson wrote:

> On Thu, 14 May 2015, Michal Hocko wrote:
>
> > On Wed 13-05-15 11:00:36, Eric B Munson wrote:
> > > On Mon, 11 May 2015, Eric B Munson wrote:
> > >
> > > > On Fri, 08 May 2015, Andrew Morton wrote:
> > > >
> > > > > On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson <[email protected]> wrote:
> > > > >
> > > > > > mlock() allows a user to control page out of program memory, but this
> > > > > > comes at the cost of faulting in the entire mapping when it is
> > > > > > allocated. For large mappings where the entire area is not necessary
> > > > > > this is not ideal.
> > > > > >
> > > > > > This series introduces new flags for mmap() and mlockall() that allow a
> > > > > > user to specify that the covered are should not be paged out, but only
> > > > > > after the memory has been used the first time.
> > > > >
> > > > > Please tell us much much more about the value of these changes: the use
> > > > > cases, the behavioural improvements and performance results which the
> > > > > patchset brings to those use cases, etc.
> > > > >
> > > >
> > > > To illustrate the proposed use case I wrote a quick program that mmaps
> > > > a 5GB file which is filled with random data and accesses 150,000 pages
> > > > from that mapping. Setup and processing were timed separately to
> > > > illustrate the differences between the three tested approaches. the
> > > > setup portion is simply the call to mmap, the processing is the
> > > > accessing of the various locations in that mapping. The following
> > > > values are in milliseconds and are the averages of 20 runs each with a
> > > > call to echo 3 > /proc/sys/vm/drop_caches between each run.
> > > >
> > > > The first mapping was made with MAP_PRIVATE | MAP_LOCKED as a baseline:
> > > > Startup average: 9476.506
> > > > Processing average: 3.573
> > > >
> > > > The second mapping was simply MAP_PRIVATE but each page was passed to
> > > > mlock() before being read:
> > > > Startup average: 0.051
> > > > Processing average: 721.859
> > > >
> > > > The final mapping was MAP_PRIVATE | MAP_LOCKONFAULT:
> > > > Startup average: 0.084
> > > > Processing average: 42.125
> > > >
> > >
> > > Michal's suggestion of changing protections and locking in a signal
> > > handler was better than the locking as needed, but still significantly
> > > more work required than the LOCKONFAULT case.
> > >
> > > Startup average: 0.047
> > > Processing average: 86.431
> >
> > Have you played with batching? Has it helped? Anyway it is to be
> > expected that the overhead will be higher than a single mmap call. The
> > question is whether you can live with it because adding a new semantic
> > to mlock sounds trickier and MAP_LOCKED is tricky enough already...
> >
>
> I reworked the experiment to better cover the batching solution. The
> same 5GB data file is used, however instead of 150,000 accesses at
> regular intervals, the test program now does 15,000,000 accesses to
> random pages in the mapping. The rest of the setup remains the same.
>
> mmap with MAP_LOCKED:
> Setup avg: 11821.193
> Processing avg: 3404.286
>
> mmap with mlock() before each access:
> Setup avg: 0.054
> Processing avg: 34263.201
>
> mmap with PROT_NONE and signal handler and batch size of 1 page:
> With the default value in max_map_count, this gets ENOMEM as I attempt
> to change the permissions, after upping the sysctl significantly I get:
> Setup avg: 0.050
> Processing avg: 67690.625
>
> mmap with PROT_NONE and signal handler and batch size of 8 pages:
> Setup avg: 0.098
> Processing avg: 37344.197
>
> mmap with PROT_NONE and signal handler and batch size of 16 pages:
> Setup avg: 0.0548
> Processing avg: 29295.669
>
> mmap with MAP_LOCKONFAULT:
> Setup avg: 0.073
> Processing avg: 18392.136
>
> The signal handler in the batch cases faulted in memory in two steps to
> avoid having to know the start and end of the faulting mapping. The
> first step covers the page that caused the fault as we know that it will
> be possible to lock. The second step speculatively tries to mlock and
> mprotect the batch size - 1 pages that follow. There may be a clever
> way to avoid this without having the program track each mapping to be
> covered by this handeler in a globally accessible structure, but I could
> not find it.
>
> These results show that if the developer knows that a majority of the
> mapping will be used, it is better to try and fault it in at once,
> otherwise MAP_LOCKONFAULT is significantly faster.
>
> Eric

Is there anything else I can add to the discussion here?

