2015-12-13 20:17:29

by Cyrill Gorcunov

Subject: [RFC 1/2] [RFC] mm: Account anon mappings as RLIMIT_DATA

When inspecting the rather vague code inside the prctl(PR_SET_MM_MEM)
call (which tests the RLIMIT_DATA value to figure out whether we are
allowed to assign new @start_brk, @brk, @start_data and @end_data in
mm_struct), it became clear that RLIMIT_DATA, in the form it is
implemented now, doesn't do anything useful, because most user-space
libraries use the mmap() syscall for dynamic memory allocations.
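
For reference, the check in question is the check_data_rlimit() helper,
which only compares the traditional brk/data segment span against the
limit (quoted roughly as it reads in include/linux/mm.h at the time of
writing):

static inline int check_data_rlimit(unsigned long rlim,
                                    unsigned long new,
                                    unsigned long start,
                                    unsigned long end_data,
                                    unsigned long start_data)
{
        if (rlim < RLIM_INFINITY) {
                /* old brk/data span vs. the limit, nothing about mmap() */
                if (((new - start) + (end_data - start_data)) > rlim)
                        return -ENOSPC;
        }

        return 0;
}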

So Linus suggested converting the RLIMIT_DATA rlimit into something
suitable for anonymous memory accounting. Here we introduce a new
@anon_vm member in the mm descriptor, which is updated on every
vm_stat_account() call.

The RLIMIT_DATA tests are done in three places:
- mmap_region: when userspace calls mmap()
- do_brk: for the brk() syscall
- vma_to_resize: when mremap() is used

do_brk() is also used by vm_brk(), which is called when the kernel
loads ELF and a.out binaries for execution.

Because the limit test is now done in the do_brk() helper, we no
longer need to call check_data_rlimit() in the brk() syscall itself.
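
To illustrate the intended effect, a hypothetical user-space test (not
part of the series) would look roughly like this: with the patch applied,
a private anonymous mmap() that doesn't fit into RLIMIT_DATA is expected
to fail with ENOMEM, which never happened before.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void)
{
        struct rlimit rl = { .rlim_cur = 4 << 20, .rlim_max = 4 << 20 };
        void *p;

        /* limit anonymous memory to 4M */
        if (setrlimit(RLIMIT_DATA, &rl))
                perror("setrlimit");

        /* 16M of private anonymous memory, well above the limit */
        p = mmap(NULL, 16 << 20, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        printf("mmap: %s\n", p == MAP_FAILED ? strerror(errno) : "succeeded");
        return 0;
}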

v2:
- update doc
- add may_expand_anon_vm helper
- call for RLIMIT_DATA test in mremap and do_brk

CC: Quentin Casasnovas <[email protected]>
CC: Vegard Nossum <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Willy Tarreau <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Kees Cook <[email protected]>
CC: Vladimir Davydov <[email protected]>
CC: Konstantin Khlebnikov <[email protected]>
CC: Pavel Emelyanov <[email protected]>
CC: Vladimir Davydov <[email protected]>
CC: Peter Zijlstra <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>
---
Documentation/filesystems/proc.txt | 1
fs/proc/task_mmu.c | 2 +
include/linux/mm_types.h | 1
mm/mmap.c | 46 ++++++++++++++++++++++---------------
mm/mremap.c | 5 ++++
5 files changed, 37 insertions(+), 18 deletions(-)

Index: linux-ml.git/Documentation/filesystems/proc.txt
===================================================================
--- linux-ml.git.orig/Documentation/filesystems/proc.txt
+++ linux-ml.git/Documentation/filesystems/proc.txt
@@ -233,6 +233,7 @@ Table 1-2: Contents of the status files
VmHWM peak resident set size ("high water mark")
VmRSS size of memory portions
VmData size of data, stack, and text segments
+ VmAnon size of anonymous private memory (not file backed)
VmStk size of data, stack, and text segments
VmExe size of text segment
VmLib size of shared library code
Index: linux-ml.git/fs/proc/task_mmu.c
===================================================================
--- linux-ml.git.orig/fs/proc/task_mmu.c
+++ linux-ml.git/fs/proc/task_mmu.c
@@ -53,6 +53,7 @@ void task_mem(struct seq_file *m, struct
"VmHWM:\t%8lu kB\n"
"VmRSS:\t%8lu kB\n"
"VmData:\t%8lu kB\n"
+ "VmAnon:\t%8lu kB\n"
"VmStk:\t%8lu kB\n"
"VmExe:\t%8lu kB\n"
"VmLib:\t%8lu kB\n"
@@ -66,6 +67,7 @@ void task_mem(struct seq_file *m, struct
hiwater_rss << (PAGE_SHIFT-10),
total_rss << (PAGE_SHIFT-10),
data << (PAGE_SHIFT-10),
+ mm->anon_vm << (PAGE_SHIFT-10),
mm->stack_vm << (PAGE_SHIFT-10), text, lib,
ptes >> 10,
pmds >> 10,
Index: linux-ml.git/include/linux/mm_types.h
===================================================================
--- linux-ml.git.orig/include/linux/mm_types.h
+++ linux-ml.git/include/linux/mm_types.h
@@ -429,6 +429,7 @@ struct mm_struct {
unsigned long shared_vm; /* Shared pages (files) */
unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE */
unsigned long stack_vm; /* VM_GROWSUP/DOWN */
+ unsigned long anon_vm; /* Anonymous pages mapped */
unsigned long def_flags;
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
Index: linux-ml.git/mm/mmap.c
===================================================================
--- linux-ml.git.orig/mm/mmap.c
+++ linux-ml.git/mm/mmap.c
@@ -309,16 +309,6 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
if (brk < min_brk)
goto out;

- /*
- * Check against rlimit here. If this check is done later after the test
- * of oldbrk with newbrk then it can escape the test and let the data
- * segment grow beyond its set limit the in case where the limit is
- * not page aligned -Ram Gupta
- */
- if (check_data_rlimit(rlimit(RLIMIT_DATA), brk, mm->start_brk,
- mm->end_data, mm->start_data))
- goto out;
-
newbrk = PAGE_ALIGN(brk);
oldbrk = PAGE_ALIGN(mm->brk);
if (oldbrk == newbrk)
@@ -1223,6 +1213,9 @@ void vm_stat_account(struct mm_struct *m
mm->exec_vm += pages;
} else if (flags & stack_flags)
mm->stack_vm += pages;
+
+ if (anon_accountable_mapping(file, flags))
+ mm->anon_vm += pages;
}
#endif /* CONFIG_PROC_FS */

@@ -1578,6 +1571,14 @@ unsigned long mmap_region(struct file *f
}

/*
+ * For anon mappings make sure we don't exceed the limit.
+ */
+ if (anon_accountable_mapping(file, vm_flags)) {
+ if (!may_expand_anon_vm(mm, len >> PAGE_SHIFT))
+ return -ENOMEM;
+ }
+
+ /*
* Can we just expand an old mapping?
*/
vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
@@ -2760,7 +2761,8 @@ static unsigned long do_brk(unsigned lon
}

/* Check against address space limits *after* clearing old maps... */
- if (!may_expand_vm(mm, len >> PAGE_SHIFT))
+ if (!may_expand_vm(mm, len >> PAGE_SHIFT) ||
+ !may_expand_anon_vm(mm, len >> PAGE_SHIFT))
return -ENOMEM;

if (mm->map_count > sysctl_max_map_count)
@@ -2795,6 +2797,7 @@ static unsigned long do_brk(unsigned lon
out:
perf_event_mmap(vma);
mm->total_vm += len >> PAGE_SHIFT;
+ mm->anon_vm += len >> PAGE_SHIFT;
if (flags & VM_LOCKED)
mm->locked_vm += (len >> PAGE_SHIFT);
vma->vm_flags |= VM_SOFTDIRTY;
@@ -2982,20 +2985,27 @@ out:
return NULL;
}

+static inline int __may_expand_vm(unsigned int limit,
+ unsigned long cur,
+ unsigned long npages)
+{
+ unsigned long lim = rlimit(limit) >> PAGE_SHIFT;
+
+ return ((cur + npages) > lim) ? 0 : 1;
+}
+
/*
* Return true if the calling process may expand its vm space by the passed
* number of pages
*/
int may_expand_vm(struct mm_struct *mm, unsigned long npages)
{
- unsigned long cur = mm->total_vm; /* pages */
- unsigned long lim;
-
- lim = rlimit(RLIMIT_AS) >> PAGE_SHIFT;
+ return __may_expand_vm(RLIMIT_AS, mm->total_vm, npages);
+}

- if (cur + npages > lim)
- return 0;
- return 1;
+int may_expand_anon_vm(struct mm_struct *mm, unsigned long npages)
+{
+ return __may_expand_vm(RLIMIT_DATA, mm->anon_vm, npages);
}

static int special_mapping_fault(struct vm_area_struct *vma,
Index: linux-ml.git/mm/mremap.c
===================================================================
--- linux-ml.git.orig/mm/mremap.c
+++ linux-ml.git/mm/mremap.c
@@ -382,6 +382,11 @@ static struct vm_area_struct *vma_to_res
if (!may_expand_vm(mm, (new_len - old_len) >> PAGE_SHIFT))
return ERR_PTR(-ENOMEM);

+ if (anon_accountable_mapping(vma->vm_file, vma->vm_flags)) {
+ if (!may_expand_anon_vm(mm, (new_len - old_len) >> PAGE_SHIFT))
+ return ERR_PTR(-ENOMEM);
+ }
+
if (vma->vm_flags & VM_ACCOUNT) {
unsigned long charged = (new_len - old_len) >> PAGE_SHIFT;
if (security_vm_enough_memory_mm(mm, charged))


2015-12-14 08:12:05

by Konstantin Khlebnikov

Subject: Re: [RFC 1/2] [RFC] mm: Account anon mappings as RLIMIT_DATA

On Sun, Dec 13, 2015 at 11:14 PM, Cyrill Gorcunov <[email protected]> wrote:
> When inspecting the rather vague code inside the prctl(PR_SET_MM_MEM)
> call (which tests the RLIMIT_DATA value to figure out whether we are
> allowed to assign new @start_brk, @brk, @start_data and @end_data in
> mm_struct), it became clear that RLIMIT_DATA, in the form it is
> implemented now, doesn't do anything useful, because most user-space
> libraries use the mmap() syscall for dynamic memory allocations.
>
> So Linus suggested converting the RLIMIT_DATA rlimit into something
> suitable for anonymous memory accounting. Here we introduce a new
> @anon_vm member in the mm descriptor, which is updated on every
> vm_stat_account() call.

I don't like this part. VmData is already here, but you add a new
VmAnon which has an even more vague meaning. And after that,
RLIMIT_DATA limits VmAnon, not VmData.

I have an alternative solution, have a look (patch follows).


2015-12-14 08:12:45

by Konstantin Khlebnikov

Subject: [PATCH RFC] mm: rework virtual memory accounting

Here are several related changes bundled together:
* keep vma counting if CONFIG_PROC_FS=n, will be used for limits
* replace mm->shared_vm with better defined mm->data_vm
* account anonymous executable areas as executable
* account file-backed growsdown/up areas as stack
* drop struct file* argument from vm_stat_account
* enforce RLIMIT_DATA for size of data areas

This way the code looks cleaner: code/stack/data classification now
depends only on the vm_flags state:

VM_EXEC & ~VM_WRITE -> code (VmExe + VmLib in proc)
VM_GROWSUP | VM_GROWSDOWN -> stack (VmStk)
VM_WRITE & ~VM_SHARED & !stack -> data (VmData)

The rest (VmSize - VmData - VmStk - VmExe - VmLib) could be called
"shared", but it may contain strange beasts like read-only private or
VM_IO areas.

RLIMIT_AS limits whole address space "VmSize"
RLIMIT_STACK limits stack "VmStk" (but each vma individually)
RLIMIT_DATA now limits "VmData"
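
The reclassification is visible from user space. A quick hypothetical
test (not part of the patch, just to illustrate the rules above): VmData
in /proc/self/status should stay put for a read-only private anonymous
mapping and grow only once the area is made writable.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static void show(const char *when)
{
        char line[128];
        FILE *f = fopen("/proc/self/status", "r");

        while (f && fgets(line, sizeof(line), f))
                if (!strncmp(line, "VmData:", 7))
                        printf("%-40s %s", when, line);
        if (f)
                fclose(f);
}

int main(void)
{
        size_t len = 64 << 20;
        void *p;

        show("at start");

        /* read-only private anon area: counted in VmSize only */
        p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
                return 1;
        show("after 64M PROT_READ anon mmap");

        /* VM_WRITE & ~VM_SHARED & !stack -> accounted as VmData now */
        if (mprotect(p, len, PROT_READ | PROT_WRITE))
                return 1;
        show("after mprotect(PROT_READ|PROT_WRITE)");

        return 0;
}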

Signed-off-by: Konstantin Khlebnikov <[email protected]>
---
arch/ia64/kernel/perfmon.c | 3 +-
fs/proc/task_mmu.c | 7 ++---
include/linux/mm.h | 13 ++-------
include/linux/mm_types.h | 2 +
kernel/fork.c | 5 +--
mm/debug.c | 4 +--
mm/mmap.c | 63 ++++++++++++++++++++++----------------------
mm/mprotect.c | 8 ++++--
mm/mremap.c | 7 +++--
9 files changed, 53 insertions(+), 59 deletions(-)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 60e02f7747ff..9cd607b06964 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2332,8 +2332,7 @@ pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t
*/
insert_vm_struct(mm, vma);

- vm_stat_account(vma->vm_mm, vma->vm_flags, vma->vm_file,
- vma_pages(vma));
+ vm_stat_account(vma->vm_mm, vma->vm_flags, vma_pages(vma));
up_write(&task->mm->mmap_sem);

/*
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 187b3b5f242e..58e73af17af9 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -22,7 +22,7 @@

void task_mem(struct seq_file *m, struct mm_struct *mm)
{
- unsigned long data, text, lib, swap, ptes, pmds;
+ unsigned long text, lib, swap, ptes, pmds;
unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;

/*
@@ -39,7 +39,6 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
if (hiwater_rss < mm->hiwater_rss)
hiwater_rss = mm->hiwater_rss;

- data = mm->total_vm - mm->shared_vm - mm->stack_vm;
text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> 10;
lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text;
swap = get_mm_counter(mm, MM_SWAPENTS);
@@ -65,7 +64,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
mm->pinned_vm << (PAGE_SHIFT-10),
hiwater_rss << (PAGE_SHIFT-10),
total_rss << (PAGE_SHIFT-10),
- data << (PAGE_SHIFT-10),
+ mm->data_vm << (PAGE_SHIFT-10),
mm->stack_vm << (PAGE_SHIFT-10), text, lib,
ptes >> 10,
pmds >> 10,
@@ -85,7 +84,7 @@ unsigned long task_statm(struct mm_struct *mm,
*shared = get_mm_counter(mm, MM_FILEPAGES);
*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
>> PAGE_SHIFT;
- *data = mm->total_vm - mm->shared_vm;
+ *data = mm->data_vm + mm->stack_vm;
*resident = *shared + get_mm_counter(mm, MM_ANONPAGES);
return mm->total_vm;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad7793788..6816a48b6542 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1898,7 +1898,9 @@ extern void mm_drop_all_locks(struct mm_struct *mm);
extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
extern struct file *get_mm_exe_file(struct mm_struct *mm);

-extern int may_expand_vm(struct mm_struct *mm, unsigned long npages);
+extern bool may_expand_vm(struct mm_struct *, vm_flags_t, unsigned long npages);
+extern void vm_stat_account(struct mm_struct *, vm_flags_t, long npages);
+
extern struct vm_area_struct *_install_special_mapping(struct mm_struct *mm,
unsigned long addr, unsigned long len,
unsigned long flags,
@@ -2116,15 +2118,6 @@ typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
extern int apply_to_page_range(struct mm_struct *mm, unsigned long address,
unsigned long size, pte_fn_t fn, void *data);

-#ifdef CONFIG_PROC_FS
-void vm_stat_account(struct mm_struct *, unsigned long, struct file *, long);
-#else
-static inline void vm_stat_account(struct mm_struct *mm,
- unsigned long flags, struct file *file, long pages)
-{
- mm->total_vm += pages;
-}
-#endif /* CONFIG_PROC_FS */

#ifdef CONFIG_DEBUG_PAGEALLOC
extern bool _debug_pagealloc_enabled;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f8d1492a114f..fa114a255b11 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -426,7 +426,7 @@ struct mm_struct {
unsigned long total_vm; /* Total pages mapped */
unsigned long locked_vm; /* Pages that have PG_mlocked set */
unsigned long pinned_vm; /* Refcount permanently increased */
- unsigned long shared_vm; /* Shared pages (files) */
+ unsigned long data_vm; /* VM_WRITE & ~VM_SHARED/GROWSDOWN */
unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE */
unsigned long stack_vm; /* VM_GROWSUP/DOWN */
unsigned long def_flags;
diff --git a/kernel/fork.c b/kernel/fork.c
index fce002ee3ddf..1205601c2174 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -413,7 +413,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));

mm->total_vm = oldmm->total_vm;
- mm->shared_vm = oldmm->shared_vm;
+ mm->data_vm = oldmm->data_vm;
mm->exec_vm = oldmm->exec_vm;
mm->stack_vm = oldmm->stack_vm;

@@ -432,8 +432,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
struct file *file;

if (mpnt->vm_flags & VM_DONTCOPY) {
- vm_stat_account(mm, mpnt->vm_flags, mpnt->vm_file,
- -vma_pages(mpnt));
+ vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
continue;
}
charge = 0;
diff --git a/mm/debug.c b/mm/debug.c
index 668aa35191ca..5d2072ed8d5e 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -175,7 +175,7 @@ void dump_mm(const struct mm_struct *mm)
"mmap_base %lu mmap_legacy_base %lu highest_vm_end %lu\n"
"pgd %p mm_users %d mm_count %d nr_ptes %lu nr_pmds %lu map_count %d\n"
"hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %lx\n"
- "pinned_vm %lx shared_vm %lx exec_vm %lx stack_vm %lx\n"
+ "pinned_vm %lx data_vm %lx exec_vm %lx stack_vm %lx\n"
"start_code %lx end_code %lx start_data %lx end_data %lx\n"
"start_brk %lx brk %lx start_stack %lx\n"
"arg_start %lx arg_end %lx env_start %lx env_end %lx\n"
@@ -209,7 +209,7 @@ void dump_mm(const struct mm_struct *mm)
mm_nr_pmds((struct mm_struct *)mm),
mm->map_count,
mm->hiwater_rss, mm->hiwater_vm, mm->total_vm, mm->locked_vm,
- mm->pinned_vm, mm->shared_vm, mm->exec_vm, mm->stack_vm,
+ mm->pinned_vm, mm->data_vm, mm->exec_vm, mm->stack_vm,
mm->start_code, mm->end_code, mm->start_data, mm->end_data,
mm->start_brk, mm->brk, mm->start_stack,
mm->arg_start, mm->arg_end, mm->env_start, mm->env_end,
diff --git a/mm/mmap.c b/mm/mmap.c
index 2ce04a649f6b..e38f9d4af9c8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1208,24 +1208,6 @@ none:
return NULL;
}

-#ifdef CONFIG_PROC_FS
-void vm_stat_account(struct mm_struct *mm, unsigned long flags,
- struct file *file, long pages)
-{
- const unsigned long stack_flags
- = VM_STACK_FLAGS & (VM_GROWSUP|VM_GROWSDOWN);
-
- mm->total_vm += pages;
-
- if (file) {
- mm->shared_vm += pages;
- if ((flags & (VM_EXEC|VM_WRITE)) == VM_EXEC)
- mm->exec_vm += pages;
- } else if (flags & stack_flags)
- mm->stack_vm += pages;
-}
-#endif /* CONFIG_PROC_FS */
-
/*
* If a hint addr is less than mmap_min_addr change hint to be as
* low as possible but still greater than mmap_min_addr
@@ -1544,7 +1526,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long charged = 0;

/* Check against address space limit. */
- if (!may_expand_vm(mm, len >> PAGE_SHIFT)) {
+ if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT)) {
unsigned long nr_pages;

/*
@@ -1556,7 +1538,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,

nr_pages = count_vma_pages_range(mm, addr, addr + len);

- if (!may_expand_vm(mm, (len >> PAGE_SHIFT) - nr_pages))
+ if (!may_expand_vm(mm, vm_flags,
+ (len >> PAGE_SHIFT) - nr_pages))
return -ENOMEM;
}

@@ -1655,7 +1638,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
out:
perf_event_mmap(vma);

- vm_stat_account(mm, vm_flags, file, len >> PAGE_SHIFT);
+ vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT);
if (vm_flags & VM_LOCKED) {
if (!((vm_flags & VM_SPECIAL) || is_vm_hugetlb_page(vma) ||
vma == get_gate_vma(current->mm)))
@@ -2102,7 +2085,7 @@ static int acct_stack_growth(struct vm_area_struct *vma, unsigned long size, uns
unsigned long new_start, actual_size;

/* address space limit tests */
- if (!may_expand_vm(mm, grow))
+ if (!may_expand_vm(mm, vma->vm_flags, grow))
return -ENOMEM;

/* Stack limit test */
@@ -2199,8 +2182,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
spin_lock(&mm->page_table_lock);
if (vma->vm_flags & VM_LOCKED)
mm->locked_vm += grow;
- vm_stat_account(mm, vma->vm_flags,
- vma->vm_file, grow);
+ vm_stat_account(mm, vma->vm_flags, grow);
anon_vma_interval_tree_pre_update_vma(vma);
vma->vm_end = address;
anon_vma_interval_tree_post_update_vma(vma);
@@ -2275,8 +2257,7 @@ int expand_downwards(struct vm_area_struct *vma,
spin_lock(&mm->page_table_lock);
if (vma->vm_flags & VM_LOCKED)
mm->locked_vm += grow;
- vm_stat_account(mm, vma->vm_flags,
- vma->vm_file, grow);
+ vm_stat_account(mm, vma->vm_flags, grow);
anon_vma_interval_tree_pre_update_vma(vma);
vma->vm_start = address;
vma->vm_pgoff -= grow;
@@ -2390,7 +2371,7 @@ static void remove_vma_list(struct mm_struct *mm, struct vm_area_struct *vma)

if (vma->vm_flags & VM_ACCOUNT)
nr_accounted += nrpages;
- vm_stat_account(mm, vma->vm_flags, vma->vm_file, -nrpages);
+ vm_stat_account(mm, vma->vm_flags, -nrpages);
vma = remove_vma(vma);
} while (vma);
vm_unacct_memory(nr_accounted);
@@ -2760,7 +2741,7 @@ static unsigned long do_brk(unsigned long addr, unsigned long len)
}

/* Check against address space limits *after* clearing old maps... */
- if (!may_expand_vm(mm, len >> PAGE_SHIFT))
+ if (!may_expand_vm(mm, flags, len >> PAGE_SHIFT))
return -ENOMEM;

if (mm->map_count > sysctl_max_map_count)
@@ -2795,6 +2776,7 @@ static unsigned long do_brk(unsigned long addr, unsigned long len)
out:
perf_event_mmap(vma);
mm->total_vm += len >> PAGE_SHIFT;
+ mm->data_vm += len >> PAGE_SHIFT;
if (flags & VM_LOCKED)
mm->locked_vm += (len >> PAGE_SHIFT);
vma->vm_flags |= VM_SOFTDIRTY;
@@ -2986,7 +2968,7 @@ out:
* Return true if the calling process may expand its vm space by the passed
* number of pages
*/
-int may_expand_vm(struct mm_struct *mm, unsigned long npages)
+bool may_expand_vm(struct mm_struct *mm, vm_flags_t flags, unsigned long npages)
{
unsigned long cur = mm->total_vm; /* pages */
unsigned long lim;
@@ -2994,8 +2976,25 @@ int may_expand_vm(struct mm_struct *mm, unsigned long npages)
lim = rlimit(RLIMIT_AS) >> PAGE_SHIFT;

if (cur + npages > lim)
- return 0;
- return 1;
+ return false;
+
+ if ((flags & (VM_WRITE | VM_SHARED | (VM_STACK_FLAGS &
+ (VM_GROWSUP | VM_GROWSDOWN)))) == VM_WRITE)
+ return mm->data_vm + npages <= rlimit(RLIMIT_DATA);
+
+ return true;
+}
+
+void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
+{
+ mm->total_vm += npages;
+
+ if ((flags & (VM_EXEC | VM_WRITE)) == VM_EXEC)
+ mm->exec_vm += npages;
+ else if (flags & (VM_STACK_FLAGS & (VM_GROWSUP | VM_GROWSDOWN)))
+ mm->stack_vm += npages;
+ else if ((flags & (VM_WRITE | VM_SHARED)) == VM_WRITE)
+ mm->data_vm += npages;
}

static int special_mapping_fault(struct vm_area_struct *vma,
@@ -3077,7 +3076,7 @@ static struct vm_area_struct *__install_special_mapping(
if (ret)
goto out;

- mm->total_vm += len >> PAGE_SHIFT;
+ vm_stat_account(mm, vma->vm_flags, len >> PAGE_SHIFT);

perf_event_mmap(vma);

diff --git a/mm/mprotect.c b/mm/mprotect.c
index ef5be8eaab00..c764402c464f 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -278,6 +278,10 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
* even if read-only so there is no need to account for them here
*/
if (newflags & VM_WRITE) {
+ /* Check space limits when area turns into data. */
+ if (!may_expand_vm(mm, newflags, nrpages) &&
+ may_expand_vm(mm, oldflags, nrpages))
+ return -ENOMEM;
if (!(oldflags & (VM_ACCOUNT|VM_WRITE|VM_HUGETLB|
VM_SHARED|VM_NORESERVE))) {
charged = nrpages;
@@ -334,8 +338,8 @@ success:
populate_vma_page_range(vma, start, end, NULL);
}

- vm_stat_account(mm, oldflags, vma->vm_file, -nrpages);
- vm_stat_account(mm, newflags, vma->vm_file, nrpages);
+ vm_stat_account(mm, oldflags, -nrpages);
+ vm_stat_account(mm, newflags, nrpages);
perf_event_mmap(vma);
return 0;

diff --git a/mm/mremap.c b/mm/mremap.c
index c25bc6268e46..4fe3248426d1 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -317,7 +317,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
* If this were a serious issue, we'd add a flag to do_munmap().
*/
hiwater_vm = mm->hiwater_vm;
- vm_stat_account(mm, vma->vm_flags, vma->vm_file, new_len>>PAGE_SHIFT);
+ vm_stat_account(mm, vma->vm_flags, new_len >> PAGE_SHIFT);

if (do_munmap(mm, old_addr, old_len) < 0) {
/* OOM: unable to split vma, just get accounts right */
@@ -379,7 +379,8 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
return ERR_PTR(-EAGAIN);
}

- if (!may_expand_vm(mm, (new_len - old_len) >> PAGE_SHIFT))
+ if (!may_expand_vm(mm, vma->vm_flags,
+ (new_len - old_len) >> PAGE_SHIFT))
return ERR_PTR(-ENOMEM);

if (vma->vm_flags & VM_ACCOUNT) {
@@ -541,7 +542,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
goto out;
}

- vm_stat_account(mm, vma->vm_flags, vma->vm_file, pages);
+ vm_stat_account(mm, vma->vm_flags, pages);
if (vma->vm_flags & VM_LOCKED) {
mm->locked_vm += pages;
locked = true;

2015-12-14 09:08:35

by Cyrill Gorcunov

Subject: Re: [PATCH RFC] mm: rework virtual memory accounting

On Mon, Dec 14, 2015 at 11:12:38AM +0300, Konstantin Khlebnikov wrote:
> Here are several related changes bundled together:
> * keep vma counting if CONFIG_PROC_FS=n, will be used for limits
> * replace mm->shared_vm with better defined mm->data_vm
> * account anonymous executable areas as executable
> * account file-backed growsdown/up areas as stack
> * drop struct file* argument from vm_stat_account
> * enforce RLIMIT_DATA for size of data areas
>
> This way the code looks cleaner: code/stack/data classification now
> depends only on the vm_flags state:
>
> VM_EXEC & ~VM_WRITE -> code (VmExe + VmLib in proc)
> VM_GROWSUP | VM_GROWSDOWN -> stack (VmStk)
> VM_WRITE & ~VM_SHARED & !stack -> data (VmData)
>
> The rest (VmSize - VmData - VmStk - VmExe - VmLib) could be called
> "shared", but it may contain strange beasts like read-only private or
> VM_IO areas.
>
> RLIMIT_AS limits whole address space "VmSize"
> RLIMIT_STACK limits stack "VmStk" (but each vma individually)
> RLIMIT_DATA now limits "VmData"
>
> Signed-off-by: Konstantin Khlebnikov <[email protected]>

Looks OK to me. Let's wait for Linus' opinion.

2015-12-14 14:51:15

by Quentin Casasnovas

Subject: Re: [RFC 1/2] [RFC] mm: Account anon mappings as RLIMIT_DATA

On Sun, Dec 13, 2015 at 11:14:19PM +0300, Cyrill Gorcunov wrote:
> When inspecting the rather vague code inside the prctl(PR_SET_MM_MEM)
> call (which tests the RLIMIT_DATA value to figure out whether we are
> allowed to assign new @start_brk, @brk, @start_data and @end_data in
> mm_struct), it became clear that RLIMIT_DATA, in the form it is
> implemented now, doesn't do anything useful, because most user-space
> libraries use the mmap() syscall for dynamic memory allocations.
>
> So Linus suggested converting the RLIMIT_DATA rlimit into something
> suitable for anonymous memory accounting. Here we introduce a new
> @anon_vm member in the mm descriptor, which is updated on every
> vm_stat_account() call.
>
> The RLIMIT_DATA tests are done in three places:
> - mmap_region: when userspace calls mmap()
> - do_brk: for the brk() syscall
> - vma_to_resize: when mremap() is used
>
> do_brk() is also used by vm_brk(), which is called when the kernel
> loads ELF and a.out binaries for execution.
>
> Because the limit test is now done in the do_brk() helper, we no
> longer need to call check_data_rlimit() in the brk() syscall itself.
>
> v2:
> - update doc
> - add may_expand_anon_vm helper
> - call for RLIMIT_DATA test in mremap and do_brk
>
> CC: Quentin Casasnovas <[email protected]>
> CC: Vegard Nossum <[email protected]>
> CC: Linus Torvalds <[email protected]>
> CC: Willy Tarreau <[email protected]>
> CC: Andy Lutomirski <[email protected]>
> CC: Kees Cook <[email protected]>
> CC: Vladimir Davydov <[email protected]>
> CC: Konstantin Khlebnikov <[email protected]>
> CC: Pavel Emelyanov <[email protected]>
> CC: Vladimir Davydov <[email protected]>
> CC: Peter Zijlstra <[email protected]>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> ---
> Documentation/filesystems/proc.txt | 1
> fs/proc/task_mmu.c | 2 +
> include/linux/mm_types.h | 1
> mm/mmap.c | 46 ++++++++++++++++++++++---------------
> mm/mremap.c | 5 ++++
> 5 files changed, 37 insertions(+), 18 deletions(-)
>
> Index: linux-ml.git/Documentation/filesystems/proc.txt
> ===================================================================
> --- linux-ml.git.orig/Documentation/filesystems/proc.txt
> +++ linux-ml.git/Documentation/filesystems/proc.txt
> @@ -233,6 +233,7 @@ Table 1-2: Contents of the status files
> VmHWM peak resident set size ("high water mark")
> VmRSS size of memory portions
> VmData size of data, stack, and text segments
> + VmAnon size of anonymous private memory (not file backed)
> VmStk size of data, stack, and text segments
> VmExe size of text segment
> VmLib size of shared library code
> Index: linux-ml.git/fs/proc/task_mmu.c
> ===================================================================
> --- linux-ml.git.orig/fs/proc/task_mmu.c
> +++ linux-ml.git/fs/proc/task_mmu.c
> @@ -53,6 +53,7 @@ void task_mem(struct seq_file *m, struct
> "VmHWM:\t%8lu kB\n"
> "VmRSS:\t%8lu kB\n"
> "VmData:\t%8lu kB\n"
> + "VmAnon:\t%8lu kB\n"
> "VmStk:\t%8lu kB\n"
> "VmExe:\t%8lu kB\n"
> "VmLib:\t%8lu kB\n"
> @@ -66,6 +67,7 @@ void task_mem(struct seq_file *m, struct
> hiwater_rss << (PAGE_SHIFT-10),
> total_rss << (PAGE_SHIFT-10),
> data << (PAGE_SHIFT-10),
> + mm->anon_vm << (PAGE_SHIFT-10),
> mm->stack_vm << (PAGE_SHIFT-10), text, lib,
> ptes >> 10,
> pmds >> 10,
> Index: linux-ml.git/include/linux/mm_types.h
> ===================================================================
> --- linux-ml.git.orig/include/linux/mm_types.h
> +++ linux-ml.git/include/linux/mm_types.h
> @@ -429,6 +429,7 @@ struct mm_struct {
> unsigned long shared_vm; /* Shared pages (files) */
> unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE */
> unsigned long stack_vm; /* VM_GROWSUP/DOWN */
> + unsigned long anon_vm; /* Anonymous pages mapped */
> unsigned long def_flags;
> unsigned long start_code, end_code, start_data, end_data;
> unsigned long start_brk, brk, start_stack;
> Index: linux-ml.git/mm/mmap.c
> ===================================================================
> --- linux-ml.git.orig/mm/mmap.c
> +++ linux-ml.git/mm/mmap.c
> @@ -309,16 +309,6 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
> if (brk < min_brk)
> goto out;
>
> - /*
> - * Check against rlimit here. If this check is done later after the test
> - * of oldbrk with newbrk then it can escape the test and let the data
> - * segment grow beyond its set limit the in case where the limit is
> - * not page aligned -Ram Gupta
> - */
> - if (check_data_rlimit(rlimit(RLIMIT_DATA), brk, mm->start_brk,
> - mm->end_data, mm->start_data))
> - goto out;
> -
> newbrk = PAGE_ALIGN(brk);
> oldbrk = PAGE_ALIGN(mm->brk);
> if (oldbrk == newbrk)
> @@ -1223,6 +1213,9 @@ void vm_stat_account(struct mm_struct *m
> mm->exec_vm += pages;
> } else if (flags & stack_flags)
> mm->stack_vm += pages;
> +
> + if (anon_accountable_mapping(file, flags))
> + mm->anon_vm += pages;
> }
> #endif /* CONFIG_PROC_FS */
>
> @@ -1578,6 +1571,14 @@ unsigned long mmap_region(struct file *f
> }
>
> /*
> + * For anon mappings make sure we don't exceed the limit.
> + */
> + if (anon_accountable_mapping(file, vm_flags)) {
> + if (!may_expand_anon_vm(mm, len >> PAGE_SHIFT))
> + return -ENOMEM;
> + }
> +
> + /*
> * Can we just expand an old mapping?
> */
> vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
> @@ -2760,7 +2761,8 @@ static unsigned long do_brk(unsigned lon
> }
>
> /* Check against address space limits *after* clearing old maps... */
> - if (!may_expand_vm(mm, len >> PAGE_SHIFT))
> + if (!may_expand_vm(mm, len >> PAGE_SHIFT) ||
> + !may_expand_anon_vm(mm, len >> PAGE_SHIFT))
> return -ENOMEM;
>
> if (mm->map_count > sysctl_max_map_count)
> @@ -2795,6 +2797,7 @@ static unsigned long do_brk(unsigned lon
> out:
> perf_event_mmap(vma);
> mm->total_vm += len >> PAGE_SHIFT;
> + mm->anon_vm += len >> PAGE_SHIFT;
> if (flags & VM_LOCKED)
> mm->locked_vm += (len >> PAGE_SHIFT);
> vma->vm_flags |= VM_SOFTDIRTY;
> @@ -2982,20 +2985,27 @@ out:
> return NULL;
> }
>
> +static inline int __may_expand_vm(unsigned int limit,
> + unsigned long cur,
> + unsigned long npages)
> +{
> + unsigned long lim = rlimit(limit) >> PAGE_SHIFT;
> +
> + return ((cur + npages) > lim) ? 0 : 1;
> +}
> +
> /*
> * Return true if the calling process may expand its vm space by the passed
> * number of pages
> */
> int may_expand_vm(struct mm_struct *mm, unsigned long npages)
> {
> - unsigned long cur = mm->total_vm; /* pages */
> - unsigned long lim;
> -
> - lim = rlimit(RLIMIT_AS) >> PAGE_SHIFT;
> + return __may_expand_vm(RLIMIT_AS, mm->total_vm, npages);
> +}
>
> - if (cur + npages > lim)
> - return 0;
> - return 1;
> +int may_expand_anon_vm(struct mm_struct *mm, unsigned long npages)
> +{
> + return __may_expand_vm(RLIMIT_DATA, mm->anon_vm, npages);
> }
>

Do we want to fold may_expand_anon_vm() into may_expand_vm() (potentially
passing it the flags/struct file if needed) so there is just one such
helper function? The rationale being that with two separate helpers it
gets hard to see what restricts what, and it's easy to miss one place.
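
Something like this is what I have in mind, as a rough sketch on top of
your patch (untested, and the exact criteria may need tweaking):

bool may_expand_vm(struct mm_struct *mm, struct file *file,
                   vm_flags_t flags, unsigned long npages)
{
        if (mm->total_vm + npages > rlimit(RLIMIT_AS) >> PAGE_SHIFT)
                return false;

        /* anonymous, non-stack areas must also fit into RLIMIT_DATA */
        if (!file && !(flags & (VM_GROWSUP | VM_GROWSDOWN)) &&
            mm->anon_vm + npages > rlimit(RLIMIT_DATA) >> PAGE_SHIFT)
                return false;

        return true;
}

That way both the RLIMIT_AS and the RLIMIT_DATA checks live in one place
and every caller gets both for free.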

For example, I couldn't find anything preventing a user from calling
mmap(MAP_GROWSDOWN) and using that as a base to get pages that would not
be accounted for in your patch (making it a poor man's mremap()).

I only had a quick look so apologies if this is handled and I missed it :)

Quentin

2015-12-14 15:11:22

by Cyrill Gorcunov

Subject: Re: [RFC 1/2] [RFC] mm: Account anon mappings as RLIMIT_DATA

On Mon, Dec 14, 2015 at 03:51:26PM +0100, Quentin Casasnovas wrote:
...
>
> Do we want to fold may_expand_anon_vm() into may_expand_vm() (potentially
> passing it the flags/struct file if needed) so there is just one such
> helper function? Rationale being that it then gets hard to see what
> restricts what, and it's easy to miss one place.

I tried to make the patch as small as possible (because otherwise,
indeed, I would have to pass @vm_file|@file as an additional argument).
This wouldn't be a problem, but may_expand_vm() is called in way more
places than may_expand_anon_vm(). That's the only rationale I followed.

> For example, I couldn't find anything preventing a user to
> mmap(MAP_GROWSDOWN) and uses that as a base to get pages that would not be
> accounted for in your patch (making it a poor-man mremap()).

growsup/down stand for stack usage iirc, so it was intentionally
not accounted here.

>
> I only had a quick look so apologies if this is handled and I missed it :)

thanks for the feedback! Also take a look at Kostya's patch, I think
it's an even better approach (and I like it more than mine).

2015-12-14 15:32:05

by Quentin Casasnovas

Subject: Re: [RFC 1/2] [RFC] mm: Account anon mappings as RLIMIT_DATA

On Mon, Dec 14, 2015 at 06:11:16PM +0300, Cyrill Gorcunov wrote:
> On Mon, Dec 14, 2015 at 03:51:26PM +0100, Quentin Casasnovas wrote:
> ...
> >
> > Do we want to fold may_expand_anon_vm() into may_expand_vm() (potentially
> > passing it the flags/struct file if needed) so there is just one such
> > helper function? The rationale being that with two separate helpers it
> > gets hard to see what restricts what, and it's easy to miss one place.
>
> I tried to make the patch as small as possible (because otherwise,
> indeed, I would have to pass @vm_file|@file as an additional argument).
> This wouldn't be a problem, but may_expand_vm() is called in way more
> places than may_expand_anon_vm(). That's the only rationale I followed.
>
> > For example, I couldn't find anything preventing a user from calling
> > mmap(MAP_GROWSDOWN) and using that as a base to get pages that would not
> > be accounted for in your patch (making it a poor man's mremap()).
>
> growsup/down stand for stack usage iirc, so it was intentionally
> not accounted here.
>

Right, but in the same vein as Linus saying RLIMIT_DATA is/was useless
because everyone could use mmap() instead of brk() to get anonymous
memory, what's the point of restricting "almost all" anonymous memory if
one can just use MAP_GROWSDOWN/UP and cause repeated page faults to
extend that mapping, circumventing your checks? That makes the new
restriction as useless as what RLIMIT_DATA used to be, doesn't it?

> >
> > I only had a quick look so apologies if this is handled and I missed it :)
>
> thanks for the feedback! Also take a look at Kostya's patch, I think
> it's an even better approach (and I like it more than mine).

Ha I'm not subscribed to LKML so I missed those, I suppose you can ignore
my comments then! :)

Quentin

2015-12-14 15:43:19

by Cyrill Gorcunov

Subject: Re: [RFC 1/2] [RFC] mm: Account anon mappings as RLIMIT_DATA

On Mon, Dec 14, 2015 at 04:32:34PM +0100, Quentin Casasnovas wrote:
> >
> > growsup/down stand for stack usage iirc, so it was intentionally
> > not accounted here.
> >
>
> Right, but in the same vein as Linus saying RLIMIT_DATA is/was useless
> because everyone could use mmap() instead of brk() to get anonymous
> memory, what's the point of restricting "almost all" anonymous memory if
> one can just use MAP_GROWSDOWN/UP and cause repeated page faults to
> extend that mapping, circumventing your checks? That makes the new
> restriction as useless as what RLIMIT_DATA used to be, doesn't it?

Not to the extent it was before, but true, using growsdown/up would
still give a way to allocate memory not limited by RLIMIT_DATA. (Also I
just noticed that I had modified mm.h as well, where
anon_accountable_mapping was implemented, but forgot to add that hunk to
the quilt series, so this patch on its own won't compile; don't apply it.)
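
For completeness, the helper is essentially a predicate along these lines
(a sketch only, since the real mm.h hunk is not part of this posting):

static inline bool anon_accountable_mapping(struct file *file,
                                            vm_flags_t flags)
{
        /* file-backed areas are not anonymous memory */
        if (file)
                return false;

        /* growsup/growsdown areas are stack, accounted separately */
        if (flags & (VM_GROWSUP | VM_GROWSDOWN))
                return false;

        return true;
}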

> > >
> > > I only had a quick look so apologies if this is handled and I missed it :)
> >
> > > thanks for the feedback! Also take a look at Kostya's patch, I think
> > > it's an even better approach (and I like it more than mine).
>
> Ha I'm not subscribed to LKML so I missed those, I suppose you can ignore
> my comments then! :)

https://lkml.org/lkml/2015/12/14/72

Take a look.