2011-04-25 10:44:53

by Geunsik Lim

Subject: [PATCH 0/4] munmap: Flexible mem unmap operation interface for scheduling latency

From: Geunsik Lim <[email protected]>

As we all know, the specification of H/W (CPU, memory, I/O bandwidth, etc.)
differs from one SoC to another. We can obtain suitable performance (or latency)
by adjusting the memory unmap size, selecting an optimal value that takes the
specific system environment in the real world into account.
In other words, we can get real-fast or real-time behaviour by choosing an
appropriate value for the flexible memory unmap operation unit through a Linux
kernel tunable parameter.

For example, this patch provides the following benefits:
. Reduces temporary CPU spikes (peak CPU usage) when accessing mass files
. Improves user responsiveness on embedded products such as mobile phones,
  camcorders and digital cameras
. Provides effective real-time or real-fast behaviour in the real world,
  depending on the physical H/W
. Adds a sysctl interface (tunable parameter) so that a suitable munmap
  operation unit can be found at runtime

unmap_vmas() (= unmap a range of memory covered by a list of VMAs) treads
a delicate and uncomfortable line between high performance and low latency.
We have often chosen to improve performance at the expense of latency.

So although there may be no need to reschedule right now,
if we keep on gathering more and more pages without flushing,
we'll be very unresponsive when a reschedule is needed later on.

A reschedule is the point at which the scheduler switches execution away from
the current process. It happens not only when the time quantum of the current
process expires but also when the current process makes a blocking (waiting)
call such as wait(), or when a new process of potentially higher priority
becomes eligible for execution.

Here is some history about ZAP_BLOCK_SIZE, which was discussed for scheduling-latency
reasons a long time ago. Hence Ingo's ZAP_BLOCK_SIZE to split the work up: small when
CONFIG_PREEMPT is set, more reasonable but still limited when it is not.
. Patch subject - [patch] sched, mm: fix scheduling latencies in unmap_vmas()
. LKML archive - http://lkml.org/lkml/2004/9/14/101

Robert Love submitted a patch that improved latencies by creating a preemption point
in Linux 2.5.28 (a development version).
. Patch subject - [PATCH] updated low-latency zap_page_range
. LKML archive - http://lkml.org/lkml/2002/7/24/273

Originally, We aim to not hold locks for too long (for scheduling latency reasons).
So zap pages in ZAP_BLOCK_SIZE byte counts.
This means we need to return the ending mmu_gather to the caller.
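
To illustrate the idea, here is a minimal user-space sketch (not the kernel code)
of the batching pattern described above; zap_block() and maybe_resched() are
hypothetical stand-ins for the kernel's page-table teardown and cond_resched():

#include <stdio.h>

#define PAGE_SIZE       4096UL
#define MUNMAP_UNIT     8UL                        /* pages per block (the tunable) */
#define ZAP_BLOCK_SIZE  (MUNMAP_UNIT * PAGE_SIZE)  /* bytes zapped per batch */

/* hypothetical stand-in for tearing down one block of page tables */
static void zap_block(unsigned long start, unsigned long len)
{
        (void)start;
        (void)len;
}

/* hypothetical stand-in for the kernel's cond_resched()/lock break */
static void maybe_resched(void)
{
}

static void unmap_range(unsigned long start, unsigned long end)
{
        unsigned long batches = 0;

        while (start < end) {
                unsigned long len = end - start;

                if (len > ZAP_BLOCK_SIZE)
                        len = ZAP_BLOCK_SIZE;  /* never zap more than one block at a time */
                zap_block(start, len);
                maybe_resched();               /* latency point: smaller blocks, more breaks */
                start += len;
                batches++;
        }
        printf("unmapped in %lu batches of up to %lu bytes\n", batches, ZAP_BLOCK_SIZE);
}

int main(void)
{
        /* a 100 MB region, as in the camcorder example later in this series */
        unmap_range(0x10000000UL, 0x10000000UL + 100UL * 1024 * 1024);
        return 0;
}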

In general, This is not a critical latency-path on preemption mode
(PREEMPT_VOLUNTARY / PREEMPT_DESKTOP / PREEMPT_RT)

. Vanilla kernel preemption modes (mainline kernel tree)
- http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git v2.6.38
1) CONFIG_PREEMPT_NONE: No Forced Preemption (Server)
2) CONFIG_PREEMPT_VOLUNTARY: Voluntary Kernel Preemption (Desktop)
3) CONFIG_PREEMPT: Preemptible Kernel (Low-Latency Desktop)

. Ingo's -rt patch preemption modes (-tip kernel tree)
- http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git v2.6.33.9-rt31
1) CONFIG_PREEMPT_NONE
2) CONFIG_PREEMPT_VOLUNTARY
3) CONFIG_PREEMPT + CONFIG_PREEMPT_DESKTOP
4) CONFIG_PREEMPT + CONFIG_PREEMPT_RT + CONFIG_PREEMPT_{SOFTIRQS|HARDIRQS}

This value can be changed at runtime, after boot, through the
'/proc/sys/vm/munmap_unit_size' kernel tunable parameter (a small user-space
example follows the table below).

* Examples: The size of one page is 4,096 bytes.
 2048 => 8,388,608 bytes : for straight-line efficiency (performance)
 1024 => 4,194,304 bytes
  512 => 2,097,152 bytes
  256 => 1,048,576 bytes
  128 =>   524,288 bytes
   64 =>   262,144 bytes
   32 =>   131,072 bytes
   16 =>    65,536 bytes
    8 =>    32,768 bytes : for low-latency
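
As a concrete example, a small user-space helper could set the proposed tunable.
Note that /proc/sys/vm/munmap_unit_size only exists on a kernel with this patch
set applied; on an unpatched kernel the open below simply fails:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        const char *path = "/proc/sys/vm/munmap_unit_size";
        unsigned long pages = (argc > 1) ? strtoul(argv[1], NULL, 0) : 8;
        FILE *f = fopen(path, "w");    /* needs root and a kernel with this patch */

        if (!f) {
                perror(path);
                return 1;
        }
        fprintf(f, "%lu\n", pages);    /* e.g. 8 -> 32 KB blocks, 1024 -> 4 MB blocks */
        fclose(f);
        return 0;
}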

p.s: I checked this patch series with the './linux-2.6/scripts/checkpatch.pl' script,
and I uploaded a demo video to YouTube showing the evaluation results for the
munmap operation unit interface. (http://www.youtube.com/watch?v=PxcgvDTY5F0)

Thanks for reading.

Geunsik Lim (4):
munmap operation size handling
sysctl extension for tunable parameter
kbuild menu for munmap interface
documentation of munmap operation interface

Documentation/sysctl/vm.txt | 36 +++++++++++++++++++
MAINTAINERS | 7 ++++
include/linux/munmap_unit_size.h | 24 +++++++++++++
init/Kconfig | 70 ++++++++++++++++++++++++++++++++++++++
kernel/sysctl.c | 10 +++++
mm/Makefile | 4 ++-
mm/memory.c | 21 +++++++----
mm/munmap_unit_size.c | 57 +++++++++++++++++++++++++++++++
8 files changed, 221 insertions(+), 8 deletions(-)
create mode 100644 include/linux/munmap_unit_size.h
create mode 100644 mm/munmap_unit_size.c

--
1.7.3.4


2011-04-25 10:44:58

by Geunsik Lim

Subject: [PATCH 1/4] munmap: mem unmap operation size handling

From: Geunsik Lim <[email protected]>

The specification of H/W (CPU, memory, I/O bandwidth, etc.) differs from one
SoC to another. We can obtain suitable performance (or latency) by adjusting
the memory unmap size, selecting an optimal value that takes the specific
system environment in the real world into account.

In other words, we can get real-fast or real-time behaviour by choosing an
appropriate value for the flexible memory unmap operation unit through a Linux
kernel tunable parameter.

Signed-off-by: Geunsik Lim <[email protected]>
Acked-by: Hyunjin Choi <[email protected]>
---
include/linux/munmap_unit_size.h | 24 ++++++++++++++++
mm/Makefile | 4 ++-
mm/munmap_unit_size.c | 57 ++++++++++++++++++++++++++++++++++++++
3 files changed, 84 insertions(+), 1 deletions(-)
create mode 100644 include/linux/munmap_unit_size.h
create mode 100644 mm/munmap_unit_size.c

diff --git a/include/linux/munmap_unit_size.h b/include/linux/munmap_unit_size.h
new file mode 100644
index 0000000..c4f1fd4
--- /dev/null
+++ b/include/linux/munmap_unit_size.h
@@ -0,0 +1,24 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * Due to this file being licensed under the GPL there is controversy over
+ * whether this permits you to write a module that #includes this file
+ * without placing your module under the GPL. Please consult a lawyer for
+ * advice before doing this.
+ *
+ */
+
+#ifdef CONFIG_MMU
+extern unsigned long munmap_unit_size;
+extern unsigned long sysctl_munmap_unit_size;
+#else
+#define sysctl_munmap_unit_size 0UL
+#endif
+
+#ifdef CONFIG_MMU
+extern int munmap_unit_size_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
+#endif
diff --git a/mm/Makefile b/mm/Makefile
index 42a8326..4b55b6c 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -5,7 +5,9 @@
mmu-y := nommu.o
mmu-$(CONFIG_MMU) := fremap.o highmem.o madvise.o memory.o mincore.o \
mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
- vmalloc.o pagewalk.o pgtable-generic.o
+ vmalloc.o pagewalk.o pgtable-generic.o \
+ munmap_unit_size.o
+

obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
maccess.o page_alloc.o page-writeback.o \
diff --git a/mm/munmap_unit_size.c b/mm/munmap_unit_size.c
new file mode 100644
index 0000000..1cdae1d
--- /dev/null
+++ b/mm/munmap_unit_size.c
@@ -0,0 +1,57 @@
+/*
+ * Memory Unmap Operation Unit Interface
+ * (C) Geunsik Lim, April 2011
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/munmap_unit_size.h>
+#include <linux/sysctl.h>
+
+/* amount of vm to unmap from userspace access by both Non-preemption mode
+ * and Preemption mode
+ */
+unsigned long munmap_unit_size;
+
+/*
+ * Memory unmap operation unit of vm to release allocated memory size from
+ * userspace using mmap system call
+ */
+#if !defined(CONFIG_PREEMPT_VOLUNTARY) && !defined(CONFIG_PREEMPT)
+unsigned long sysctl_munmap_unit_size = CONFIG_PREEMPT_NO_MUNMAP_RANGE;
+#else
+unsigned long sysctl_munmap_unit_size = CONFIG_PREEMPT_OK_MUNMAP_RANGE;
+#endif
+
+/*
+ * Update munmap_unit_size that changed with /proc/sys/vm/munmap_unit_size
+ * tunable value.
+ */
+static void update_munmap_unit_size(void)
+{
+ munmap_unit_size = sysctl_munmap_unit_size;
+}
+
+/*
+ * sysctl handler which just sets sysctl_munmap_unit_size = the new value
+ * and then calls update_munmap_unit_size()
+ */
+int munmap_unit_size_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ int ret;
+
+ ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
+
+ update_munmap_unit_size();
+
+ return ret;
+}
+
+static int __init init_munmap_unit_size(void)
+{
+ update_munmap_unit_size();
+
+ return 0;
+}
+pure_initcall(init_munmap_unit_size);
--
1.7.3.4

2011-04-25 10:45:05

by Geunsik Lim

Subject: [PATCH 2/4] munmap: sysctl extension for tunable parameter

From: Geunsik Lim <[email protected]>

Add a sysctl interface (tunable parameter) so that a suitable munmap
operation unit can be found at runtime.

* sysctl: an interface for examining and dynamically changing the munmap
operation size parameter in Linux. Here, the sysctl is implemented as
a wrapper around file system routines that access the contents of files
under /proc.
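
As an illustration of that wrapper behaviour, the parameter registered below can
be read back through its /proc path from user space (again, only on a kernel with
this series applied; the 4 KB page size is an assumption for the example):

#include <stdio.h>

int main(void)
{
        unsigned long pages = 0;
        FILE *f = fopen("/proc/sys/vm/munmap_unit_size", "r");

        if (!f) {
                perror("/proc/sys/vm/munmap_unit_size");
                return 1;
        }
        if (fscanf(f, "%lu", &pages) == 1)
                printf("munmap unit: %lu pages (%lu bytes)\n",
                       pages, pages * 4096UL);   /* assuming 4 KB pages */
        fclose(f);
        return 0;
}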

Signed-off-by: Geunsik Lim <[email protected]>
Acked-by: Hyunjin Choi <[email protected]>
---
kernel/sysctl.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c0bb324..9b85041 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -56,6 +56,7 @@
#include <linux/kprobes.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
+#include <linux/munmap_unit_size.h>

#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -1278,6 +1279,15 @@ static struct ctl_table vm_table[] = {
.proc_handler = mmap_min_addr_handler,
},
#endif
+#ifdef CONFIG_MMU
+ {
+ .procname = "munmap_unit_size",
+ .data = &sysctl_munmap_unit_size,
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = munmap_unit_size_handler,
+ },
+#endif
#ifdef CONFIG_NUMA
{
.procname = "numa_zonelist_order",
--
1.7.3.4

2011-04-25 10:45:14

by Geunsik Lim

Subject: [PATCH 4/4] munmap: documentation of munmap operation interface

From: Geunsik Lim <[email protected]>

Add kernel documentation describing how to use the flexible memory unmap
operation interface to achieve the desired scheduling latency.

Signed-off-by: Geunsik Lim <[email protected]>
Acked-by: Hyunjin Choi <[email protected]>
---
Documentation/sysctl/vm.txt | 36 ++++++++++++++++++++++++++++++++++++
MAINTAINERS | 7 +++++++
2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 30289fa..9dc4c0a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -40,6 +40,7 @@ Currently, these files are in /proc/sys/vm:
- min_slab_ratio
- min_unmapped_ratio
- mmap_min_addr
+- munmap_unit_size
- nr_hugepages
- nr_overcommit_hugepages
- nr_pdflush_threads
@@ -409,6 +410,41 @@ against future potential kernel bugs.

==============================================================

+munmap_unit_size
+
+unmap_vmas(=unmap a range of memory covered by a list of vma) is treading
+a delicate and uncomfortable line between hi-performance and low-latency.
+We've chosen to improve performance at the expense of latency.
+
+So although there may be no need to resched right now,
+if we keep on gathering more and more without flushing,
+we'll be very unresponsive when a resched is needed later on.
+
+Consider the best suitable result between high performance and low latency
+on preemption mode.
+Select optimal munmap size to return memory space that is allocated by mmap system call.
+
+For example, For recording mass files, if we try to unmap memory that we allocated
+with 100MB for recording in embedded devices, we have to wait for more than 3seconds to
+change mode from play mode to recording mode. This results from the unit of memory
+unmapped size when we are recording mass files like camcorder particularly.
+
+This value can be changed after boot using the
+/proc/sys/vm/munmap_unit_size tunable.
+
+Examples:
+ 2048 => 8,388,608bytes : for straight-line efficiency
+ 1024 => 4,194,304bytes
+ 512 => 2,097,152bytes
+ 256 => 1,048,576bytes
+ 128 => 524,288bytes
+ 64 => 262,144bytes
+ 32 => 131,072bytes
+ 16 => 65,536bytes
+ 8 => 32,768bytes : for low-latency
+
+==============================================================
+
nr_hugepages

Change the minimum size of the hugepage pool.
diff --git a/MAINTAINERS b/MAINTAINERS
index 1380312..07f4123 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4128,6 +4128,13 @@ L: [email protected]
S: Maintained
F: mm/memcontrol.c

+MEMORY UNMAP OPERATION UNIT INTERFACE
+M: Geunsik Lim <[email protected]>
+L: [email protected]
+S: Maintained
+F: mm/munmap_unit_size.c
+F: include/linux/munmap_unit_size.h
+
MEMORY TECHNOLOGY DEVICES (MTD)
M: David Woodhouse <[email protected]>
L: [email protected]
--
1.7.3.4

2011-04-25 10:45:08

by Geunsik Lim

Subject: [PATCH 3/4] munmap: kbuild menu for munmap interface

From: Geunsik Lim <[email protected]>

Support kbuild menu to select memory unmap operation size
at build time.

Signed-off-by: Geunsik Lim <[email protected]>
Acked-by: Hyunjin Choi <[email protected]>
---
init/Kconfig | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
mm/memory.c | 21 +++++++++++-----
2 files changed, 84 insertions(+), 7 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 56240e7..0983961 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -557,6 +557,76 @@ config LOG_BUF_SHIFT
13 => 8 KB
12 => 4 KB

+config PREEMPT_OK_MUNMAP_RANGE
+ int "Memory unmap unit on preemption mode (8 => 32KB)"
+ depends on !PREEMPT_NONE
+ range 8 2048
+ default 8
+ help
+ unmap_vmas(=unmap a range of memory covered by a list of vma) is treading
+ a delicate and uncomfortable line between hi-performance and low-latency.
+ We've chosen to improve performance at the expense of latency.
+
+ So although there may be no need to resched right now,
+ if we keep on gathering more and more without flushing,
+ we'll be very unresponsive when a resched is needed later on.
+
+ Consider the best suitable result between high performance and low latency
+ on preemption mode.
+ Select optimal munmap size to return memory space that is allocated by mmap system call.
+
+ For example, For recording mass files, if we try to unmap memory that we allocated
+ with 100MB for recording in embedded devices, we have to wait for more than 3seconds to
+ change mode from play mode to recording mode. This results from the unit of memory
+ unmapped size when we are recording mass files like camcorder particularly.
+
+ This value can be changed after boot using the
+ /proc/sys/vm/munmap_unit_size tunable.
+
+ Examples:
+ 2048 => 8,388,608bytes : for straight-line efficiency
+ 1024 => 4,194,304bytes
+ 512 => 2,097,152bytes
+ 256 => 1,048,576bytes
+ 128 => 524,288bytes
+ 64 => 262,144bytes
+ 32 => 131,072bytes
+ 16 => 65,536bytes
+ 8 => 32,768bytes : for low-latency (*default)
+
+config PREEMPT_NO_MUNMAP_RANGE
+ int "Memory unmap unit on non-preemption mode (1024 => 4MB)"
+ depends on PREEMPT_NONE
+ range 8 2048
+ default 1024
+ help
+
+ unmap_vmas(=unmap a range of memory covered by a list of vma) is treading
+ a delicate and uncomfortable line between hi-performance and low-latency.
+ We've chosen to improve performance at the expense of latency.
+
+ So although there may be no need to resched right now,
+ if we keep on gathering more and more without flushing,
+ we'll be very unresponsive when a resched is needed later on.
+
+ Consider the best suitable result between high performance and low latency
+ on preemption mode.
+ Select optimal munmap size to return memory space that is allocated by mmap system call.
+
+ This value can be changed after boot using the
+ /proc/sys/vm/munmap_unit_size tunable.
+
+ Examples:
+ 2048 => 8,388,608bytes : for straight-line efficiency
+ 1024 => 4,194,304bytes (*default)
+ 512 => 2,097,152bytes
+ 256 => 1,048,576bytes
+ 128 => 524,288bytes
+ 64 => 262,144bytes
+ 32 => 131,072bytes
+ 16 => 65,536bytes
+ 8 => 32,768bytes : for low-latency
+
#
# Architectures with an unreliable sched_clock() should select this:
#
diff --git a/mm/memory.c b/mm/memory.c
index ce22a25..e4533fe 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -57,6 +57,7 @@
#include <linux/swapops.h>
#include <linux/elf.h>
#include <linux/gfp.h>
+#include <linux/munmap_unit_size.h>

#include <asm/io.h>
#include <asm/pgalloc.h>
@@ -1079,6 +1080,10 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
(*zap_work)--;
continue;
}
+#if 0
+printk("DEBUG:munmap step2,(%s:%d), unmap range = current(%lu) + \
+zap_work(%lu bytes) \n", current->comm, current->pid, addr, *zap_work);
+#endif
next = zap_pud_range(tlb, vma, pgd, addr, next,
zap_work, details);
} while (pgd++, addr = next, (addr != end && *zap_work > 0));
@@ -1088,12 +1093,10 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
return addr;
}

-#ifdef CONFIG_PREEMPT
-# define ZAP_BLOCK_SIZE (8 * PAGE_SIZE)
-#else
-/* No preempt: go for improved straight-line efficiency */
-# define ZAP_BLOCK_SIZE (1024 * PAGE_SIZE)
-#endif
+/* No preempt: go for improved straight-line efficiency
+ * on PREEMPT(preemption mode) this is not a critical latency-path.
+ */
+# define ZAP_BLOCK_SIZE (munmap_unit_size * PAGE_SIZE)

/**
* unmap_vmas - unmap a range of memory covered by a list of vma's
@@ -1133,7 +1136,11 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
int fullmm = (*tlbp)->fullmm;
struct mm_struct *mm = vma->vm_mm;
-
+#if 0
+printk("DEBUG:munmap step1,(%s:%d), unit=zap_work(%ld)/ZAP_BLOCK(%ld), \
+vma:[%8lu]=%lu-%lu \n", current->comm, current->pid, zap_work, ZAP_BLOCK_SIZE, \
+vma->vm_end - vma->vm_start, vma->vm_end, vma->vm_start);
+#endif
mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) {
unsigned long end;
--
1.7.3.4

2011-04-25 15:31:17

by Steven Rostedt

Subject: Re: [PATCH 3/4] munmap: kbuild menu for munmap interface

On Mon, 2011-04-25 at 19:44 +0900, Geunsik Lim wrote:
> From: Geunsik Lim <[email protected]>
>
> Support kbuild menu to select memory unmap operation size
> at build time.

The subject and this line are not quite the same. The subject looks like
it only modifies the kbuild options, not mm/memory.c as well. Please
fix.

>
> Signed-off-by: Geunsik Lim <[email protected]>
> Acked-by: Hyunjin Choi <[email protected]>
> ---
> init/Kconfig | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> mm/memory.c | 21 +++++++++++-----
> 2 files changed, 84 insertions(+), 7 deletions(-)
>

> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -57,6 +57,7 @@
> #include <linux/swapops.h>
> #include <linux/elf.h>
> #include <linux/gfp.h>
> +#include <linux/munmap_unit_size.h>
>
> #include <asm/io.h>
> #include <asm/pgalloc.h>
> @@ -1079,6 +1080,10 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
> (*zap_work)--;
> continue;
> }
> +#if 0
> +printk("DEBUG:munmap step2,(%s:%d), unmap range = current(%lu) + \
> +zap_work(%lu bytes) \n", current->comm, current->pid, addr, *zap_work);
> +#endif

No #if 0 debug printing in mainline.

> next = zap_pud_range(tlb, vma, pgd, addr, next,
> zap_work, details);
> } while (pgd++, addr = next, (addr != end && *zap_work > 0));
> @@ -1088,12 +1093,10 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
> return addr;
> }
>
> -#ifdef CONFIG_PREEMPT
> -# define ZAP_BLOCK_SIZE (8 * PAGE_SIZE)
> -#else
> -/* No preempt: go for improved straight-line efficiency */
> -# define ZAP_BLOCK_SIZE (1024 * PAGE_SIZE)
> -#endif
> +/* No preempt: go for improved straight-line efficiency
> + * on PREEMPT(preemption mode) this is not a critical latency-path.
> + */
> +# define ZAP_BLOCK_SIZE (munmap_unit_size * PAGE_SIZE)
>
> /**
> * unmap_vmas - unmap a range of memory covered by a list of vma's
> @@ -1133,7 +1136,11 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
> spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
> int fullmm = (*tlbp)->fullmm;
> struct mm_struct *mm = vma->vm_mm;
> -
> +#if 0
> +printk("DEBUG:munmap step1,(%s:%d), unit=zap_work(%ld)/ZAP_BLOCK(%ld), \
> +vma:[%8lu]=%lu-%lu \n", current->comm, current->pid, zap_work, ZAP_BLOCK_SIZE, \
> +vma->vm_end - vma->vm_start, vma->vm_end, vma->vm_start);
> +#endif

Get rid of this too.

Either have pr_debug(...) or nothing at all.

-- Steve

> mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
> for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) {
> unsigned long end;

2011-04-25 15:46:16

by Randy Dunlap

Subject: Re: [PATCH 3/4] munmap: kbuild menu for munmap interface

On Mon, 25 Apr 2011 19:44:31 +0900 Geunsik Lim wrote:

> From: Geunsik Lim <[email protected]>
>
> Support kbuild menu to select memory unmap operation size
> at build time.
>
> Signed-off-by: Geunsik Lim <[email protected]>
> Acked-by: Hyunjin Choi <[email protected]>
> ---
> init/Kconfig | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> mm/memory.c | 21 +++++++++++-----
> 2 files changed, 84 insertions(+), 7 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 56240e7..0983961 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -557,6 +557,76 @@ config LOG_BUF_SHIFT
> 13 => 8 KB
> 12 => 4 KB
>
> +config PREEMPT_OK_MUNMAP_RANGE
> + int "Memory unmap unit on preemption mode (8 => 32KB)"
> + depends on !PREEMPT_NONE
> + range 8 2048
> + default 8
> + help
> + unmap_vmas(=unmap a range of memory covered by a list of vma) is treading

unmap_vmas (= unmap a range ...

> + a delicate and uncomfortable line between hi-performance and low-latency.

high performance and low latency.

> + We've chosen to improve performance at the expense of latency.

This option improves performance at the expense of latency.

> +
> + So although there may be no need to resched right now,

reschedule

> + if we keep on gathering more and more without flushing,

gathering more and more <what> ?

> + we'll be very unresponsive when a resched is needed later on.

reschedule

> +
> + Consider the best suitable result between high performance and low latency
> + on preemption mode.
> + Select optimal munmap size to return memory space that is allocated by mmap system call.
> +
> + For example, For recording mass files, if we try to unmap memory that we allocated

for

> + with 100MB for recording in embedded devices, we have to wait for more than 3seconds to

3 seconds

(but try not to put text over 80 columns, please)

> + change mode from play mode to recording mode. This results from the unit of memory
> + unmapped size when we are recording mass files like camcorder particularly.
> +
> + This value can be changed after boot using the
> + /proc/sys/vm/munmap_unit_size tunable.

Indent above with tab + 2 spaces.

> +
> + Examples:
> + 2048 => 8,388,608bytes : for straight-line efficiency
> + 1024 => 4,194,304bytes
> + 512 => 2,097,152bytes
> + 256 => 1,048,576bytes
> + 128 => 524,288bytes
> + 64 => 262,144bytes
> + 32 => 131,072bytes
> + 16 => 65,536bytes
> + 8 => 32,768bytes : for low-latency (*default)

All of above would be better with added space before "bytes", as, e.g.:
8 => 32,768 bytes

> +
> +config PREEMPT_NO_MUNMAP_RANGE
> + int "Memory unmap unit on non-preemption mode (1024 => 4MB)"
> + depends on PREEMPT_NONE
> + range 8 2048
> + default 1024
> + help
> +
> + unmap_vmas(=unmap a range of memory covered by a list of vma) is treading

unmap_vmas (= unmap

> + a delicate and uncomfortable line between hi-performance and low-latency.

high performance and low latency.

> + We've chosen to improve performance at the expense of latency.

This option improves performance at the expense of latency.

> +
> + So although there may be no need to resched right now,

reschedule

> + if we keep on gathering more and more without flushing,

more and more what?

> + we'll be very unresponsive when a resched is needed later on.

reschedule

> +
> + Consider the best suitable result between high performance and low latency
> + on preemption mode.

but this option is for non-preempt mode... so should that text above be modified?

> + Select optimal munmap size to return memory space that is allocated by mmap system call.
> +
> + This value can be changed after boot using the
> + /proc/sys/vm/munmap_unit_size tunable.

Indent above with tab + 2 spaces.

> +
> + Examples:
> + 2048 => 8,388,608bytes : for straight-line efficiency
> + 1024 => 4,194,304bytes (*default)
> + 512 => 2,097,152bytes
> + 256 => 1,048,576bytes
> + 128 => 524,288bytes
> + 64 => 262,144bytes
> + 32 => 131,072bytes
> + 16 => 65,536bytes
> + 8 => 32,768bytes : for low-latency

Use space before "bytes" in table above, please.

> +
> #
> # Architectures with an unreliable sched_clock() should select this:
> #


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-04-25 19:51:03

by Peter Zijlstra

Subject: Re: [PATCH 0/4] munmap: Flexible mem unmap operation interface for scheduling latency

On Mon, 2011-04-25 at 19:44 +0900, Geunsik Lim wrote:
> Originally, We aim to not hold locks for too long (for scheduling latency reasons).
> So zap pages in ZAP_BLOCK_SIZE byte counts.
> This means we need to return the ending mmu_gather to the caller.

Please have a look at the mmu_gather rewrite that hit -mm last week,
that completely does away with ZAP_BLOCK_SIZE and renders these patches
obsolete.

Also, -rt doesn't care since it already has preemptible mmu_gather.

Furthermore:

> +L: [email protected]

is complete crap, linux-rt-users is _NOT_ a development list.

2011-04-25 20:29:58

by Steven Rostedt

Subject: Re: [PATCH 0/4] munmap: Flexible mem unmap operation interface for scheduling latency

On Mon, 2011-04-25 at 21:47 +0200, Peter Zijlstra wrote:

> Also, -rt doesn't care since it already has preemptible mmu_gather.

To be fair, he did state:

> In general, This is not a critical latency-path on preemption mode
> (PREEMPT_VOLUNTARY / PREEMPT_DESKTOP / PREEMPT_RT)

2011-04-26 00:41:01

by Geunsik Lim

Subject: Re: [PATCH 3/4] munmap: kbuild menu for munmap interface

On Tue, Apr 26, 2011 at 12:31 AM, Steven Rostedt <[email protected]> wrote:
> On Mon, 2011-04-25 at 19:44 +0900, Geunsik Lim wrote:
>> From: Geunsik Lim <[email protected]>
>>
>> Support kbuild menu to select memory unmap operation size
>> at build time.
>
> The subject and this line are not quite the same. The subject looks like
> it only modifies the kbuild options, not mm/memory.c as well. Please
> fix.
You are right. I will fix it.
>
>>
>> Signed-off-by: Geunsik Lim <[email protected]>
>> Acked-by: Hyunjin Choi <[email protected]>
>> ---
>>  init/Kconfig |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  mm/memory.c  |   21 +++++++++++-----
>>  2 files changed, 84 insertions(+), 7 deletions(-)
>>
>
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -57,6 +57,7 @@
>>  #include <linux/swapops.h>
>>  #include <linux/elf.h>
>>  #include <linux/gfp.h>
>> +#include <linux/munmap_unit_size.h>
>>
>>  #include <asm/io.h>
>>  #include <asm/pgalloc.h>
>> @@ -1079,6 +1080,10 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
>>                       (*zap_work)--;
>>                       continue;
>>               }
>> +#if 0
>> +printk("DEBUG:munmap step2,(%s:%d), unmap range = current(%lu) + \
>> +zap_work(%lu bytes) \n", current->comm, current->pid, addr, *zap_work);
>> +#endif
>
> No #if 0 debug printing in mainline.
Thank you for your advice.
>
>>               next = zap_pud_range(tlb, vma, pgd, addr, next,
>>                                               zap_work, details);
>>       } while (pgd++, addr = next, (addr != end && *zap_work > 0));
>> @@ -1088,12 +1093,10 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
>>       return addr;
>>  }
>>
>> -#ifdef CONFIG_PREEMPT
>> -# define ZAP_BLOCK_SIZE      (8 * PAGE_SIZE)
>> -#else
>> -/* No preempt: go for improved straight-line efficiency */
>> -# define ZAP_BLOCK_SIZE      (1024 * PAGE_SIZE)
>> -#endif
>> +/* No preempt: go for improved straight-line efficiency
>> + * on PREEMPT(preemption mode) this is not a critical latency-path.
>> + */
>> +# define ZAP_BLOCK_SIZE        (munmap_unit_size * PAGE_SIZE)
>>
>>  /**
>>   * unmap_vmas - unmap a range of memory covered by a list of vma's
>> @@ -1133,7 +1136,11 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
>>       spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
>>       int fullmm = (*tlbp)->fullmm;
>>       struct mm_struct *mm = vma->vm_mm;
>> -
>> +#if 0
>> +printk("DEBUG:munmap step1,(%s:%d), unit=zap_work(%ld)/ZAP_BLOCK(%ld), \
>> +vma:[%8lu]=%lu-%lu \n", current->comm, current->pid, zap_work, ZAP_BLOCK_SIZE, \
>> +vma->vm_end - vma->vm_start, vma->vm_end, vma->vm_start);
>> +#endif
>
> Get rid of this too.
>
> Either have pr_debug(...) or nothing at all.
In fact, I was unsure about these debug messages; they were only for debugging. :)
Yes, I will remove these debug messages because they are not necessary,
as you commented.
>
> -- Steve
>
>>       mmu_notifier_invalidate_range_start(mm, start_addr, end_addr);
>>       for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) {
>>               unsigned long end;
>
>
>



--
Regards,
Geunsik Lim ( Samsung Electronics )
Blog : http://blog.naver.com/invain/
e-Mail: [email protected]
           [email protected] , [email protected]

2011-04-26 00:42:34

by Geunsik Lim

Subject: Re: [PATCH 3/4] munmap: kbuild menu for munmap interface

On Tue, Apr 26, 2011 at 12:45 AM, Randy Dunlap <[email protected]> wrote:
> On Mon, 25 Apr 2011 19:44:31 +0900 Geunsik Lim wrote:
>
>> From: Geunsik Lim <[email protected]>
>>
>> Support kbuild menu to select memory unmap operation size
>> at build time.
>>
>> Signed-off-by: Geunsik Lim <[email protected]>
>> Acked-by: Hyunjin Choi <[email protected]>
>> ---
>>  init/Kconfig |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  mm/memory.c  |   21 +++++++++++-----
>>  2 files changed, 84 insertions(+), 7 deletions(-)
>>
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 56240e7..0983961 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -557,6 +557,76 @@ config LOG_BUF_SHIFT
>>                    13 =>  8 KB
>>                    12 =>  4 KB
>>
>> +config PREEMPT_OK_MUNMAP_RANGE
>> +     int "Memory unmap unit on preemption mode (8 => 32KB)"
>> +     depends on !PREEMPT_NONE
>> +     range 8 2048
>> +     default 8
>> +     help
>> +       unmap_vmas(=unmap a range of memory covered by a list of vma) is treading
>
>          unmap_vmas (= unmap a range ...
>
>> +       a delicate and uncomfortable line between hi-performance and low-latency.
>
>                                                    high performance and low latency.
>
>> +       We've chosen to improve performance at the expense of latency.
>
>          This option improves performance at the expense of latency.
>
>> +
>> +       So although there may be no need to resched right now,
>
>                                              reschedule
>
>> +       if we keep on gathering more and more without flushing,
>
>                        gathering more and more <what> ?
>
>> +       we'll be very unresponsive when a resched is needed later on.
>
>                                            reschedule
>
>> +
>> +       Consider the best suitable result between high performance and low latency
>> +       on preemption mode.
>> +       Select optimal munmap size to return memory space that is allocated by mmap system call.
>> +
>> +       For example, For recording mass files, if we try to unmap memory that we allocated
>
>                       for
>
>> +       with 100MB for recording in embedded devices, we have to wait for more than 3seconds to
>
>                                                                                      3 seconds
>
> (but try not to put text over 80 columns, please)
>
>> +       change mode from play mode to recording mode. This results from the unit of memory
>> +       unmapped size when we are recording mass files like camcorder particularly.
>> +
>> +          This value can be changed after boot using the
>> +          /proc/sys/vm/munmap_unit_size tunable.
>
> Indent above with tab + 2 spaces.
>
>> +
>> +       Examples:
>> +                  2048 => 8,388,608bytes : for straight-line efficiency
>> +                  1024 => 4,194,304bytes
>> +                   512 => 2,097,152bytes
>> +                   256 => 1,048,576bytes
>> +                   128 =>   524,288bytes
>> +                    64 =>   262,144bytes
>> +                    32 =>   131,072bytes
>> +                    16 =>    65,536bytes
>> +                     8 =>    32,768bytes : for low-latency (*default)
>
> All of above would be better with added space before "bytes", as, e.g.:
>                        8 =>    32,768 bytes
>
>> +
>> +config PREEMPT_NO_MUNMAP_RANGE
>> +     int "Memory unmap unit on non-preemption mode (1024 => 4MB)"
>> +     depends on PREEMPT_NONE
>> +     range 8 2048
>> +     default 1024
>> +     help
>> +
>> +       unmap_vmas(=unmap a range of memory covered by a list of vma) is treading
>
>          unmap_vmas (= unmap
>
>> +       a delicate and uncomfortable line between hi-performance and low-latency.
>
>                                                    high performance and low latency.
>
>> +       We've chosen to improve performance at the expense of latency.
>
>          This option improves performance at the expense of latency.
>
>> +
>> +       So although there may be no need to resched right now,
>
>                                              reschedule
>
>> +       if we keep on gathering more and more without flushing,
>
>                                  more and more what?
>
>> +       we'll be very unresponsive when a resched is needed later on.
>
>                                            reschedule
>
>> +
>> +       Consider the best suitable result between high performance and low latency
>> +       on preemption mode.
>
> but this option is for non-preempt mode... so should that text above be modified?
>
>> +       Select optimal munmap size to return memory space that is allocated by mmap system call.
>> +
>> +          This value can be changed after boot using the
>> +          /proc/sys/vm/munmap_unit_size tunable.
>
> Indent above with tab + 2 spaces.
>
>> +
>> +       Examples:
>> +                  2048 => 8,388,608bytes : for straight-line efficiency
>> +                  1024 => 4,194,304bytes (*default)
>> +                   512 => 2,097,152bytes
>> +                   256 => 1,048,576bytes
>> +                   128 =>   524,288bytes
>> +                    64 =>   262,144bytes
>> +                    32 =>   131,072bytes
>> +                    16 =>    65,536bytes
>> +                     8 =>    32,768bytes : for low-latency
>
>                Use space before "bytes" in table above, please.
>
>> +
>>  #
>>  # Architectures with an unreliable sched_clock() should select this:
>>  #
>
>
> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
Randy Dunlap. Thanks a lot.
I will modify contents that you commented.
>



--
Regards,
Geunsik Lim ( Samsung Electronics )
Blog : http://blog.naver.com/invain/
e-Mail: [email protected]
           [email protected] , [email protected]

2011-04-26 01:21:01

by Geunsik Lim

Subject: Re: [PATCH 0/4] munmap: Flexible mem unmap operation interface for scheduling latency

On Tue, Apr 26, 2011 at 4:47 AM, Peter Zijlstra <[email protected]> wrote:
> On Mon, 2011-04-25 at 19:44 +0900, Geunsik Lim wrote:
>>     Originally, We aim to not hold locks for too long (for scheduling latency reasons).
>>     So zap pages in ZAP_BLOCK_SIZE byte counts.
>>     This means we need to return the ending mmu_gather to the caller.
>
> Please have a look at the mmu_gather rewrite that hit -mm last week,
> that completely does away with ZAP_BLOCK_SIZE and renders these patches
> obsolete.
Yes. I also checked the patch that you stated at LKML mailing list previously.
In my thinking. I want to keep ZAP_BLOCK_SIZE related contents
that adjusted by Ingo, Robert, Andrew, and so on a long time ago
because I believe that we can overcome below problems sufficiently
in real world.
. LKML archive - http://lkml.org/lkml/2002/7/24/273
. LKML archive - http://lkml.org/lkml/2004/9/14/101

In my experience, I did overcome below problems with this patch
based on ZAP_BLOCK_SIZE.

1) To solve temporal CPU contention
(e.g: case that cpu contention is 93% ~ 96% according to mmap/munmap
to access mass files )
2) To get real-time or real-fast selectively on specified linux system
( demo: http://www.youtube.com/watch?v=PxcgvDTY5F0 )


>
> Also, -rt doesn't care since it already has preemptible mmu_gather.
>
> Furthermore:
>
>> +L:     [email protected]
Sorry. I thought that I had to add "[email protected]" because this
patch is related to scheduling latencies, although this modification
is in the ./linux-2.6/mm/ directory. I will remove the "+L: linux-rt-users******" line if
the linux-rt-users mailing list really should not be used for this.
>
> is complete crap, linux-rt-users is _NOT_ a development list.
>
>



--
Regards,
Geunsik Lim ( Samsung Electronics )
Blog : http://blog.naver.com/invain/
e-Mail: [email protected]
           [email protected] , [email protected]

2011-04-26 07:23:19

by Peter Zijlstra

Subject: Re: [PATCH 0/4] munmap: Flexible mem unmap operation interface for scheduling latency

On Tue, 2011-04-26 at 10:20 +0900, Geunsik Lim wrote:
> Yes. I also checked the patch that you stated at LKML mailing list previously.
> In my thinking. I want to keep ZAP_BLOCK_SIZE related contents
> that adjusted by Ingo, Robert, Andrew, and so on a long time ago
> because I believe that we can overcome below problems sufficiently
> in real world.
> . LKML archive - http://lkml.org/lkml/2002/7/24/273
> . LKML archive - http://lkml.org/lkml/2004/9/14/101

Real ancient world, that was 2004, well before we grew preemptible
mmu_gather.

> In my experience, I did overcome below problems with this patch
> based on ZAP_BLOCK_SIZE.
>
> 1) To solve temporal CPU contention
> (e.g: case that cpu contention is 93% ~ 96% according to mmap/munmap
> to access mass files )
> 2) To get real-time or real-fast selectively on specified linux system

I still don't get it, what kernel are you targeting here and why?

-RT doesn't care, and clearly PREEMPT=n doesn't care because its not
about latency at all, the only half-way point is PREEMPT=y and for that
you could simply reduce ZAP_BLOCK_SIZE.

Then again, what's the point, simply remove the whole thing (like I did)
and your problem is solved too.

2011-04-26 07:25:48

by Peter Zijlstra

Subject: Re: [PATCH 0/4] munmap: Flexible mem unmap operation interface for scheduling latency

On Mon, 2011-04-25 at 16:29 -0400, Steven Rostedt wrote:
> On Mon, 2011-04-25 at 21:47 +0200, Peter Zijlstra wrote:
>
> > Also, -rt doesn't care since it already has preemptible mmu_gather.
>
> To be fair, he did state:
>
> > In general, This is not a critical latency-path on preemption mode
> > (PREEMPT_VOLUNTARY / PREEMPT_DESKTOP / PREEMPT_RT)

That doesn't parse for me... what does it say? The only one not listed
is the non-preempt option, that wouldn't reschedule no matter what
ZAP_BLOCK_SIZE.

2011-04-26 12:30:55

by Steven Rostedt

Subject: Re: [PATCH 0/4] munmap: Flexible mem unmap operation interface for scheduling latency

On Tue, 2011-04-26 at 09:25 +0200, Peter Zijlstra wrote:
> On Mon, 2011-04-25 at 16:29 -0400, Steven Rostedt wrote:
> > On Mon, 2011-04-25 at 21:47 +0200, Peter Zijlstra wrote:
> >
> > > Also, -rt doesn't care since it already has preemptible mmu_gather.
> >
> > To be fair, he did state:
> >
> > > In general, This is not a critical latency-path on preemption mode
> > > (PREEMPT_VOLUNTARY / PREEMPT_DESKTOP / PREEMPT_RT)
>
> That doesn't parse for me... what does it say? The only one not listed
> is the non-preempt option, that wouldn't reschedule no matter what
> ZAP_BLOCK_SIZE.

Heh, it didn't parse for me, as I took PREEMPT_DESKTOP and thought it
said PREEMPT_NONE.

Oh well.

-- Steve

2011-04-26 22:51:28

by Hugh Dickins

Subject: Re: [PATCH 3/4] munmap: kbuild menu for munmap interface

On Mon, Apr 25, 2011 at 5:42 PM, Geunsik Lim <[email protected]> wrote:
> I will modify contents that you commented.

Thank you for making the effort, but we would much prefer to eliminate
ZAP_BLOCK_SIZE than have a configurable ZAP_BLOCK_SIZE. So I don't
think you need to update the contents here.

Peter's preemptible mmu_gather work is expected to be included in the
next mmotm. Please test on that, and if it does not meet your needs,
do let us know.

Thanks,
Hugh

2011-04-26 23:57:27

by Geunsik Lim

Subject: Re: [PATCH 0/4] munmap: Flexible mem unmap operation interface for scheduling latency

On Tue, Apr 26, 2011 at 4:22 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2011-04-26 at 10:20 +0900, Geunsik Lim wrote:
>> Yes. I also checked the patch that you stated at LKML mailing list previously.
>> In my thinking. I want to keep ZAP_BLOCK_SIZE related contents
>> that adjusted by Ingo, Robert, Andrew, and so on a long time ago
>> because I believe that we can overcome below problems sufficiently
>> in real world.
>> . LKML archive - http://lkml.org/lkml/2002/7/24/273
>> . LKML archive - http://lkml.org/lkml/2004/9/14/101
>
> Real ancient world, that was 2004, well before we grew preemptible
> mmu_gather.
>
>> In my experience, I did overcome below problems with this patch
>> based on ZAP_BLOCK_SIZE.
>>
>> 1) To solve temporal CPU contention
>>     (e.g: case that cpu contention is 93% ~ 96% according to mmap/munmap
>>             to access mass files )
>> 2) To get real-time or real-fast selectively on specified linux system
>
> I still don't get it, what kernel are you targeting here and why?
In my case, I tested on embedded targets (e.g. 2.6.29, 2.6.32) based on the
ARM Cortex-A series, focusing on user responsiveness when accessing mass files.
>
> -RT doesn't care, and clearly PREEMPT=n doesn't care because its not
> about latency at all, the only half-way point is PREEMPT=y and for that
> you could simply reduce ZAP_BLOCK_SIZE.
Thank you for your reviews. Yes, we can simply reduce ZAP_BLOCK_SIZE.
I mean that we can control ZAP_BLOCK_SIZE after considering a suitable
munmap() operation size for both preemptive and non-preemptive modes.
>
> Then again, what's the point, simply remove the whole thing (like I did)
> and your problem is solved too.
If the advanced preemptible mmu_gather can sufficiently give us real-fast or
real-time behaviour according to user needs, I also think that is certainly
good.
>
>
>



--
Regards,
Geunsik Lim ( Samsung Electronics )
Blog : http://blog.naver.com/invain/
e-Mail: [email protected]
           [email protected] , [email protected]

2011-04-27 00:08:00

by Geunsik Lim

Subject: Re: [PATCH 3/4] munmap: kbuild menu for munmap interface

On Wed, Apr 27, 2011 at 7:51 AM, Hugh Dickins <[email protected]> wrote:
> On Mon, Apr 25, 2011 at 5:42 PM, Geunsik Lim <[email protected]> wrote:
>> I will modify contents that you commented.
>
> Thank you for making the effort, but we would much prefer to eliminate
> ZAP_BLOCK_SIZE than have a configurable ZAP_BLOCK_SIZE.  So I don't
> think you need update the contents here.
>
> Peter's preemptible mmu_gather work is expected to be included in the
> next mmotm.  Please test on that, and if it does not meet your needs,
> do let us know.
Thank you for the comments. As I answered Peter, when we run the Linux kernel
on various systems (e.g. server/desktop/embedded),
if the preemptible mmu_gather work is sufficient to selectively get real-fast
or real-time (= preemptive property?) behaviour, I am fine with it.
>
> Thanks,
> Hugh
>



--
Regards,
Geunsik Lim ( Samsung Electronics )
Blog : http://blog.naver.com/invain/
e-Mail: [email protected]
           [email protected] , [email protected]