2011-06-27 13:42:22

by Geunsik Lim

Subject: [PATCH V2 0/4] munmap: Flexible mem unmap operation interface for scheduling latency

From: Geunsik Lim <[email protected]>

[Summary]
This series revises the initial version based on review feedback. I would
like to thank Peter, Steven, Randy, and Hugh for their valuable reviews and comments.
(Refer to: https://lkml.org/lkml/2011/4/25/55)

I am posting [PATCH V2], based on Linux 2.6.39, for the embedded developers who asked
me for it after the first version. In my testing, this patch worked without problems
on Linux 2.6.32 through Linux 2.6.39.

If you are using the latest Linux version, refer to Peter's preemptible mmu_gather work,
which eliminates ZAP_BLOCK_SIZE rather than making it configurable.
(Refer to: https://lkml.org/lkml/2011/4/1/141)

[Details]
As we all know, hardware specifications (CPU, memory, I/O bandwidth, etc.) differ
from one SoC to another. We can obtain suitable performance (or latency) by
adjusting the memory unmap size to a value chosen for the specific system
environment in the real world.
In other words, we can tune for throughput ("real-fast") or for latency ("real-time")
by setting a Linux kernel tunable parameter that controls the memory unmap operation unit.

For example, this patch can provide the following benefits:
. Reduce temporary CPU spikes (peak CPU usage) when accessing large files
. Improve user responsiveness on embedded products such as mobile phones, camcorders,
and digital cameras
. Achieve effective real-time or real-fast behavior in real-world systems that depend
on the physical hardware
. Provide a sysctl interface (tunable parameter) for finding a suitable munmap
operation unit at runtime

unmap_vmas() (which unmaps a range of memory covered by a list of VMAs) is treading
a delicate and uncomfortable line between high performance and low latency.
We have often chosen to improve performance at the expense of latency.

So although there may be no need to reschedule right now,
if we keep on gathering more and more without flushing,
we'll be very unresponsive when a resched is needed later on.

resched is the routine that the current process calls when rescheduling is to
take place. It is called not only when the time quantum of the current process expires,
but also when the current process invokes a blocking (waiting) call such as wait,
or when a new process of potentially higher priority becomes eligible for execution.

Here is some history of the ZAP_BLOCK_SIZE discussions about scheduling latency
from a long time ago. Ingo introduced ZAP_BLOCK_SIZE to split the work up: small
when CONFIG_PREEMPT is set, larger but still limited when it is not.
. Patch subject - [patch] sched, mm: fix scheduling latencies in unmap_vmas()
. LKML archive - http://lkml.org/lkml/2004/9/14/101

Robert Love submitted a patch to improve latency by creating a preemption point
in Linux 2.5.28 (development version).
. Patch subject - [PATCH] updated low-latency zap_page_range
. LKML archive - http://lkml.org/lkml/2002/7/24/273

Originally, the aim was to avoid holding locks for too long (for scheduling latency
reasons), so pages are zapped in blocks of ZAP_BLOCK_SIZE bytes.
This means we need to return the ending mmu_gather to the caller.
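
To make the trade-off concrete, here is a minimal user-space sketch of a chunked
walk over an address range. It is not the kernel's unmap_vmas() code:
count_zap_blocks() and SKETCH_PAGE_SIZE are hypothetical names, and a 4 KiB page
size is assumed. It only counts how many blocks a range splits into, since the
boundary between two blocks is where a flush and a reschedule check could occur.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/*
 * Hypothetical user-space model of a chunked unmap loop: walk [start, end)
 * in blocks of unit_pages pages and return the number of blocks processed.
 */
unsigned long count_zap_blocks(unsigned long start, unsigned long end,
                               unsigned long unit_pages)
{
    unsigned long block_size = unit_pages * SKETCH_PAGE_SIZE;
    unsigned long blocks = 0;

    assert(unit_pages > 0 && start <= end);

    while (start < end) {
        unsigned long len = end - start;

        if (len > block_size)
            len = block_size;   /* never process more than one block at once */
        start += len;           /* stands in for zapping one block */
        blocks++;               /* a reschedule point would follow here */
    }
    return blocks;
}
```

Under these assumptions, a 100 MB range splits into 3200 blocks with a unit of 8
pages but only 25 blocks with a unit of 1024 pages: fewer, longer stretches of
work between possible reschedule points.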

In general, this is not a critical latency path in the preemptive modes
(PREEMPT_VOLUNTARY / PREEMPT_DESKTOP / PREEMPT_RT).

. Vanilla's preemptive mode (mainline kernel tree)
- http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git v2.6.38
1) CONFIG_PREEMPT_NONE: No Forced Preemption (Server)
2) CONFIG_PREEMPT_VOLUNTARY: Voluntary Kernel Preemption (Desktop)
3) CONFIG_PREEMPT: Preemptible Kernel (Low-Latency Desktop)

. Ingo rt patch's preemptive mode (-tip kernel tree)
- http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git v2.6.33.9-rt31
1) CONFIG_PREEMPT_NONE
2) CONFIG_PREEMPT_VOLUNTARY
3) CONFIG_PREEMPT + CONFIG_PREEMPT_DESKTOP
4) CONFIG_PREEMPT + CONFIG_PREEMPT_RT + CONFIG_PREEMPT_{SOFTIRQS|HARDIRQS}

The munmap unit size can be changed at runtime, after boot, through the
'/proc/sys/vm/munmap_unit_size' Linux kernel tunable parameter.

* Examples (the size of one page is 4,096 bytes):
 2048 => 8,388,608 bytes : for straight-line efficiency (performance)
 1024 => 4,194,304 bytes
  512 => 2,097,152 bytes
  256 => 1,048,576 bytes
  128 =>   524,288 bytes
   64 =>   262,144 bytes
   32 =>   131,072 bytes
   16 =>    65,536 bytes
    8 =>    32,768 bytes : for low latency
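
Each row of the table is simply the unit value multiplied by the page size. A
minimal sketch of that arithmetic follows. The helper munmap_unit_to_bytes() and
the SKETCH_/UNIT_ macros are hypothetical names for illustration, the 4,096-byte
page size is the assumption stated above, and the 8..2048 bounds mirror the
Kconfig range in patch 3/4.

```c
#include <assert.h>

/* Assumption: 4,096-byte pages, matching the table. */
#define SKETCH_PAGE_SIZE 4096UL

/* Bounds taken from the Kconfig "range 8 2048" in patch 3/4. */
#define UNIT_MIN 8UL
#define UNIT_MAX 2048UL

/* Convert a munmap unit (in pages) to the block size in bytes. */
unsigned long munmap_unit_to_bytes(unsigned long unit_pages)
{
    assert(unit_pages >= UNIT_MIN && unit_pages <= UNIT_MAX);
    return unit_pages * SKETCH_PAGE_SIZE;
}
```

On a system with a different page size, the byte values scale accordingly, so the
same unit value does not mean the same block size everywhere.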

P.S. I checked these patches with the './linux-2.6/scripts/checkpatch.pl' script.
I also uploaded a demo video to YouTube showing the evaluation results for the
munmap operation unit interface. (http://www.youtube.com/watch?v=PxcgvDTY5F0)

Thanks for reading.

Geunsik Lim (4):
munmap operation size handling
sysctl extension for tunable parameter
kbuild menu for munmap interface
documentation of munmap operation interface

Documentation/sysctl/vm.txt | 36 +++++++++++++++++++
MAINTAINERS | 7 ++++
include/linux/munmap_unit_size.h | 24 +++++++++++++
init/Kconfig | 70 ++++++++++++++++++++++++++++++++++++++
kernel/sysctl.c | 10 +++++
mm/Makefile | 4 ++-
mm/memory.c | 21 +++++++----
mm/munmap_unit_size.c | 57 +++++++++++++++++++++++++++++++
8 files changed, 221 insertions(+), 8 deletions(-)
create mode 100644 include/linux/munmap_unit_size.h
create mode 100644 mm/munmap_unit_size.c

--
1.7.3.4


2011-06-27 13:42:35

by Geunsik Lim

Subject: [PATCH V2 1/4] munmap: mem unmap operation size handling

From: Geunsik Lim <[email protected]>

Hardware specifications (CPU, memory, I/O bandwidth, etc.) differ from one
SoC to another. We can obtain suitable performance (or latency) by
adjusting the memory unmap size to a value chosen for the specific
system environment in the real world.

In other words, we can tune for throughput ("real-fast") or for latency
("real-time") by setting a Linux kernel tunable parameter that controls
the memory unmap operation unit.

Signed-off-by: Geunsik Lim <[email protected]>
Acked-by: Hyunjin Choi <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Hugh Dickins <[email protected]>
CC: Randy Dunlap <[email protected]>
CC: Ingo Molnar <[email protected]>
---
include/linux/munmap_unit_size.h | 24 ++++++++++++++++
mm/Makefile | 4 ++-
mm/memory.c | 21 +++++++++++-----
mm/munmap_unit_size.c | 57 ++++++++++++++++++++++++++++++++++++++

4 files changed, 84 insertions(+), 1 deletions(-)
create mode 100644 include/linux/munmap_unit_size.h
create mode 100644 mm/munmap_unit_size.c

diff --git a/include/linux/munmap_unit_size.h b/include/linux/munmap_unit_size.h
new file mode 100644
index 0000000..c4f1fd4
--- /dev/null
+++ b/include/linux/munmap_unit_size.h
@@ -0,0 +1,24 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * Due to this file being licensed under the GPL there is controversy over
+ * whether this permits you to write a module that #includes this file
+ * without placing your module under the GPL. Please consult a lawyer for
+ * advice before doing this.
+ *
+ */
+
+#ifdef CONFIG_MMU
+extern unsigned long munmap_unit_size;
+extern unsigned long sysctl_munmap_unit_size;
+#else
+#define sysctl_munmap_unit_size 0UL
+#endif
+
+#ifdef CONFIG_MMU
+extern int munmap_unit_size_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
+#endif


diff --git a/mm/Makefile b/mm/Makefile
index 42a8326..4b55b6c 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -5,7 +5,9 @@
mmu-y := nommu.o
mmu-$(CONFIG_MMU) := fremap.o highmem.o madvise.o memory.o mincore.o \
mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
- vmalloc.o pagewalk.o pgtable-generic.o
+ vmalloc.o pagewalk.o pgtable-generic.o \
+ munmap_unit_size.o
+

obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
maccess.o page_alloc.o page-writeback.o \
diff --git a/mm/memory.c b/mm/memory.c
index ce22a25..8573cb6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -57,6 +57,7 @@
#include <linux/swapops.h>
#include <linux/elf.h>
#include <linux/gfp.h>
+#include <linux/munmap_unit_size.h>

#include <asm/io.h>
#include <asm/pgalloc.h>
@@ -1088,12 +1089,10 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
return addr;
}

-#ifdef CONFIG_PREEMPT
-# define ZAP_BLOCK_SIZE (8 * PAGE_SIZE)
-#else
-/* No preempt: go for improved straight-line efficiency */
-# define ZAP_BLOCK_SIZE (1024 * PAGE_SIZE)
-#endif
+/* No preempt: go for improved straight-line efficiency
+ * on PREEMPT(preemptive mode) this is not a critical latency-path.
+ */
+# define ZAP_BLOCK_SIZE (munmap_unit_size * PAGE_SIZE)

/**
* unmap_vmas - unmap a range of memory covered by a list of vma's
diff --git a/mm/munmap_unit_size.c b/mm/munmap_unit_size.c
new file mode 100644
index 0000000..1a2a2c6
--- /dev/null
+++ b/mm/munmap_unit_size.c
@@ -0,0 +1,57 @@
+/*
+ * Memory Unmap Operation Unit Interface
+ * (C) Geunsik Lim, April 2011
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/munmap_unit_size.h>
+#include <linux/sysctl.h>
+
+/* amount of vm to unmap from userspace access by both non-preemptive mode
+ * and preemptive mode
+ */
+unsigned long munmap_unit_size;
+
+/*
+ * Memory unmap operation unit of vm to release allocated memory size from
+ * userspace using mmap system call
+ */
+#if !defined(CONFIG_PREEMPT_VOLUNTARY) && !defined(CONFIG_PREEMPT)
+unsigned long sysctl_munmap_unit_size = CONFIG_PREEMPT_NO_MUNMAP_RANGE;
+#else
+unsigned long sysctl_munmap_unit_size = CONFIG_PREEMPT_OK_MUNMAP_RANGE;
+#endif
+
+/*
+ * Update munmap_unit_size that changed with /proc/sys/vm/munmap_unit_size
+ * tunable value.
+ */
+static void update_munmap_unit_size(void)
+{
+ munmap_unit_size = sysctl_munmap_unit_size;
+}
+
+/*
+ * sysctl handler which just sets sysctl_munmap_unit_size = the new value
+ * and then calls update_munmap_unit_size()
+ */
+int munmap_unit_size_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ int ret;
+
+ ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
+
+ update_munmap_unit_size();
+
+ return ret;
+}
+
+static int __init init_munmap_unit_size(void)
+{
+ update_munmap_unit_size();
+
+ return 0;
+}
+pure_initcall(init_munmap_unit_size);
--
1.7.3.4

2011-06-27 13:42:46

by Geunsik Lim

Subject: [PATCH V2 2/4] munmap: sysctl extension for tunable parameter

From: Geunsik Lim <[email protected]>

Add a sysctl interface (tunable parameter) for finding a suitable munmap
operation unit at runtime.

* sysctl: An interface for examining and dynamically changing the munmap
operation size parameter in Linux. In Linux, sysctl is implemented as
a wrapper around file system routines that access the contents of files
in /proc.
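
Because the tunable is exposed as a file, a program can set it with ordinary file
I/O. The following is a minimal user-space sketch, not part of this series:
write_munmap_unit() and read_munmap_unit() are hypothetical helpers, and the path
is passed in as a parameter because /proc/sys/vm/munmap_unit_size only exists on
a kernel carrying these patches.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write unit_pages to the given sysctl file.
 * Returns 0 on success, -1 on failure.
 */
int write_munmap_unit(const char *path, unsigned long unit_pages)
{
    FILE *fp;

    assert(path != NULL);

    fp = fopen(path, "w");
    if (!fp)
        return -1;
    if (fprintf(fp, "%lu\n", unit_pages) < 0) {
        fclose(fp);
        return -1;
    }
    /* fclose() flushes the buffer, so a failed write is reported here. */
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: read the current value back. Returns 0 or -1. */
int read_munmap_unit(const char *path, unsigned long *unit_pages)
{
    FILE *fp;
    int ok;

    assert(path != NULL && unit_pages != NULL);

    fp = fopen(path, "r");
    if (!fp)
        return -1;
    ok = (fscanf(fp, "%lu", unit_pages) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

On a patched kernel, calling write_munmap_unit("/proc/sys/vm/munmap_unit_size", 64)
as root would be expected to request a 64-page unit. The table entry uses mode
0644, so unprivileged processes can read the value but not change it.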

Signed-off-by: Geunsik Lim <[email protected]>
Acked-by: Hyunjin Choi <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Hugh Dickins <[email protected]>
CC: Randy Dunlap <[email protected]>
CC: Ingo Molnar <[email protected]>
---
kernel/sysctl.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c0bb324..9b85041 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -56,6 +56,7 @@
#include <linux/kprobes.h>
#include <linux/pipe_fs_i.h>
#include <linux/oom.h>
+#include <linux/munmap_unit_size.h>

#include <asm/uaccess.h>
#include <asm/processor.h>
@@ -1278,6 +1279,15 @@ static struct ctl_table vm_table[] = {
.proc_handler = mmap_min_addr_handler,
},
#endif
+#ifdef CONFIG_MMU
+ {
+ .procname = "munmap_unit_size",
+ .data = &sysctl_munmap_unit_size,
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = munmap_unit_size_handler,
+ },
+#endif
#ifdef CONFIG_NUMA
{
.procname = "numa_zonelist_order",
--
1.7.3.4

2011-06-27 13:42:56

by Geunsik Lim

Subject: [PATCH V2 3/4] munmap: kbuild menu for munmap interface

From: Geunsik Lim <[email protected]>

Add a kbuild menu entry for selecting the memory unmap operation size
at build time.

Signed-off-by: Geunsik Lim <[email protected]>
Acked-by: Hyunjin Choi <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Hugh Dickins <[email protected]>
CC: Randy Dunlap <[email protected]>
CC: Ingo Molnar <[email protected]>
---
init/Kconfig | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 84 insertions(+), 7 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 56240e7..47283ed 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -557,6 +557,79 @@ config LOG_BUF_SHIFT
13 => 8 KB
12 => 4 KB

+config PREEMPT_OK_MUNMAP_RANGE
+ int "Memory unmap unit on preemptive mode (8 => 32KB)"
+ depends on !PREEMPT_NONE
+ range 8 2048
+ default 8
+ help
+ unmap_vmas(= unmap a range of memory covered by a list of vma) is
+ treading a delicate and uncomfortable line between high performance
+ and low latency.
+ This option improves performance at the expense of latency.
+
+ So although there may be no need to reschedule right now,
+ if we keep on gathering more and more memory without flushing,
+ we'll be very unresponsive when a reschedule is needed later on.
+
+ Consider the best suitable result between high performance and
+ low latency on preempt mode. Select optimal munmap size to return
+ memory space that is allocated by mmap system call.
+
+ For example, for recording mass files, if we try to unmap memory
+ that we allocated with 100MB for recording in embedded devices,
+ we have to wait for more than 3 seconds to change mode from play
+ mode to recording mode. This results from the unit of memory unmapped
+ size when we are recording mass files like camcorder particularly.
+
+ This value can be changed after boot using the
+ /proc/sys/vm/munmap_unit_size tunable.
+
+ Examples:
+ 2048 => 8,388,608 bytes : for straight-line efficiency
+ 1024 => 4,194,304 bytes
+ 512 => 2,097,152 bytes
+ 256 => 1,048,576 bytes
+ 128 => 524,288 bytes
+ 64 => 262,144 bytes
+ 32 => 131,072 bytes
+ 16 => 65,536 bytes
+ 8 => 32,768 bytes : for low-latency (*default)
+
+config PREEMPT_NO_MUNMAP_RANGE
+ int "Memory unmap unit on non-preempt mode (1024 => 4MB)"
+ depends on PREEMPT_NONE
+ range 8 2048
+ default 1024
+ help
+
+ unmap_vmas(= unmap a range of memory covered by a list of vma) is
+ treading a delicate and uncomfortable line between high performance
+ and low latency.
+ This option improves performance at the expense of latency.
+
+ So although there may be no need to reschedule right now,
+ if we keep on gathering more and more memory without flushing,
+ we'll be very unresponsive when a reschedule is needed later on.
+
+ Consider the best suitable result between high performance and
+ low latency on non-preempt mode. Select optimal munmap size to return
+ memory space that is allocated by mmap system call.
+
+ This value can be changed after boot using the
+ /proc/sys/vm/munmap_unit_size tunable.
+
+ Examples:
+ 2048 => 8,388,608 bytes : for straight-line efficiency
+ 1024 => 4,194,304 bytes (*default)
+ 512 => 2,097,152 bytes
+ 256 => 1,048,576 bytes
+ 128 => 524,288 bytes
+ 64 => 262,144 bytes
+ 32 => 131,072 bytes
+ 16 => 65,536 bytes
+ 8 => 32,768 bytes : for low-latency
+
#
# Architectures with an unreliable sched_clock() should select this:
#
--
1.7.3.4

2011-06-27 13:43:04

by Geunsik Lim

Subject: [PATCH V2 4/4] munmap: documentation of munmap operation interface

From: Geunsik Lim <[email protected]>

Add kernel documentation for using the flexible memory unmap operation
interface to tune scheduling latency.

Signed-off-by: Geunsik Lim <[email protected]>
Acked-by: Hyunjin Choi <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Hugh Dickins <[email protected]>
CC: Randy Dunlap <[email protected]>
CC: Ingo Molnar <[email protected]>
---
Documentation/sysctl/vm.txt | 36 ++++++++++++++++++++++++++++++++++++
MAINTAINERS | 7 +++++++
2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 30289fa..5d70098 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -40,6 +40,7 @@ Currently, these files are in /proc/sys/vm:
- min_slab_ratio
- min_unmapped_ratio
- mmap_min_addr
+- munmap_unit_size
- nr_hugepages
- nr_overcommit_hugepages
- nr_pdflush_threads
@@ -409,6 +410,42 @@ against future potential kernel bugs.

==============================================================

+munmap_unit_size
+
+unmap_vmas(= unmap a range of memory covered by a list of vma) is treading
+a delicate and uncomfortable line between high performance and low latency.
+We've chosen to improve performance at the expense of latency.
+
+So although there may be no need to reschedule right now,
+if we keep on gathering more and more memory without flushing,
+we'll be very unresponsive when a reschedule is needed later on.
+
+Consider the best suitable result between high performance and low latency
+on preemptive mode or non-preemptive mode. Select optimal munmap size to
+return memory space that is allocated by mmap system call.
+
+For example, for recording mass files, if we try to unmap memory that we
+allocated with 100MB for recording in embedded devices, we have to wait
+for more than 3 seconds to change mode from play mode to recording mode.
+This results from the unit of memory unmapped size when we are recording
+mass files like camcorder particularly.
+
+This value can be changed after boot using the
+/proc/sys/vm/munmap_unit_size tunable.
+
+Examples:
+ 2048 => 8,388,608 bytes : for straight-line efficiency
+ 1024 => 4,194,304 bytes
+ 512 => 2,097,152 bytes
+ 256 => 1,048,576 bytes
+ 128 => 524,288 bytes
+ 64 => 262,144 bytes
+ 32 => 131,072 bytes
+ 16 => 65,536 bytes
+ 8 => 32,768 bytes : for low-latency
+
+==============================================================
+
nr_hugepages

Change the minimum size of the hugepage pool.
diff --git a/MAINTAINERS b/MAINTAINERS
index 1380312..3f1960a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4128,6 +4128,12 @@ L: [email protected]
S: Maintained
F: mm/memcontrol.c

+MEMORY UNMAP OPERATION UNIT INTERFACE
+M: Geunsik Lim <[email protected]>
+S: Maintained
+F: mm/munmap_unit_size.c
+F: include/linux/munmap_unit_size.h
+
MEMORY TECHNOLOGY DEVICES (MTD)
M: David Woodhouse <[email protected]>
L: [email protected]
--
1.7.3.4