Hello,
Token-based thrashing control works well in thrashing situations.
In the current implementation, the token timeout is fixed at SWAP_TOKEN_TIMEOUT. I think administrators could improve thrashing behavior if the value were tunable.
This patch adds a "swap_token_timeout" parameter to /proc/sys/vm.
The parameter is the expiry time of the token. The value is in jiffies (HZ ticks per second), and the default is the same as the current SWAP_TOKEN_TIMEOUT
(i.e. HZ * 300). The patch applies to both 2.6.9-rc2 and 2.6.9-rc3.
I tested the patch on a 4-way IA-32 machine with 4GB of memory.
The base kernel was 2.6.9-rc1. I tested five swap_token_timeout values.
I created 256 workload-generating processes on the machine. Each process
generated the same workload: repeated random file writes and memory accesses
(the memory region is 18MB). I ran the processes for one hour and calculated
the write throughput of the workload processes. The results were as follows.
swap_token_timeout    Write throughput [MB/s]
-------------------   -----------------------
30,000,000            6.71
 3,000,000            8.26
   300,000 (DEFAULT)  7.90
    30,000            8.16
     3,000            7.43
As you can see, it may be possible to improve application performance
by tuning swap_token_timeout. Additionally, I think it would be better to decrease the default value. One reason is that the other values also gave good performance. The other is that kernel behavior may become unstable if swap_token_timeout is too long.
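To make the comparison with the default easier to read, the measurements can be expressed as a percentage change relative to the 300,000 row. The following is only a minimal sketch: the numbers are copied from the table above, while the struct and function names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the table above: timeout value and write throughput in MB/s. */
struct sample {
    long timeout;
    double mb_per_s;
};

static const struct sample samples[] = {
    { 30000000L, 6.71 },
    {  3000000L, 8.26 },
    {   300000L, 7.90 },   /* default */
    {    30000L, 8.16 },
    {     3000L, 7.43 },
};

#define NSAMPLES (sizeof(samples) / sizeof(samples[0]))
#define DEFAULT_INDEX 2

/* Percentage change of a throughput relative to a baseline throughput. */
static double percent_change(double value, double baseline)
{
    assert(baseline != 0.0);
    return (value - baseline) / baseline * 100.0;
}

/* Index of the row with the highest throughput. */
static size_t best_sample(const struct sample *s, size_t n)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (s[i].mb_per_s > s[best].mb_per_s)
            best = i;
    return best;
}

/* Print every row with its change relative to the default row. */
static void print_report(void)
{
    double base = samples[DEFAULT_INDEX].mb_per_s;

    for (size_t i = 0; i < NSAMPLES; i++)
        printf("%10ld  %5.2f MB/s  %+6.1f%%\n",
               samples[i].timeout, samples[i].mb_per_s,
               percent_change(samples[i].mb_per_s, base));
}
```

Calling print_report() from a main() prints one line per row. With these figures the 3,000,000 row is roughly 4.6% above the default and the 30,000,000 row roughly 15% below it; whether differences of this size are significant for a single one-hour run is not something the data here can settle.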
I am exploring tuning policy.
Any comments or suggestions?
Best regards,
Hideo AOKI
Systems Development Laboratory, Hitachi, Ltd.
----
Signed-off-by: Hideo Aoki <[email protected]>
diff -uprN linux-2.6.9-rc2/include/linux/swap.h linux-2.6.9-rc2-tbtc_tune/include/linux/swap.h
--- linux-2.6.9-rc2/include/linux/swap.h 2004-09-15 12:20:40.000000000 +0900
+++ linux-2.6.9-rc2-tbtc_tune/include/linux/swap.h 2004-09-28 15:12:33.000000000 +0900
@@ -230,6 +230,7 @@ extern spinlock_t swaplock;
/* linux/mm/thrash.c */
extern struct mm_struct * swap_token_mm;
+extern unsigned long swap_token_default_timeout;
extern void grab_swap_token(void);
extern void __put_swap_token(struct mm_struct *);
diff -uprN linux-2.6.9-rc2/include/linux/sysctl.h linux-2.6.9-rc2-tbtc_tune/include/linux/sysctl.h
--- linux-2.6.9-rc2/include/linux/sysctl.h 2004-09-15 12:20:40.000000000 +0900
+++ linux-2.6.9-rc2-tbtc_tune/include/linux/sysctl.h 2004-09-28 15:12:33.000000000 +0900
@@ -167,6 +167,7 @@ enum
VM_HUGETLB_GROUP=25, /* permitted hugetlb group */
VM_VFS_CACHE_PRESSURE=26, /* dcache/icache reclaim pressure */
VM_LEGACY_VA_LAYOUT=27, /* legacy/compatibility virtual address space layout */
+ VM_SWAP_TOKEN_TIMEOUT=28 /* default time for token time out */
};
diff -uprN linux-2.6.9-rc2/kernel/sysctl.c linux-2.6.9-rc2-tbtc_tune/kernel/sysctl.c
--- linux-2.6.9-rc2/kernel/sysctl.c 2004-09-15 12:20:41.000000000 +0900
+++ linux-2.6.9-rc2-tbtc_tune/kernel/sysctl.c 2004-09-28 15:12:33.000000000 +0900
@@ -800,6 +800,15 @@ static ctl_table vm_table[] = {
.extra1 = &zero,
},
#endif
+ {
+ .ctl_name = VM_SWAP_TOKEN_TIMEOUT,
+ .procname = "swap_token_timeout",
+ .data = &swap_token_default_timeout,
+ .maxlen = sizeof(swap_token_default_timeout),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ .strategy = &sysctl_intvec,
+ },
{ .ctl_name = 0 }
};
diff -uprN linux-2.6.9-rc2/mm/thrash.c linux-2.6.9-rc2-tbtc_tune/mm/thrash.c
--- linux-2.6.9-rc2/mm/thrash.c 2004-09-15 12:20:42.000000000 +0900
+++ linux-2.6.9-rc2-tbtc_tune/mm/thrash.c 2004-09-28 15:12:33.000000000 +0900
@@ -20,6 +20,8 @@ struct mm_struct * swap_token_mm = &init
#define SWAP_TOKEN_CHECK_INTERVAL (HZ * 2)
#define SWAP_TOKEN_TIMEOUT (HZ * 300)
+unsigned long swap_token_default_timeout = SWAP_TOKEN_TIMEOUT;
+
/*
* Take the token away if the process had no page faults
@@ -75,10 +77,10 @@ void grab_swap_token(void)
if ((reason = should_release_swap_token(mm))) {
unsigned long eligible = jiffies;
if (reason == SWAP_TOKEN_TIMED_OUT) {
- eligible += SWAP_TOKEN_TIMEOUT;
+ eligible += swap_token_default_timeout;
}
mm->swap_token_time = eligible;
- swap_token_timeout = jiffies + SWAP_TOKEN_TIMEOUT;
+ swap_token_timeout = jiffies + swap_token_default_timeout;
swap_token_mm = current->mm;
}
spin_unlock(&swap_token_lock);
On Tue, 5 Oct 2004, Hideo AOKI wrote:
> I am exploring tuning policy.
>
> Any comments or suggestions?
While I believe that a self tuning timeout might be better in
the long run, this tunable will certainly help tune policy.
Besides, even with a self tuning timeout we'll want to have
this tunable visible in /proc so we can debug things by seeing
what value the kernel set the tunable to ;)
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Hideo AOKI <[email protected]> writes:
> This patch adds "swap_token_timeout" parameter in /proc/sys/vm.
> The parameter means expired time of token. Unit of the value is HZ, and the default value is the same as current SWAP_TOKEN_TIMEOUT
> (i.e. HZ * 300). The patch can be applied to both 2.6.9-rc2 and 2.6.9-rc3.
Please don't export any sysctls as jiffies. The duration of a jiffy varies.
Use s or ms instead. sysctl has convenience functions for this.
-Andi
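The concern can be illustrated with a small user-space sketch. HZ is a build-time constant, so the same raw tick count means a different wall-clock time on kernels built with a different HZ. The helper names below are hypothetical, and the tick rates 100 and 1000 in the usage note are only illustrative; the default of 300,000 in the results table is consistent with HZ = 1000, since the default is HZ * 300.

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert whole seconds to ticks for a given tick rate.
 * Returns -1 if the input is negative or the product would overflow.
 */
static long seconds_to_ticks(long seconds, long hz)
{
    assert(hz > 0);
    if (seconds < 0 || seconds > LONG_MAX / hz)
        return -1;
    return seconds * hz;
}

/* Convert ticks back to whole seconds, truncating any remainder. */
static long ticks_to_seconds(long ticks, long hz)
{
    assert(hz > 0);
    if (ticks < 0)
        return -1;
    return ticks / hz;
}
```

For example, ticks_to_seconds(300000, 1000) gives 300 seconds, while ticks_to_seconds(300000, 100) gives 3000 seconds. A value exported in seconds keeps its meaning across both builds, which is what a handler such as proc_dointvec_jiffies provides by converting at the /proc boundary.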
Hideo AOKI <[email protected]> wrote:
>
> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ja-JP; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
uh-oh.
> This patch adds "swap_token_timeout" parameter in /proc/sys/vm.
The patch is hopelessly whitespace-mangled. Please use attachments.
include/linux/swap.h | 1 +
include/linux/sysctl.h | 1 +
kernel/sysctl.c | 9 +++++++++
mm/thrash.c | 5 +++--
4 files changed, 14 insertions(+), 2 deletions(-)
Signed-off-by: Hideo Aoki <[email protected]>
diff -uprN linux-2.6.9-rc3/include/linux/swap.h linux-2.6.9-rc3-vm-thrashing-control-tuning/include/linux/swap.h
--- linux-2.6.9-rc3/include/linux/swap.h 2004-09-30 15:01:04.000000000 +0900
+++ linux-2.6.9-rc3-vm-thrashing-control-tuning/include/linux/swap.h 2004-10-04 13:45:11.000000000 +0900
@@ -230,6 +230,7 @@ extern spinlock_t swaplock;
/* linux/mm/thrash.c */
extern struct mm_struct * swap_token_mm;
+extern unsigned long swap_token_default_timeout;
extern void grab_swap_token(void);
extern void __put_swap_token(struct mm_struct *);
diff -uprN linux-2.6.9-rc3/include/linux/sysctl.h linux-2.6.9-rc3-vm-thrashing-control-tuning/include/linux/sysctl.h
--- linux-2.6.9-rc3/include/linux/sysctl.h 2004-09-30 15:01:04.000000000 +0900
+++ linux-2.6.9-rc3-vm-thrashing-control-tuning/include/linux/sysctl.h 2004-10-04 13:45:11.000000000 +0900
@@ -167,6 +167,7 @@ enum
VM_HUGETLB_GROUP=25, /* permitted hugetlb group */
VM_VFS_CACHE_PRESSURE=26, /* dcache/icache reclaim pressure */
VM_LEGACY_VA_LAYOUT=27, /* legacy/compatibility virtual address space layout */
+ VM_SWAP_TOKEN_TIMEOUT=28 /* default time for token time out */
};
diff -uprN linux-2.6.9-rc3/kernel/sysctl.c linux-2.6.9-rc3-vm-thrashing-control-tuning/kernel/sysctl.c
--- linux-2.6.9-rc3/kernel/sysctl.c 2004-09-30 15:01:05.000000000 +0900
+++ linux-2.6.9-rc3-vm-thrashing-control-tuning/kernel/sysctl.c 2004-10-06 17:39:48.000000000 +0900
@@ -800,6 +800,15 @@ static ctl_table vm_table[] = {
.extra1 = &zero,
},
#endif
+ {
+ .ctl_name = VM_SWAP_TOKEN_TIMEOUT,
+ .procname = "swap_token_timeout",
+ .data = &swap_token_default_timeout,
+ .maxlen = sizeof(swap_token_default_timeout),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec_jiffies,
+ .strategy = &sysctl_jiffies,
+ },
{ .ctl_name = 0 }
};
diff -uprN linux-2.6.9-rc3/mm/thrash.c linux-2.6.9-rc3-vm-thrashing-control-tuning/mm/thrash.c
--- linux-2.6.9-rc3/mm/thrash.c 2004-09-30 15:01:06.000000000 +0900
+++ linux-2.6.9-rc3-vm-thrashing-control-tuning/mm/thrash.c 2004-10-06 18:53:10.000000000 +0900
@@ -20,6 +20,7 @@ struct mm_struct * swap_token_mm = &init
#define SWAP_TOKEN_CHECK_INTERVAL (HZ * 2)
#define SWAP_TOKEN_TIMEOUT (HZ * 300)
+unsigned long swap_token_default_timeout = SWAP_TOKEN_TIMEOUT;
/*
* Take the token away if the process had no page faults
@@ -75,10 +76,10 @@ void grab_swap_token(void)
if ((reason = should_release_swap_token(mm))) {
unsigned long eligible = jiffies;
if (reason == SWAP_TOKEN_TIMED_OUT) {
- eligible += SWAP_TOKEN_TIMEOUT;
+ eligible += swap_token_default_timeout;
}
mm->swap_token_time = eligible;
- swap_token_timeout = jiffies + SWAP_TOKEN_TIMEOUT;
+ swap_token_timeout = jiffies + swap_token_default_timeout;
swap_token_mm = current->mm;
}
spin_unlock(&swap_token_lock);
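The hunk above only shows where the deadline is computed (jiffies plus the tunable). The expiry test itself is not part of the hunks shown, so the comparison below is an assumption, modelled on the usual wrap-safe idiom for tick counters. It is a user-space sketch with hypothetical names, not kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the type of the kernel tick counter. */
typedef unsigned long ticks_t;

/*
 * Deadline for a token grabbed at `now`, mirroring
 * "jiffies + swap_token_default_timeout". Unsigned addition wraps
 * around modulo 2^N, which is well defined in C.
 */
static ticks_t token_deadline(ticks_t now, ticks_t timeout)
{
    return now + timeout;
}

/*
 * True once `now` is strictly past `deadline`. The unsigned difference is
 * reinterpreted as signed, so the result stays correct across counter
 * wraparound as long as the two values are less than half the counter
 * range apart. The unsigned-to-signed conversion is implementation-defined
 * in C; this sketch assumes the common two's complement behavior.
 */
static bool token_timed_out(ticks_t now, ticks_t deadline)
{
    return (long)(deadline - now) < 0;
}
```

One consequence of this idiom is that a very large tunable value (more than half the counter range) would make a fresh deadline look as if it had already passed, which may be one more reason to bound the value that administrators can set.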
Rik van Riel wrote:
> While I believe that a self tuning timeout might be better in
> the long run, this tunable will certainly help tune policy.
Thank you for your comments.
I agree with you. The best solution is a self tuning timeout.
I would like to use this swap_token_timeout parameter to get
clues about self tuning.
Best regards,
Hideo AOKI
Systems Development Laboratory, Hitachi, Ltd.
Hideo AOKI <[email protected]> wrote:
>
> [vm-thrashing-control-tuning.patch text/plain (3256 bytes)]
Please send an additional patch to update Documentation/filesystems/proc.txt
and Documentation/sysctl/vm.txt
filesystems/proc.txt | 8 ++++++++
sysctl/vm.txt | 2 +-
2 files changed, 9 insertions(+), 1 deletion(-)
Signed-off-by: Hideo Aoki <[email protected]>
diff -uprN linux-2.6.9-rc3-vm-thrashing-control-tuning/Documentation/filesystems/proc.txt linux-2.6.9-rc3-vm-tuning-doc/Documentation/filesystems/proc.txt
--- linux-2.6.9-rc3-vm-thrashing-control-tuning/Documentation/filesystems/proc.txt 2004-10-07 10:47:23.000000000 +0900
+++ linux-2.6.9-rc3-vm-tuning-doc/Documentation/filesystems/proc.txt 2004-10-07 21:31:37.316226768 +0900
@@ -1269,6 +1269,14 @@ block_dump
block_dump enables block I/O debugging when set to a nonzero value. More
information on block I/O debugging is in Documentation/laptop-mode.txt.
+swap_token_timeout
+------------------
+
+This file contains the valid hold time of the swap-out protection token. The
+Linux VM has a token-based thrashing control mechanism and uses the token to
+prevent unnecessary page faults in thrashing situations. The value is in
+seconds and can be used to tune thrashing behavior.
+
2.5 /proc/sys/dev - Device specific parameters
----------------------------------------------
diff -uprN linux-2.6.9-rc3-vm-thrashing-control-tuning/Documentation/sysctl/vm.txt linux-2.6.9-rc3-vm-tuning-doc/Documentation/sysctl/vm.txt
--- linux-2.6.9-rc3-vm-thrashing-control-tuning/Documentation/sysctl/vm.txt 2004-10-07 10:47:53.000000000 +0900
+++ linux-2.6.9-rc3-vm-tuning-doc/Documentation/sysctl/vm.txt 2004-10-07 16:48:56.969591368 +0900
@@ -31,7 +31,7 @@ Currently, these files are in /proc/sys/
dirty_ratio, dirty_background_ratio, dirty_expire_centisecs,
dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode,
-block_dump:
+block_dump, swap_token_timeout:
See Documentation/filesystems/proc.txt
proc.txt | 31 ++++++++++++++++---------------
1 files changed, 16 insertions(+), 15 deletions(-)
Signed-off-by: Hideo Aoki <[email protected]>
diff -uprN linux-2.6.9-rc3-vm-tuning-doc/Documentation/filesystems/proc.txt linux-2.6.9-rc3-doc-fs-proc-fix/Documentation/filesystems/proc.txt
--- linux-2.6.9-rc3-vm-tuning-doc/Documentation/filesystems/proc.txt 2004-10-07 21:31:37.000000000 +0900
+++ linux-2.6.9-rc3-doc-fs-proc-fix/Documentation/filesystems/proc.txt 2004-10-07 22:05:46.847650632 +0900
@@ -350,22 +350,6 @@ available. In this case, there are 0 ch
ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE
available in ZONE_NORMAL, etc...
-
-1.3 IDE devices in /proc/ide
-----------------------------
-
-The subdirectory /proc/ide contains information about all IDE devices of which
-the kernel is aware. There is one subdirectory for each IDE controller, the
-file drivers and a link for each IDE device, pointing to the device directory
-in the controller specific subtree.
-
-The file drivers contains general information about the drivers used for the
-IDE devices:
-
- > cat /proc/ide/drivers
- ide-cdrom version 4.53
- ide-disk version 1.08
-
..............................................................................
meminfo:
@@ -454,6 +438,22 @@ VmallocTotal: total size of vmalloc memo
VmallocUsed: amount of vmalloc area which is used
VmallocChunk: largest contigious block of vmalloc area which is free
+
+1.3 IDE devices in /proc/ide
+----------------------------
+
+The subdirectory /proc/ide contains information about all IDE devices of which
+the kernel is aware. There is one subdirectory for each IDE controller, the
+file drivers and a link for each IDE device, pointing to the device directory
+in the controller specific subtree.
+
+The file drivers contains general information about the drivers used for the
+IDE devices:
+
+ > cat /proc/ide/drivers
+ ide-cdrom version 4.53
+ ide-disk version 1.08
+
More detailed information can be found in the controller specific
subdirectories. These are named ide0, ide1 and so on. Each of these
directories contains the files shown in table 1-4.
Hi,
In include/linux/sysctl.h of 2.6.9-rc3-mm3, both VM_HEAP_STACK_GAP and
VM_SWAP_TOKEN_TIMEOUT are 28. Maybe one of them should be 29?
--- include/linux/sysctl.h (revision 11)
+++ include/linux/sysctl.h (local)
@@ -168,7 +168,7 @@
VM_VFS_CACHE_PRESSURE=26, /* dcache/icache reclaim pressure */
VM_LEGACY_VA_LAYOUT=27, /* legacy/compatibility virtual address space layout */
VM_HEAP_STACK_GAP=28, /* int: page gap between heap and stack */
- VM_SWAP_TOKEN_TIMEOUT=28, /* default time for token time out */
+ VM_SWAP_TOKEN_TIMEOUT=29, /* default time for token time out */
};
--
Best Regards,
Wen-chien Jesse Sung
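A duplicate enumerator value like this compiles without any diagnostic, which is why it can slip in when two patches each take the next free number. The sketch below is illustrative only: the prefixed enumerator names, the table layout, the "heap_stack_gap" string and the first-match lookup are all assumptions made for the example, not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* As reported for 2.6.9-rc3-mm3: both IDs are 28 and the compiler accepts it. */
enum {
    REPORTED_VM_HEAP_STACK_GAP     = 28,
    REPORTED_VM_SWAP_TOKEN_TIMEOUT = 28,
};

/* With the proposed fix applied. */
enum {
    FIXED_VM_HEAP_STACK_GAP     = 28,
    FIXED_VM_SWAP_TOKEN_TIMEOUT = 29,
};

/* C11 compile-time guard: the build fails if the two IDs collide again. */
_Static_assert(FIXED_VM_HEAP_STACK_GAP != FIXED_VM_SWAP_TOKEN_TIMEOUT,
               "sysctl IDs must be unique");

/* Hypothetical, simplified table entry: numeric ID plus /proc name. */
struct ctl_entry {
    int id;
    const char *name;
};

static const struct ctl_entry reported_table[] = {
    { REPORTED_VM_HEAP_STACK_GAP,     "heap_stack_gap" },
    { REPORTED_VM_SWAP_TOKEN_TIMEOUT, "swap_token_timeout" },
};

static const struct ctl_entry fixed_table[] = {
    { FIXED_VM_HEAP_STACK_GAP,     "heap_stack_gap" },
    { FIXED_VM_SWAP_TOKEN_TIMEOUT, "swap_token_timeout" },
};

/* First-match linear search by numeric ID; returns NULL if the ID is absent. */
static const char *ctl_name_by_id(const struct ctl_entry *table, size_t n, int id)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++)
        if (table[i].id == id)
            return table[i].name;
    return NULL;
}
```

Under a first-match lookup, ID 28 in the reported table always resolves to the first entry, so the second entry cannot be reached by number at all. Lookup by name is unaffected.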
On Mon, Oct 11, 2004 at 04:39:58PM +0800, Wen-chien Jesse Sung wrote:
> Hi,
>
> In include/linux/sysctl.h of 2.6.9-rc3-mm3, both VM_HEAP_STACK_GAP and
> VM_SWAP_TOKEN_TIMEOUT are 28. Maybe one of them should be 29?
It doesn't really matter much anymore because the numerical sysctl
values are deprecated. Everybody should use the names in
/proc/sys instead. Hopefully they will eventually go away completely
and avoid a lot of patch conflicts.
-Andi
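Accessing the tunable by name means opening its file under /proc/sys rather than passing a numeric ID. A minimal user-space sketch follows; the helper name is hypothetical, and /proc/sys/vm/swap_token_timeout only exists on kernels that carry this patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single integer from a /proc/sys-style file.
 * Returns 0 on success and stores the value in *out;
 * returns -1 if the file cannot be opened or parsed.
 */
static int read_sysctl_long(const char *path, long *out)
{
    FILE *f;
    int parsed;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    parsed = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return parsed ? 0 : -1;
}
```

A caller would pass "/proc/sys/vm/swap_token_timeout" as the path and treat a -1 return as "tunable not available on this kernel". Because the lookup is by path, it keeps working regardless of which numeric ID the entry ends up with.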