2006-02-27 07:02:30

by Paul Jackson

Subject: [PATCH 01/02] cpuset memory spread slab cache filesys

From: Paul Jackson <[email protected]>

Mark file system inode and similar slab caches subject to
SLAB_MEM_SPREAD memory spreading.

If a slab cache is marked SLAB_MEM_SPREAD, then anytime that
a task that's in a cpuset with the 'memory_spread_slab' option
enabled goes to allocate from such a slab cache, the allocations
are spread evenly over all the memory nodes (task->mems_allowed)
allowed to that task, instead of favoring allocation on the
node local to the current cpu.
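
In outline, the allocation-time decision looks like the sketch below.
This is only an illustrative paraphrase of the alternate_node_alloc()
hook added by the separate cpuset-memory-spread-slab-cache-hooks patch;
the helper name pick_slab_node() is invented here, and the real
mm/slab.c code differs in detail:

	/*
	 * Sketch only: pick the node for a slab allocation.  Caches not
	 * marked SLAB_MEM_SPREAD, and tasks not in a cpuset with
	 * memory_spread_slab enabled, keep the node-local default.
	 */
	static int pick_slab_node(struct kmem_cache *cachep)
	{
		if ((cachep->flags & SLAB_MEM_SPREAD) &&
		    cpuset_do_slab_mem_spread())
			return cpuset_mem_spread_node(); /* rotate over mems_allowed */
		return numa_node_id();                   /* favor the local node */
	}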

The following inode and similar caches are marked SLAB_MEM_SPREAD:

file cache
==== =====
fs/adfs/super.c adfs_inode_cache
fs/affs/super.c affs_inode_cache
fs/befs/linuxvfs.c befs_inode_cache
fs/bfs/inode.c bfs_inode_cache
fs/block_dev.c bdev_cache
fs/cifs/cifsfs.c cifs_inode_cache
fs/coda/inode.c coda_inode_cache
fs/dquot.c dquot
fs/efs/super.c efs_inode_cache
fs/ext2/super.c ext2_inode_cache
fs/ext2/xattr.c (fs/mbcache.c) ext2_xattr
fs/ext3/super.c ext3_inode_cache
fs/ext3/xattr.c (fs/mbcache.c) ext3_xattr
fs/fat/cache.c fat_cache
fs/fat/inode.c fat_inode_cache
fs/freevxfs/vxfs_super.c vxfs_inode
fs/hpfs/super.c hpfs_inode_cache
fs/isofs/inode.c isofs_inode_cache
fs/jffs/inode-v23.c jffs_fm
fs/jffs2/super.c jffs2_i
fs/jfs/super.c jfs_ip
fs/minix/inode.c minix_inode_cache
fs/ncpfs/inode.c ncp_inode_cache
fs/nfs/direct.c nfs_direct_cache
fs/nfs/inode.c nfs_inode_cache
fs/ntfs/super.c ntfs_big_inode_cache_name
fs/ntfs/super.c ntfs_inode_cache
fs/ocfs2/dlm/dlmfs.c dlmfs_inode_cache
fs/ocfs2/super.c ocfs2_inode_cache
fs/proc/inode.c proc_inode_cache
fs/qnx4/inode.c qnx4_inode_cache
fs/reiserfs/super.c reiser_inode_cache
fs/romfs/inode.c romfs_inode_cache
fs/smbfs/inode.c smb_inode_cache
fs/sysv/inode.c sysv_inode_cache
fs/udf/super.c udf_inode_cache
fs/ufs/super.c ufs_inode_cache
net/socket.c sock_inode_cache
net/sunrpc/rpc_pipe.c rpc_inode_cache

The choice of which slab caches to so mark was quite simple. I
marked those already marked SLAB_RECLAIM_ACCOUNT, except for
fs/xfs, dentry_cache, inode_cache, and buffer_head, which were
marked in a previous patch. Even though SLAB_RECLAIM_ACCOUNT is
for a different purpose, it marks the same potentially large file
system i/o related slab caches as we need for memory spreading.

Given that the rule now becomes "wherever you would have used a
SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode
cache), use the SLAB_MEM_SPREAD flag too", this should be easy
enough to maintain. Future file system writers will just copy
one of the existing file system slab cache setups and tend to
get it right without thinking.
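
As an illustration of that rule, a new (hypothetical) "foo" file system
would set up its inode cache along the same lines as the hunks below;
the names here are placeholders, not real kernel code:

	foo_inode_cachep = kmem_cache_create("foo_inode_cache",
				sizeof(struct foo_inode_info),
				0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
				init_once, NULL);
	if (foo_inode_cachep == NULL)
		return -ENOMEM;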

==> This patch violates line length constraints, pushing
many lines past 80 columns. The next patch wraps
text to fit within 80 columns again. I split these
two so that it would be easy to see the changes caused
by adding the SLAB_MEM_SPREAD option separately from
the formatting changes caused by rewrapping source lines.

Signed-off-by: Paul Jackson <[email protected]>

---

Andrew - these two cpuset-memory-spread-slab-cache-hooks-filesys
and *-spacing patches should fit nicely in your *-mm stack,
right after cpuset-memory-spread-slab-cache-hooks.patch. -pj


fs/adfs/super.c | 2 +-
fs/affs/super.c | 2 +-
fs/befs/linuxvfs.c | 2 +-
fs/bfs/inode.c | 2 +-
fs/block_dev.c | 2 +-
fs/cifs/cifsfs.c | 2 +-
fs/coda/inode.c | 2 +-
fs/dquot.c | 2 +-
fs/efs/super.c | 2 +-
fs/ext2/super.c | 2 +-
fs/ext3/super.c | 2 +-
fs/fat/cache.c | 2 +-
fs/fat/inode.c | 2 +-
fs/freevxfs/vxfs_super.c | 2 +-
fs/hpfs/super.c | 2 +-
fs/isofs/inode.c | 2 +-
fs/jffs/inode-v23.c | 4 ++--
fs/jffs2/super.c | 2 +-
fs/jfs/super.c | 2 +-
fs/mbcache.c | 2 +-
fs/minix/inode.c | 2 +-
fs/ncpfs/inode.c | 2 +-
fs/nfs/direct.c | 2 +-
fs/nfs/inode.c | 2 +-
fs/ntfs/super.c | 4 ++--
fs/ocfs2/dlm/dlmfs.c | 2 +-
fs/ocfs2/super.c | 2 +-
fs/proc/inode.c | 2 +-
fs/qnx4/inode.c | 2 +-
fs/reiserfs/super.c | 2 +-
fs/romfs/inode.c | 2 +-
fs/smbfs/inode.c | 2 +-
fs/sysv/inode.c | 2 +-
fs/udf/super.c | 2 +-
fs/ufs/super.c | 2 +-
net/socket.c | 2 +-
net/sunrpc/rpc_pipe.c | 2 +-
37 files changed, 39 insertions(+), 39 deletions(-)

--- 2.6.16-rc4-mm2.orig/fs/adfs/super.c 2006-02-26 17:53:50.424915062 -0800
+++ 2.6.16-rc4-mm2/fs/adfs/super.c 2006-02-26 18:40:51.420223812 -0800
@@ -241,7 +241,7 @@ static int init_inodecache(void)
{
adfs_inode_cachep = kmem_cache_create("adfs_inode_cache",
sizeof(struct adfs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (adfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/affs/super.c 2006-02-26 17:53:50.546986722 -0800
+++ 2.6.16-rc4-mm2/fs/affs/super.c 2006-02-26 18:40:51.431942693 -0800
@@ -98,7 +98,7 @@ static int init_inodecache(void)
{
affs_inode_cachep = kmem_cache_create("affs_inode_cache",
sizeof(struct affs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (affs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/befs/linuxvfs.c 2006-02-26 17:53:51.230588018 -0800
+++ 2.6.16-rc4-mm2/fs/befs/linuxvfs.c 2006-02-26 18:40:51.436825560 -0800
@@ -427,7 +427,7 @@ befs_init_inodecache(void)
{
befs_inode_cachep = kmem_cache_create("befs_inode_cache",
sizeof (struct befs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (befs_inode_cachep == NULL) {
printk(KERN_ERR "befs_init_inodecache: "
--- 2.6.16-rc4-mm2.orig/fs/bfs/inode.c 2006-02-26 17:53:51.320432760 -0800
+++ 2.6.16-rc4-mm2/fs/bfs/inode.c 2006-02-26 18:40:51.440731854 -0800
@@ -257,7 +257,7 @@ static int init_inodecache(void)
{
bfs_inode_cachep = kmem_cache_create("bfs_inode_cache",
sizeof(struct bfs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (bfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/block_dev.c 2006-02-26 18:32:36.363896034 -0800
+++ 2.6.16-rc4-mm2/fs/block_dev.c 2006-02-26 18:40:51.441708427 -0800
@@ -319,7 +319,7 @@ void __init bdev_cache_init(void)
{
int err;
bdev_cachep = kmem_cache_create("bdev_cache", sizeof(struct bdev_inode),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_PANIC,
+ 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_PANIC,
init_once, NULL);
err = register_filesystem(&bd_type);
if (err)
--- 2.6.16-rc4-mm2.orig/fs/cifs/cifsfs.c 2006-02-26 18:32:13.289418922 -0800
+++ 2.6.16-rc4-mm2/fs/cifs/cifsfs.c 2006-02-26 18:40:51.444638148 -0800
@@ -692,7 +692,7 @@ cifs_init_inodecache(void)
{
cifs_inode_cachep = kmem_cache_create("cifs_inode_cache",
sizeof (struct cifsInodeInfo),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
cifs_init_once, NULL);
if (cifs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/coda/inode.c 2006-02-26 18:32:39.268225430 -0800
+++ 2.6.16-rc4-mm2/fs/coda/inode.c 2006-02-26 18:40:51.447567868 -0800
@@ -71,7 +71,7 @@ int coda_init_inodecache(void)
{
coda_inode_cachep = kmem_cache_create("coda_inode_cache",
sizeof(struct coda_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (coda_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/dquot.c 2006-02-26 18:32:36.401982398 -0800
+++ 2.6.16-rc4-mm2/fs/dquot.c 2006-02-26 18:40:51.449521015 -0800
@@ -1817,7 +1817,7 @@ static int __init dquot_init(void)

dquot_cachep = kmem_cache_create("dquot",
sizeof(struct dquot), sizeof(unsigned long) * 4,
- SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_PANIC,
+ SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_PANIC,
NULL, NULL);

order = 0;
--- 2.6.16-rc4-mm2.orig/fs/efs/super.c 2006-02-26 17:53:54.602695556 -0800
+++ 2.6.16-rc4-mm2/fs/efs/super.c 2006-02-26 18:40:51.451474162 -0800
@@ -81,7 +81,7 @@ static int init_inodecache(void)
{
efs_inode_cachep = kmem_cache_create("efs_inode_cache",
sizeof(struct efs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (efs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/ext2/super.c 2006-02-26 18:32:39.237951654 -0800
+++ 2.6.16-rc4-mm2/fs/ext2/super.c 2006-02-26 18:40:51.453427309 -0800
@@ -175,7 +175,7 @@ static int init_inodecache(void)
{
ext2_inode_cachep = kmem_cache_create("ext2_inode_cache",
sizeof(struct ext2_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (ext2_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/ext3/super.c 2006-02-26 18:32:37.414689050 -0800
+++ 2.6.16-rc4-mm2/fs/ext3/super.c 2006-02-26 18:40:51.458310176 -0800
@@ -481,7 +481,7 @@ static int init_inodecache(void)
{
ext3_inode_cachep = kmem_cache_create("ext3_inode_cache",
sizeof(struct ext3_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (ext3_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/fat/cache.c 2006-02-26 17:53:55.388837047 -0800
+++ 2.6.16-rc4-mm2/fs/fat/cache.c 2006-02-26 18:40:51.460263323 -0800
@@ -49,7 +49,7 @@ int __init fat_cache_init(void)
{
fat_cache_cachep = kmem_cache_create("fat_cache",
sizeof(struct fat_cache),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (fat_cache_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/fat/inode.c 2006-02-26 17:53:55.446454870 -0800
+++ 2.6.16-rc4-mm2/fs/fat/inode.c 2006-02-26 18:40:51.462216470 -0800
@@ -518,7 +518,7 @@ static int __init fat_init_inodecache(vo
{
fat_inode_cachep = kmem_cache_create("fat_inode_cache",
sizeof(struct msdos_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (fat_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/freevxfs/vxfs_super.c 2006-02-26 17:53:55.733567415 -0800
+++ 2.6.16-rc4-mm2/fs/freevxfs/vxfs_super.c 2006-02-26 18:40:51.465146190 -0800
@@ -260,7 +260,7 @@ vxfs_init(void)
{
vxfs_inode_cachep = kmem_cache_create("vxfs_inode",
sizeof(struct vxfs_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT, NULL, NULL);
+ SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL, NULL);
if (vxfs_inode_cachep)
return register_filesystem(&vxfs_fs_type);
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/hpfs/super.c 2006-02-26 18:32:37.378555833 -0800
+++ 2.6.16-rc4-mm2/fs/hpfs/super.c 2006-02-26 18:40:51.467099337 -0800
@@ -191,7 +191,7 @@ static int init_inodecache(void)
{
hpfs_inode_cachep = kmem_cache_create("hpfs_inode_cache",
sizeof(struct hpfs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (hpfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/isofs/inode.c 2006-02-26 17:53:57.740425507 -0800
+++ 2.6.16-rc4-mm2/fs/isofs/inode.c 2006-02-26 18:40:51.471005631 -0800
@@ -87,7 +87,7 @@ static int init_inodecache(void)
{
isofs_inode_cachep = kmem_cache_create("isofs_inode_cache",
sizeof(struct iso_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (isofs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/jffs/inode-v23.c 2006-02-26 18:32:37.170545692 -0800
+++ 2.6.16-rc4-mm2/fs/jffs/inode-v23.c 2006-02-26 18:40:51.473935351 -0800
@@ -1812,14 +1812,14 @@ init_jffs_fs(void)
}
#endif
fm_cache = kmem_cache_create("jffs_fm", sizeof(struct jffs_fm),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
NULL, NULL);
if (!fm_cache) {
return -ENOMEM;
}

node_cache = kmem_cache_create("jffs_node",sizeof(struct jffs_node),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
NULL, NULL);
if (!node_cache) {
kmem_cache_destroy(fm_cache);
--- 2.6.16-rc4-mm2.orig/fs/jffs2/super.c 2006-02-26 17:53:59.195519696 -0800
+++ 2.6.16-rc4-mm2/fs/jffs2/super.c 2006-02-26 18:40:51.477841645 -0800
@@ -331,7 +331,7 @@ static int __init init_jffs2_fs(void)

jffs2_inode_cachep = kmem_cache_create("jffs2_i",
sizeof(struct jffs2_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
jffs2_i_init_once, NULL);
if (!jffs2_inode_cachep) {
printk(KERN_ERR "JFFS2 error: Failed to initialise inode cache\n");
--- 2.6.16-rc4-mm2.orig/fs/jfs/super.c 2006-02-26 18:32:16.663480138 -0800
+++ 2.6.16-rc4-mm2/fs/jfs/super.c 2006-02-26 18:40:51.479794792 -0800
@@ -634,7 +634,7 @@ static int __init init_jfs_fs(void)

jfs_inode_cachep =
kmem_cache_create("jfs_ip", sizeof(struct jfs_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT, init_once, NULL);
+ SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, init_once, NULL);
if (jfs_inode_cachep == NULL)
return -ENOMEM;

--- 2.6.16-rc4-mm2.orig/fs/mbcache.c 2006-02-26 17:53:59.930879377 -0800
+++ 2.6.16-rc4-mm2/fs/mbcache.c 2006-02-26 18:40:51.480771365 -0800
@@ -288,7 +288,7 @@ mb_cache_create(const char *name, struct
INIT_LIST_HEAD(&cache->c_indexes_hash[m][n]);
}
cache->c_entry_cache = kmem_cache_create(name, entry_size, 0,
- SLAB_RECLAIM_ACCOUNT, NULL, NULL);
+ SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL, NULL);
if (!cache->c_entry_cache)
goto fail;

--- 2.6.16-rc4-mm2.orig/fs/minix/inode.c 2006-02-26 17:53:59.975801748 -0800
+++ 2.6.16-rc4-mm2/fs/minix/inode.c 2006-02-26 18:40:51.508115422 -0800
@@ -80,7 +80,7 @@ static int init_inodecache(void)
{
minix_inode_cachep = kmem_cache_create("minix_inode_cache",
sizeof(struct minix_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (minix_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/ncpfs/inode.c 2006-02-26 18:32:37.448869121 -0800
+++ 2.6.16-rc4-mm2/fs/ncpfs/inode.c 2006-02-26 18:40:51.512021716 -0800
@@ -72,7 +72,7 @@ static int init_inodecache(void)
{
ncp_inode_cachep = kmem_cache_create("ncp_inode_cache",
sizeof(struct ncp_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (ncp_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/nfs/direct.c 2006-02-26 17:54:00.385962526 -0800
+++ 2.6.16-rc4-mm2/fs/nfs/direct.c 2006-02-26 18:40:51.513974863 -0800
@@ -771,7 +771,7 @@ int nfs_init_directcache(void)
{
nfs_direct_cachep = kmem_cache_create("nfs_direct_cache",
sizeof(struct nfs_direct_req),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
NULL, NULL);
if (nfs_direct_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/nfs/inode.c 2006-02-26 18:32:38.669585914 -0800
+++ 2.6.16-rc4-mm2/fs/nfs/inode.c 2006-02-26 18:40:51.515928009 -0800
@@ -2163,7 +2163,7 @@ static int nfs_init_inodecache(void)
{
nfs_inode_cachep = kmem_cache_create("nfs_inode_cache",
sizeof(struct nfs_inode),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (nfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/ntfs/super.c 2006-02-26 18:32:37.234022965 -0800
+++ 2.6.16-rc4-mm2/fs/ntfs/super.c 2006-02-26 18:40:51.518857730 -0800
@@ -3084,7 +3084,7 @@ static int __init init_ntfs_fs(void)

ntfs_inode_cache = kmem_cache_create(ntfs_inode_cache_name,
sizeof(ntfs_inode), 0,
- SLAB_RECLAIM_ACCOUNT, NULL, NULL);
+ SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL, NULL);
if (!ntfs_inode_cache) {
printk(KERN_CRIT "NTFS: Failed to create %s!\n",
ntfs_inode_cache_name);
@@ -3093,7 +3093,7 @@ static int __init init_ntfs_fs(void)

ntfs_big_inode_cache = kmem_cache_create(ntfs_big_inode_cache_name,
sizeof(big_ntfs_inode), 0,
- SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
+ SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
ntfs_big_inode_init_once, NULL);
if (!ntfs_big_inode_cache) {
printk(KERN_CRIT "NTFS: Failed to create %s!\n",
--- 2.6.16-rc4-mm2.orig/fs/ocfs2/dlm/dlmfs.c 2006-02-26 17:54:04.293232225 -0800
+++ 2.6.16-rc4-mm2/fs/ocfs2/dlm/dlmfs.c 2006-02-26 18:40:51.521787450 -0800
@@ -596,7 +596,7 @@ static int __init init_dlmfs_fs(void)

dlmfs_inode_cache = kmem_cache_create("dlmfs_inode_cache",
sizeof(struct dlmfs_inode_private),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
dlmfs_init_once, NULL);
if (!dlmfs_inode_cache)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/ocfs2/super.c 2006-02-26 18:32:31.194892844 -0800
+++ 2.6.16-rc4-mm2/fs/ocfs2/super.c 2006-02-26 18:40:51.523740597 -0800
@@ -951,7 +951,7 @@ static int ocfs2_initialize_mem_caches(v
{
ocfs2_inode_cachep = kmem_cache_create("ocfs2_inode_cache",
sizeof(struct ocfs2_inode_info),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
ocfs2_inode_init_once, NULL);
if (!ocfs2_inode_cachep)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/proc/inode.c 2006-02-26 18:31:43.848659613 -0800
+++ 2.6.16-rc4-mm2/fs/proc/inode.c 2006-02-26 18:40:51.526670317 -0800
@@ -121,7 +121,7 @@ int __init proc_init_inodecache(void)
{
proc_inode_cachep = kmem_cache_create("proc_inode_cache",
sizeof(struct proc_inode),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (proc_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/qnx4/inode.c 2006-02-26 17:54:05.782506480 -0800
+++ 2.6.16-rc4-mm2/fs/qnx4/inode.c 2006-02-26 18:40:51.529600038 -0800
@@ -546,7 +546,7 @@ static int init_inodecache(void)
{
qnx4_inode_cachep = kmem_cache_create("qnx4_inode_cache",
sizeof(struct qnx4_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (qnx4_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/reiserfs/super.c 2006-02-26 17:54:06.364544156 -0800
+++ 2.6.16-rc4-mm2/fs/reiserfs/super.c 2006-02-26 18:40:51.534482905 -0800
@@ -521,7 +521,7 @@ static int init_inodecache(void)
reiserfs_inode_cachep = kmem_cache_create("reiser_inode_cache",
sizeof(struct
reiserfs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (reiserfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/romfs/inode.c 2006-02-26 17:54:06.543257067 -0800
+++ 2.6.16-rc4-mm2/fs/romfs/inode.c 2006-02-26 18:40:51.536436052 -0800
@@ -579,7 +579,7 @@ static int init_inodecache(void)
{
romfs_inode_cachep = kmem_cache_create("romfs_inode_cache",
sizeof(struct romfs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (romfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/smbfs/inode.c 2006-02-26 17:54:06.662399007 -0800
+++ 2.6.16-rc4-mm2/fs/smbfs/inode.c 2006-02-26 18:40:51.539365772 -0800
@@ -80,7 +80,7 @@ static int init_inodecache(void)
{
smb_inode_cachep = kmem_cache_create("smb_inode_cache",
sizeof(struct smb_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (smb_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/sysv/inode.c 2006-02-26 17:54:08.139954383 -0800
+++ 2.6.16-rc4-mm2/fs/sysv/inode.c 2006-02-26 18:40:51.544248639 -0800
@@ -342,7 +342,7 @@ int __init sysv_init_icache(void)
{
sysv_inode_cachep = kmem_cache_create("sysv_inode_cache",
sizeof(struct sysv_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT,
+ SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (!sysv_inode_cachep)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/udf/super.c 2006-02-26 18:32:38.690093957 -0800
+++ 2.6.16-rc4-mm2/fs/udf/super.c 2006-02-26 18:40:51.545225213 -0800
@@ -140,7 +140,7 @@ static int init_inodecache(void)
{
udf_inode_cachep = kmem_cache_create("udf_inode_cache",
sizeof(struct udf_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (udf_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/ufs/super.c 2006-02-26 17:54:08.721015486 -0800
+++ 2.6.16-rc4-mm2/fs/ufs/super.c 2006-02-26 18:40:51.548154933 -0800
@@ -1184,7 +1184,7 @@ static int init_inodecache(void)
{
ufs_inode_cachep = kmem_cache_create("ufs_inode_cache",
sizeof(struct ufs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (ufs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/net/socket.c 2006-02-26 18:32:21.592246261 -0800
+++ 2.6.16-rc4-mm2/net/socket.c 2006-02-26 18:40:51.550108080 -0800
@@ -319,7 +319,7 @@ static int init_inodecache(void)
{
sock_inode_cachep = kmem_cache_create("sock_inode_cache",
sizeof(struct socket_alloc),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (sock_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/net/sunrpc/rpc_pipe.c 2006-02-26 17:56:16.829852669 -0800
+++ 2.6.16-rc4-mm2/net/sunrpc/rpc_pipe.c 2006-02-26 18:40:51.554990947 -0800
@@ -850,7 +850,7 @@ int register_rpc_pipefs(void)
{
rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
sizeof(struct rpc_inode),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
+ 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
init_once, NULL);
if (!rpc_inode_cachep)
return -ENOMEM;

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373


2006-02-27 07:02:30

by Paul Jackson

Subject: [PATCH 02/02] cpuset memory spread slab cache format

From: Paul Jackson <[email protected]>

Rewrap the overly long source code lines resulting
from the previous patch's addition of the slab
cache flag SLAB_MEM_SPREAD. This patch contains only
formatting changes, and no functional change.
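
The rewrap pattern, taken from the hunks below, is simply to
parenthesize the flag group and break it across two lines:

	before:   0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
	after:    0, (SLAB_RECLAIM_ACCOUNT|
	              SLAB_MEM_SPREAD),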

Signed-off-by: Paul Jackson <[email protected]>

---

fs/adfs/super.c | 3 ++-
fs/affs/super.c | 3 ++-
fs/befs/linuxvfs.c | 3 ++-
fs/bfs/inode.c | 3 ++-
fs/block_dev.c | 3 ++-
fs/cifs/cifsfs.c | 3 ++-
fs/dquot.c | 3 ++-
fs/ext2/super.c | 3 ++-
fs/ext3/super.c | 3 ++-
fs/fat/inode.c | 3 ++-
fs/hpfs/super.c | 3 ++-
fs/isofs/inode.c | 3 ++-
fs/jffs/inode-v23.c | 10 ++++++----
fs/jffs2/super.c | 3 ++-
fs/jfs/super.c | 3 ++-
fs/minix/inode.c | 3 ++-
fs/ncpfs/inode.c | 3 ++-
fs/nfs/direct.c | 3 ++-
fs/nfs/inode.c | 3 ++-
fs/ocfs2/dlm/dlmfs.c | 3 ++-
fs/ocfs2/super.c | 8 +++++---
fs/proc/inode.c | 3 ++-
fs/qnx4/inode.c | 3 ++-
fs/reiserfs/super.c | 3 ++-
fs/romfs/inode.c | 3 ++-
fs/smbfs/inode.c | 3 ++-
fs/udf/super.c | 3 ++-
fs/ufs/super.c | 3 ++-
net/socket.c | 3 ++-
net/sunrpc/rpc_pipe.c | 7 ++++---
30 files changed, 69 insertions(+), 37 deletions(-)

--- 2.6.16-rc4-mm2.orig/fs/adfs/super.c 2006-02-26 18:40:51.420223812 -0800
+++ 2.6.16-rc4-mm2/fs/adfs/super.c 2006-02-26 18:40:56.873409942 -0800
@@ -241,7 +241,8 @@ static int init_inodecache(void)
{
adfs_inode_cachep = kmem_cache_create("adfs_inode_cache",
sizeof(struct adfs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (adfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/affs/super.c 2006-02-26 18:40:51.431942693 -0800
+++ 2.6.16-rc4-mm2/fs/affs/super.c 2006-02-26 18:40:56.891964837 -0800
@@ -98,7 +98,8 @@ static int init_inodecache(void)
{
affs_inode_cachep = kmem_cache_create("affs_inode_cache",
sizeof(struct affs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (affs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/befs/linuxvfs.c 2006-02-26 18:40:51.436825560 -0800
+++ 2.6.16-rc4-mm2/fs/befs/linuxvfs.c 2006-02-26 18:40:56.896847704 -0800
@@ -427,7 +427,8 @@ befs_init_inodecache(void)
{
befs_inode_cachep = kmem_cache_create("befs_inode_cache",
sizeof (struct befs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (befs_inode_cachep == NULL) {
printk(KERN_ERR "befs_init_inodecache: "
--- 2.6.16-rc4-mm2.orig/fs/bfs/inode.c 2006-02-26 18:40:51.440731854 -0800
+++ 2.6.16-rc4-mm2/fs/bfs/inode.c 2006-02-26 18:40:56.899777425 -0800
@@ -257,7 +257,8 @@ static int init_inodecache(void)
{
bfs_inode_cachep = kmem_cache_create("bfs_inode_cache",
sizeof(struct bfs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (bfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/block_dev.c 2006-02-26 18:40:51.441708427 -0800
+++ 2.6.16-rc4-mm2/fs/block_dev.c 2006-02-26 18:40:56.901730572 -0800
@@ -319,7 +319,8 @@ void __init bdev_cache_init(void)
{
int err;
bdev_cachep = kmem_cache_create("bdev_cache", sizeof(struct bdev_inode),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_PANIC,
+ 0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD|SLAB_PANIC),
init_once, NULL);
err = register_filesystem(&bd_type);
if (err)
--- 2.6.16-rc4-mm2.orig/fs/cifs/cifsfs.c 2006-02-26 18:40:51.444638148 -0800
+++ 2.6.16-rc4-mm2/fs/cifs/cifsfs.c 2006-02-26 18:40:56.915402600 -0800
@@ -692,7 +692,8 @@ cifs_init_inodecache(void)
{
cifs_inode_cachep = kmem_cache_create("cifs_inode_cache",
sizeof (struct cifsInodeInfo),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
cifs_init_once, NULL);
if (cifs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/dquot.c 2006-02-26 18:40:51.449521015 -0800
+++ 2.6.16-rc4-mm2/fs/dquot.c 2006-02-26 18:40:56.917355747 -0800
@@ -1817,7 +1817,8 @@ static int __init dquot_init(void)

dquot_cachep = kmem_cache_create("dquot",
sizeof(struct dquot), sizeof(unsigned long) * 4,
- SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_PANIC,
+ (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD|SLAB_PANIC),
NULL, NULL);

order = 0;
--- 2.6.16-rc4-mm2.orig/fs/ext2/super.c 2006-02-26 18:40:51.453427309 -0800
+++ 2.6.16-rc4-mm2/fs/ext2/super.c 2006-02-26 18:40:56.920285467 -0800
@@ -175,7 +175,8 @@ static int init_inodecache(void)
{
ext2_inode_cachep = kmem_cache_create("ext2_inode_cache",
sizeof(struct ext2_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (ext2_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/ext3/super.c 2006-02-26 18:40:51.458310176 -0800
+++ 2.6.16-rc4-mm2/fs/ext3/super.c 2006-02-26 18:40:56.924191761 -0800
@@ -481,7 +481,8 @@ static int init_inodecache(void)
{
ext3_inode_cachep = kmem_cache_create("ext3_inode_cache",
sizeof(struct ext3_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (ext3_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/fat/inode.c 2006-02-26 18:40:51.462216470 -0800
+++ 2.6.16-rc4-mm2/fs/fat/inode.c 2006-02-26 18:40:56.927121481 -0800
@@ -518,7 +518,8 @@ static int __init fat_init_inodecache(vo
{
fat_inode_cachep = kmem_cache_create("fat_inode_cache",
sizeof(struct msdos_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (fat_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/hpfs/super.c 2006-02-26 18:40:51.467099337 -0800
+++ 2.6.16-rc4-mm2/fs/hpfs/super.c 2006-02-26 18:40:56.930051202 -0800
@@ -191,7 +191,8 @@ static int init_inodecache(void)
{
hpfs_inode_cachep = kmem_cache_create("hpfs_inode_cache",
sizeof(struct hpfs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (hpfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/isofs/inode.c 2006-02-26 18:40:51.471005631 -0800
+++ 2.6.16-rc4-mm2/fs/isofs/inode.c 2006-02-26 18:40:56.932980922 -0800
@@ -87,7 +87,8 @@ static int init_inodecache(void)
{
isofs_inode_cachep = kmem_cache_create("isofs_inode_cache",
sizeof(struct iso_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (isofs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/jffs/inode-v23.c 2006-02-26 18:40:51.473935351 -0800
+++ 2.6.16-rc4-mm2/fs/jffs/inode-v23.c 2006-02-26 18:40:56.936887216 -0800
@@ -1812,15 +1812,17 @@ init_jffs_fs(void)
}
#endif
fm_cache = kmem_cache_create("jffs_fm", sizeof(struct jffs_fm),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
- NULL, NULL);
+ 0,
+ SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ NULL, NULL);
if (!fm_cache) {
return -ENOMEM;
}

node_cache = kmem_cache_create("jffs_node",sizeof(struct jffs_node),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
- NULL, NULL);
+ 0,
+ SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ NULL, NULL);
if (!node_cache) {
kmem_cache_destroy(fm_cache);
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/jffs2/super.c 2006-02-26 18:40:51.477841645 -0800
+++ 2.6.16-rc4-mm2/fs/jffs2/super.c 2006-02-26 18:40:56.939816936 -0800
@@ -331,7 +331,8 @@ static int __init init_jffs2_fs(void)

jffs2_inode_cachep = kmem_cache_create("jffs2_i",
sizeof(struct jffs2_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
jffs2_i_init_once, NULL);
if (!jffs2_inode_cachep) {
printk(KERN_ERR "JFFS2 error: Failed to initialise inode cache\n");
--- 2.6.16-rc4-mm2.orig/fs/jfs/super.c 2006-02-26 18:40:51.479794792 -0800
+++ 2.6.16-rc4-mm2/fs/jfs/super.c 2006-02-26 18:40:56.943723230 -0800
@@ -634,7 +634,8 @@ static int __init init_jfs_fs(void)

jfs_inode_cachep =
kmem_cache_create("jfs_ip", sizeof(struct jfs_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, init_once, NULL);
+ SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ init_once, NULL);
if (jfs_inode_cachep == NULL)
return -ENOMEM;

--- 2.6.16-rc4-mm2.orig/fs/minix/inode.c 2006-02-26 18:40:51.508115422 -0800
+++ 2.6.16-rc4-mm2/fs/minix/inode.c 2006-02-26 18:40:56.984739315 -0800
@@ -80,7 +80,8 @@ static int init_inodecache(void)
{
minix_inode_cachep = kmem_cache_create("minix_inode_cache",
sizeof(struct minix_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (minix_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/ncpfs/inode.c 2006-02-26 18:40:51.512021716 -0800
+++ 2.6.16-rc4-mm2/fs/ncpfs/inode.c 2006-02-26 18:40:56.988645608 -0800
@@ -72,7 +72,8 @@ static int init_inodecache(void)
{
ncp_inode_cachep = kmem_cache_create("ncp_inode_cache",
sizeof(struct ncp_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (ncp_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/nfs/direct.c 2006-02-26 18:40:51.513974863 -0800
+++ 2.6.16-rc4-mm2/fs/nfs/direct.c 2006-02-26 18:40:56.990598755 -0800
@@ -771,7 +771,8 @@ int nfs_init_directcache(void)
{
nfs_direct_cachep = kmem_cache_create("nfs_direct_cache",
sizeof(struct nfs_direct_req),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
NULL, NULL);
if (nfs_direct_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/nfs/inode.c 2006-02-26 18:40:51.515928009 -0800
+++ 2.6.16-rc4-mm2/fs/nfs/inode.c 2006-02-26 18:40:56.991575329 -0800
@@ -2163,7 +2163,8 @@ static int nfs_init_inodecache(void)
{
nfs_inode_cachep = kmem_cache_create("nfs_inode_cache",
sizeof(struct nfs_inode),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (nfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/ocfs2/dlm/dlmfs.c 2006-02-26 18:40:51.521787450 -0800
+++ 2.6.16-rc4-mm2/fs/ocfs2/dlm/dlmfs.c 2006-02-26 18:40:56.995481623 -0800
@@ -596,7 +596,8 @@ static int __init init_dlmfs_fs(void)

dlmfs_inode_cache = kmem_cache_create("dlmfs_inode_cache",
sizeof(struct dlmfs_inode_private),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
dlmfs_init_once, NULL);
if (!dlmfs_inode_cache)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/ocfs2/super.c 2006-02-26 18:40:51.523740597 -0800
+++ 2.6.16-rc4-mm2/fs/ocfs2/super.c 2006-02-26 18:40:56.999387916 -0800
@@ -950,9 +950,11 @@ static void ocfs2_inode_init_once(void *
static int ocfs2_initialize_mem_caches(void)
{
ocfs2_inode_cachep = kmem_cache_create("ocfs2_inode_cache",
- sizeof(struct ocfs2_inode_info),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
- ocfs2_inode_init_once, NULL);
+ sizeof(struct ocfs2_inode_info),
+ 0,
+ (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
+ ocfs2_inode_init_once, NULL);
if (!ocfs2_inode_cachep)
return -ENOMEM;

--- 2.6.16-rc4-mm2.orig/fs/proc/inode.c 2006-02-26 18:40:51.526670317 -0800
+++ 2.6.16-rc4-mm2/fs/proc/inode.c 2006-02-26 18:40:57.001341063 -0800
@@ -121,7 +121,8 @@ int __init proc_init_inodecache(void)
{
proc_inode_cachep = kmem_cache_create("proc_inode_cache",
sizeof(struct proc_inode),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (proc_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/qnx4/inode.c 2006-02-26 18:40:51.529600038 -0800
+++ 2.6.16-rc4-mm2/fs/qnx4/inode.c 2006-02-26 18:40:57.005247357 -0800
@@ -546,7 +546,8 @@ static int init_inodecache(void)
{
qnx4_inode_cachep = kmem_cache_create("qnx4_inode_cache",
sizeof(struct qnx4_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (qnx4_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/reiserfs/super.c 2006-02-26 18:40:51.534482905 -0800
+++ 2.6.16-rc4-mm2/fs/reiserfs/super.c 2006-02-26 18:40:57.009153651 -0800
@@ -521,7 +521,8 @@ static int init_inodecache(void)
reiserfs_inode_cachep = kmem_cache_create("reiser_inode_cache",
sizeof(struct
reiserfs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (reiserfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/romfs/inode.c 2006-02-26 18:40:51.536436052 -0800
+++ 2.6.16-rc4-mm2/fs/romfs/inode.c 2006-02-26 18:40:57.012083371 -0800
@@ -579,7 +579,8 @@ static int init_inodecache(void)
{
romfs_inode_cachep = kmem_cache_create("romfs_inode_cache",
sizeof(struct romfs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (romfs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/smbfs/inode.c 2006-02-26 18:40:51.539365772 -0800
+++ 2.6.16-rc4-mm2/fs/smbfs/inode.c 2006-02-26 18:40:57.016966238 -0800
@@ -80,7 +80,8 @@ static int init_inodecache(void)
{
smb_inode_cachep = kmem_cache_create("smb_inode_cache",
sizeof(struct smb_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (smb_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/udf/super.c 2006-02-26 18:40:51.545225213 -0800
+++ 2.6.16-rc4-mm2/fs/udf/super.c 2006-02-26 18:40:57.018919385 -0800
@@ -140,7 +140,8 @@ static int init_inodecache(void)
{
udf_inode_cachep = kmem_cache_create("udf_inode_cache",
sizeof(struct udf_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (udf_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/fs/ufs/super.c 2006-02-26 18:40:51.548154933 -0800
+++ 2.6.16-rc4-mm2/fs/ufs/super.c 2006-02-26 18:40:57.022825679 -0800
@@ -1184,7 +1184,8 @@ static int init_inodecache(void)
{
ufs_inode_cachep = kmem_cache_create("ufs_inode_cache",
sizeof(struct ufs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (ufs_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/net/socket.c 2006-02-26 18:40:51.550108080 -0800
+++ 2.6.16-rc4-mm2/net/socket.c 2006-02-26 18:40:57.024778826 -0800
@@ -319,7 +319,8 @@ static int init_inodecache(void)
{
sock_inode_cachep = kmem_cache_create("sock_inode_cache",
sizeof(struct socket_alloc),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ 0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
init_once, NULL);
if (sock_inode_cachep == NULL)
return -ENOMEM;
--- 2.6.16-rc4-mm2.orig/net/sunrpc/rpc_pipe.c 2006-02-26 18:40:51.554990947 -0800
+++ 2.6.16-rc4-mm2/net/sunrpc/rpc_pipe.c 2006-02-26 18:40:57.027708546 -0800
@@ -849,9 +849,10 @@ init_once(void * foo, kmem_cache_t * cac
int register_rpc_pipefs(void)
{
rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
- sizeof(struct rpc_inode),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
- init_once, NULL);
+ sizeof(struct rpc_inode),
+ 0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
+ SLAB_MEM_SPREAD),
+ init_once, NULL);
if (!rpc_inode_cachep)
return -ENOMEM;
register_filesystem(&rpc_pipe_fs_type);

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2006-02-27 19:34:50

by Andi Kleen

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

Paul Jackson <[email protected]> writes:

> From: Paul Jackson <[email protected]>
>
> Mark file system inode and similar slab caches subject to
> SLAB_MEM_SPREAD memory spreading.
>
> If a slab cache is marked SLAB_MEM_SPREAD, then anytime that
> a task that's in a cpuset with the 'memory_spread_slab' option
> enabled

Is there a way to use it without cpumemsets?

I would assume it's useful for smaller machines too, but they
generally don't use cpumemsets.
-Andi

2006-02-27 20:16:25

by Paul Jackson

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

Andi wrote:
> Is there a way to use it without cpumemsets?

Not that I know of. So far as I can recall, the task->mems_allowed
field (over which the spreading is done) is only manipulated by the
cpuset code. So at least what I have here requires cpusets to have
any useful effect, yes.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-02-27 20:37:15

by Christoph Lameter

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Mon, 27 Feb 2006, Paul Jackson wrote:

> Andi wrote:
> > Is there a way to use it without cpumemsets?
>
> Not that I know of. So far as I can recall, the task->mems_allowed
> field (over which the spreading is done) is only manipulated by the
> cpuset code. So at least what I have here requires cpusets to have
> any useful effect, yes.

Well we could fall back to interleave on the node online map if some sort
of flag is set.

On the other hand setting memory policy to MPOL_INTERLEAVE already
provides that functionality.


2006-02-27 20:50:35

by Andi Kleen

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Monday 27 February 2006 21:36, Christoph Lameter wrote:

> On the other hand setting memory policy to MPOL_INTERLEAVE already
> provides that functionality.

Yes, but not selective for these slab caches. I think it would be useful
if we could interleave inodes/dentries but still keep a local policy for
normal program memory.

-Andi

2006-02-27 20:56:53

by Christoph Lameter

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Mon, 27 Feb 2006, Andi Kleen wrote:

> On Monday 27 February 2006 21:36, Christoph Lameter wrote:
>
> > On the other hand setting memory policy to MPOL_INTERLEAVE already
> > provides that functionality.
>
> Yes, but not selective for these slab caches. I think it would be useful
> if we could interleave inodes/dentries but still keep a local policy for
> normal program memory.

We could make the memory policy only apply if the SLAB_MEM_SPREAD option
is set:

Index: linux-2.6.16-rc4-mm2/mm/slab.c
===================================================================
--- linux-2.6.16-rc4-mm2.orig/mm/slab.c 2006-02-24 10:33:54.000000000 -0800
+++ linux-2.6.16-rc4-mm2/mm/slab.c 2006-02-27 12:54:52.000000000 -0800
@@ -2871,7 +2871,9 @@ static void *alternate_node_alloc(struct
if (in_interrupt())
return NULL;
nid_alloc = nid_here = numa_node_id();
- if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
+ if (!cachep->flags & SLAB_MEM_SPREAD)
+ return NULL;
+ if (cpuset_do_slab_mem_spread())
nid_alloc = cpuset_mem_spread_node();
else if (current->mempolicy)
nid_alloc = slab_node(current->mempolicy);

2006-02-27 21:53:23

by Andi Kleen

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Monday 27 February 2006 21:56, Christoph Lameter wrote:

>
> We could make the memory policy only apply if the SLAB_MEM_SPREAD option
> is set:

Which memory policy? The one of the process?

> Index: linux-2.6.16-rc4-mm2/mm/slab.c
> ===================================================================
> --- linux-2.6.16-rc4-mm2.orig/mm/slab.c 2006-02-24 10:33:54.000000000 -0800
> +++ linux-2.6.16-rc4-mm2/mm/slab.c 2006-02-27 12:54:52.000000000 -0800
> @@ -2871,7 +2871,9 @@ static void *alternate_node_alloc(struct
> if (in_interrupt())
> return NULL;
> nid_alloc = nid_here = numa_node_id();
> - if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
> + if (!cachep->flags & SLAB_MEM_SPREAD)

brackets missing I guess.

> + return NULL;
> + if (cpuset_do_slab_mem_spread())
> nid_alloc = cpuset_mem_spread_node();
> else if (current->mempolicy)
> nid_alloc = slab_node(current->mempolicy);
>

-Andi
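
(Note: '!' binds more tightly than '&' in C, so the test as posted
evaluates '!cachep->flags' first. The parenthesized form Andi is asking
for would read:

	if (!(cachep->flags & SLAB_MEM_SPREAD))
		return NULL;

so that caches not marked SLAB_MEM_SPREAD skip the spreading and
mempolicy logic entirely.)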

2006-02-27 22:15:06

by Christoph Lameter

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Mon, 27 Feb 2006, Andi Kleen wrote:

> On Monday 27 February 2006 21:56, Christoph Lameter wrote:
> > We could make the memory policy only apply if the SLAB_MEM_SPREAD option
> Which memory policy? The one of the process?

Yes.

2006-02-27 22:40:38

by Andi Kleen

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Monday 27 February 2006 23:14, Christoph Lameter wrote:
> On Mon, 27 Feb 2006, Andi Kleen wrote:
>
> > On Monday 27 February 2006 21:56, Christoph Lameter wrote:
> > > We could make the memory policy only apply if the SLAB_MEM_SPREAD option
> > Which memory policy? The one of the process?
>
> Yes.

I don't quite get your logic here. For me it would be logical to apply
the memory policy from the process for anything _but_ slab caches
that have SLAB_MEM_SPREAD set. For those interleaving should be always used.

Why are you proposing to do it the other way round?

-Andi

2006-02-27 23:13:56

by Christoph Lameter

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Mon, 27 Feb 2006, Andi Kleen wrote:

> I don't quite get your logic here. For me it would be logical to apply
> the memory policy from the process for anything _but_ slab caches
> that have SLAB_MEM_SPREAD set. For those interleaving should be always used.

Interleaving is a special feature to be used only if we know that the
objects are used in a system wide fashion. Interleave should never be the
default option.

F.e. inode interleaving for a process that is running on one
node and scanning files will reduce performance. This is the typical
case.

On the other hand if files are used by multiple processes in a cpuset then
interleaving may be beneficial.

> Why are you proposing to do it the other way round?

Because these are the types of objects that may benefit from
interleaving. Other slabs need to always be allocated in a node local
way.

2006-02-28 01:56:18

by Paul Jackson

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

Hmmm ... your thread with Andi confuses me ...

Oh well.

I take it that Andi is suggesting that there be the option to override
the task's mempolicy, in the particular case of these file i/o slab
caches, with an interleave over the online nodes.

This option would be useful in the case that a system is not using
cpusets, but still wants to spread out these particular (sometimes
large) file i/o caches.

Questions for Andi:

1) Are you content to have such an interleave of these particular file
i/o slabs triggered by a mm/mempolicy.c option? Or do you think
we need some sort of task external API to invoke this policy?

If mempolicy API works, then I would think that someone, such as
yourself or Christoph, could easily enough propose an extension to
the mempolicy API that invoked such a policy. It could leverage
some of the apparatus already provided by my current patchset here,
such as the per-slab SLAB_MEMP_SPREAD flag settings, the task
PF_* flag bits and the hook in ____cache_alloc() to call out to
alternate_node_alloc().

If a system wide API (that can be externally imposed on some or
all tasks from outside the task) is desirable, then I am left
wondering why you don't use cpusets for this.

2) Do you recommend that the page (file buffer) cache also be
interleavable, across all online nodes, if optionally requested,
on systems not using cpusets?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-02-28 17:14:04

by Andi Kleen

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Tuesday 28 February 2006 02:56, Paul Jackson wrote:
> Hmmm ... your thread with Andi confuses me ...
>
> Oh well.
>
> I take it that Andi is suggesting that there be the option to override
> the task's mempolicy, in the particular case of these file i/o slab
> caches, with an interleave over the online nodes.

Yep exactly.

> This option would be useful in the case that a system is not using
> cpusets, but still wants to spread out these particular (sometimes
> large) file i/o caches.

Yep.

> Questions for Andi:
>
> 1) Are you content to have such an interleave of these particular file
> i/o slabs triggered by a mm/mempolicy.c option? Or do you think
> we need some sort of task external API to invoke this policy?

Task external. mempolicy.c has no good way to handle multiple policies
like this. I was thinking of a simple sysctl

I guess I can cook up a patch once your code is merged.


> 2) Do you recommend that the page (file buffer) cache also be
> interleavable, across all online nodes, if optionally requested,
> on systems not using cpusets?

Yes, but as a separate option.

-Andi

2006-03-01 18:28:12

by Paul Jackson

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

> > 1) Are you content to have such an interleave of these particular file
> > i/o slabs triggered by a mm/mempolicy.c option? Or do you think
> > we need some sort of task external API to invoke this policy?
>
> Task external. mempolicy.c has no good way to handle multiple policies
> like this. I was thinking of a simple sysctl

No need to implement a sysctl for this. The current cpuset facility
should provide just what you want, if I am understanding correctly.

It would be really easy. Run with a kernel that has cpusets configured
in. One time at boot, enable memory spreading for these slabs:
test -d /dev/cpuset || mkdir /dev/cpuset
mount -t cpuset cpuset /dev/cpuset
echo 1 > /dev/cpuset/memory_spread_slab # enable system wide

That's all you need to do to enable this system wide.

With this, tasks will be spreading these selected slab caches,
independently of whatever mempolicy they have.

To disable this memory spreading system wide:
echo 0 > /dev/cpuset/memory_spread_slab # disable system wide

If you want to control which tasks have these slab spread, then
it is just a few more lines once at boottime. The following
lines make a second cpuset 'spread_tasks'. Tasks in this second
cpuset will be spread; the other tasks in the root cpuset won't be
spread.

One time at boot:
test -d /dev/cpuset || mkdir /dev/cpuset
mount -t cpuset cpuset /dev/cpuset
mkdir /dev/cpuset/spread_tasks
cat /dev/cpuset/cpus > /dev/cpuset/spread_tasks/cpus
cat /dev/cpuset/mems > /dev/cpuset/spread_tasks/mems
echo 1 > /dev/cpuset/spread_tasks/memory_spread_slab

Then during operation, for each task $pid that is to be spread:
echo $pid > /dev/cpuset/spread_tasks/tasks # enable for $pid

or to disable that spreading for a pid:
echo $pid > /dev/cpuset/tasks # disable for $pid

These echoes can be done with open/write/close system calls if you
prefer.

The first two echoes above would correspond to a sysctl that applied
system wide, enabling or disabling memory spreading on these slabs for
all tasks. The last two echoes correspond directly to a sysctl that
applies to a single specified pid.

Why do a new sysctl, when the existing open/write/close system calls
can do the same thing?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-03-01 18:34:43

by Andi Kleen

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Wednesday 01 March 2006 19:27, Paul Jackson wrote:
> > > 1) Are you content to have such an interleave of these particular file
> > > i/o slabs triggered by a mm/mempolicy.c option? Or do you think
> > > we need some sort of task external API to invoke this policy?
> >
> > Task external. mempolicy.c has no good way to handle multiple policies
> > like this. I was thinking of a simple sysctl
>
> No need to implement a sysctl for this. The current cpuset facility
> should provide just what you want, if I am understanding correctly.

The main reason i'm reluctant to use this is that the cpuset fast path
overhead (e.g. in memory allocators etc.) is quite large and I wouldn't like
to recommend people to enable all this overhead by default just to get
more useful dcache/inode behaviour on small NUMA systems.

-Andi

2006-03-01 18:39:05

by Christoph Lameter

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Wed, 1 Mar 2006, Andi Kleen wrote:

> > No need to implement a sysctl for this. The current cpuset facility
> > should provide just what you want, if I am understanding correctly.
>
> The main reason i'm reluctant to use this is that the cpuset fast path
> overhead (e.g. in memory allocators etc.) is quite large and I wouldn't like
> to recommend people to enable all this overhead by default just to get
> more useful dcache/inode behaviour on small NUMA systems.

Is this a gut feeling or do you have some measurements to back that up?
Paul worked hard on making all the overhead in critical paths as light as
possible and from what I can see he did a very good job.



2006-03-01 18:58:56

by Paul Jackson

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

Andi wrote:
> The main reason i'm reluctant to use this is that the cpuset fast path
> overhead (e.g. in memory allocators etc.) is quite large

I disagree.

I spent much time minimizing that overhead over the last few months, as
a direct result of your recommendation to do so.

Especially in the case that all tasks are in the root cpuset (as in the
scenario I just suggested for setting this memory spreading policy for
all tasks), the overhead is practically zero. The key hook is an
inline test, done (usually) once per page allocation, that checks whether
the essentially read-only global 'number_of_cpusets' is <= 1.
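
In outline, that check has the following shape. This is only a sketch;
the identifier cpuset_node_allowed() and its slow-path counterpart are
illustrative, though 'number_of_cpusets' is the real global being
described:

	/* Sketch of the once-per-allocation cpuset check. */
	static inline int cpuset_node_allowed(int node, gfp_t gfp_mask)
	{
		if (number_of_cpusets <= 1)     /* only the root cpuset exists */
			return 1;               /* all nodes allowed */
		return __cpuset_node_allowed(node, gfp_mask);   /* slow path */
	}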

I disagree with your "quite large" characterization.

Please explain further.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-03-01 19:28:06

by Andi Kleen

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Wednesday 01 March 2006 19:58, Paul Jackson wrote:
> Andi wrote:
> > The main reason i'm reluctant to use this is that the cpuset fast path
> > overhead (e.g. in memory allocators etc.) is quite large
>
> I disagree.
>
> I spent much time minimizing that overhead over the last few months, as
> a direct result of your recommendation to do so.

IIRC my recommendation only optimized the case of nobody using
cpusets.

Using a single cpuset would already drop into the slow path, right?

Hmm, possibly it's better now, but I remember being shocked last
time I looked at the code in detail at how much code it executed for a normal
page allocation and how many cache lines it touched. This was some time
ago admittedly.

Also on a different angle I would like to make the dcache/inode spreading
basically default on x86-64 and I'm not sure I want to get into the business
of explaining to all the distributions how to set up cpusets and set up
new file systems.
For that a single switch that can be just set by default is much more
practical.

>
> Especially in the case that all tasks are in the root cpuset (as in the
> scenario I just suggested for setting this memory spreading policy for
> all tasks), the overhead is practically zero.

Ok.

> The key hook is an
> inline test, done (usually) once per page allocation, that checks whether
> the essentially read-only global 'number_of_cpusets' is <= 1.
>
> I disagree with your "quite large" characterization.

Agreed perhaps it was somewhat exaggerated.

-Andi

2006-03-01 20:54:12

by Paul Jackson

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

> > I spent much time minimizing that overhead over the last few months, as
> > a direct result of your recommendation to do so.
>
> IIRC my recommendation only optimized the case of nobody using
> cpusets.

As a result of your general concern with the performance impact
of cpusets on the page allocation code path, I optimized each
element of it, not just the one case covered by your specific
recommendation.

Take a look.

> Using a single cpuset would already drop into the slow path, right?

No - having a single cpuset is the fastest path. All tasks
are in that root cpuset in that case, and all nodes allowed.

> I'm not sure I want to get into the business
> of explaining to all the distributions how to set up cpusets ..

Good grief - I already quoted the 3 lines of boottime init script it
would take - this can't require that much explaining, and your new
sysctl can't get by with much less:
test -d /dev/cpuset || mkdir /dev/cpuset
mount -t cpuset cpuset /dev/cpuset
echo 1 > /dev/cpuset/memory_spread_slab # enable system wide

> and set up new file systems.

That's a Linux source issue that matters a single time in the history
of each file system type supported by Linux. It is not a customer or
even distro issue.

And even from the perspective of maintaining Linux, this should be on
autopilot. Every file systems inode cache is marked, and if we do
nothing, as more file system types are invented for Linux, they will
predictably cut+paste the inode slab cache setup from an existing file
system, and "just get it right."

> For that a single switch that can be just set by default is much more
> practical.

Doing it via cpusets is also a single switch that is set by default.

It is just as practical; well more practical - it's already there.

===

Mind you, I don't have any profound objections to such a sysctl.

I just don't see that it serves any purpose, and I suspect that
misunderstandings of the performance impact of cpusets are the
primary source of motivation for such a sysctl.

I prefer to (1) set the record straight on cpusets, and (2) avoid
adding additional kernel mechanisms that are redundant.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-03-01 20:59:51

by Andi Kleen

Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Wednesday 01 March 2006 21:53, Paul Jackson wrote:
> > > I spent much time minimizing that overhead over the last few months, as
> > > a direct result of your recommendation to do so.
> >
> > IIRC my recommendation only optimized the case of nobody using
> > cpusets.
>
> As a result of your general concern with the performance impact
> of cpusets on the page allocation code path, I optimized each
> element of it, not just the one case covered by your specific
> recommendation.

Thanks for doing that work then.


>
> > Using a single cpuset would already drop into the slow path, right?
>
> No - having a single cpuset is the fastest path. All tasks
> are in that root cpuset in that case, and all nodes allowed.

Faster than no cpuset?

>
> > I'm not sure I want to get into the business
> > of explaining to all the distributions how to set up cpusets ..
>
> Good grief - I already quoted the 3 lines of boottime init script it
> would take - this can't require that much explaining, and your new
> sysctl can't get by with much less:

It would just be on by default - no user space configuration needed.

> And even from the perspective of maintaining Linux, this should be on
> autopilot. Every file system's inode cache is marked, and if we do
> nothing, as more file system types are invented for Linux, they will
> predictably cut+paste the inode slab cache setup from an existing file
> system, and "just get it right."

If something is a good default it shouldn't need user space
configuration at all imho. Only the "weird" cases should.

> I just don't see that it serves any purpose, and I suspect that
> misunderstandings of the performance impact of cpusets are the
> primary source of motivation for such a sysctl.

No, that was just one. The other was having good defaults
even on lightweight kernels.

-Andi

2006-03-01 21:19:20

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

> > No - having a single cpuset is the fastest path. All tasks
> > are in that root cpuset in that case, and all nodes allowed.
>
> Faster than no cpuset?

If CONFIG_CPUSET is enabled (which I thought was likely to become the
norm for most distros -- though you would know better than I if this is
likely) then:

There is no such case as "no cpuset" !!

The minimal, fastest case is one root cpuset holding all tasks.
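
Roughly, the shape of that check on the allocation path is the
following - a simplified sketch, not the actual kernel/cpuset.c code,
and the slow-path helper below is purely illustrative:

    #include <linux/nodemask.h>
    #include <linux/sched.h>

    /* Illustrative stand-in for the rare case of a real cpuset
     * hierarchy: the real code walks it under lock and decides there. */
    static int sketch_cpuset_slowpath(int node)
    {
            return 0;
    }

    /*
     * With CONFIG_CPUSET enabled and only the root cpuset present,
     * every node is set in current->mems_allowed, so the lockless
     * test below succeeds immediately.
     */
    static inline int sketch_cpuset_node_allowed(int node)
    {
            if (node_isset(node, current->mems_allowed))
                    return 1;       /* fast path - covers the single root cpuset */
            return sketch_cpuset_slowpath(node);
    }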


> If something is a good default it shouldn't need user space
> configuration at all imho. Only the "weird" cases should.

So are you just saying we got the default backwards?

Well ... I left the default for memory spreading these
inode slab caches as it was - not spread (preferring
node local).

I did that because I was not aware that this default should be
changed for most systems. I tend to leave defaults as they are,
unless I have good reason to change them.

But for the SGI systems I care about, I'd prefer the default to be
spreading them.

If you think it would be better to change this default, now that the
mechanism is in place to support spreading these slabs, then I could
certainly go along with that.

Then your systems would not have to do anything in user space, unless
they wanted to disable spreading these slabs (which of course they
could easily do using cpusets ;).

Should we change the default to enable this spreading?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-03-01 21:21:44

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Wednesday 01 March 2006 22:19, Paul Jackson wrote:

>
> > If something is a good default it shouldn't need user space
> > configuration at all imho. Only the "weird" cases should.
>
> So are you just saying we got the default backwards?

Yes.

> But for the SGI systems I care about, I'd prefer the default to be
> spreading them.

I think it's the best default for smaller systems too. I've had people
complaining about node imbalances that were caused by one or two nodes
being filled up with d/icache. And the small latencies of accessing
them don't matter very much.

> If you think it would be better to change this default, now that the
> mechanism is in place to support spreading these slabs, then I could
> certainly go along with that.

Yes that would make me happy.

> Then your systems would not have to do anything in user space, unless
> they wanted to disable spreading these slabs (which of course they
> could easily do using cpusets ;).
>
> Should we change the default to enable this spreading?

I would be in favour of it

-Andi

2006-03-01 22:20:20

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Wed, 1 Mar 2006, Andi Kleen wrote:

> I think it's the best default for smaller systems too. I've had people
> complaining about node imbalances that were caused by one or two nodes
> being filled up with d/icache. And the small latencies of accessing
> them don't matter very much.

But these are special situations. Placing memory on a distant node
is not beneficial in the standard case of a single-threaded process
churning along, opening and closing files, etc.

Interleave is only beneficial for special applications that use a common
pool of data and that implement no other means of locality control. At
that point we sacrifice the performance benefit that comes with node locality
in order not to overload a single node.

Kernels before 2.6.16 suffer from special overload situations that are due
to not having the ability to reclaim the pagecache and the slab cache.
This is going to change in SLES10.

> > If you think it would be better to change this default, now that the
> > mechanism is in place to support spreading these slabs, then I could
> > certainly go along with that.
>
> Yes that would make me happy.

It seems that we are trying to sacrifice the performance of many in
order to accommodate a few special situations. Cpusets are ideal for those
situations since they allow the localization of the interleaving to a
slice of the machine. Processes on the rest of the box can still get node
local memory and run at optimal performance.

> I would be in favour of it

If you do not believe me, please run performance tests with
single-threaded processes before doing any of this.

2006-03-01 22:52:22

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

Well, as to what the default should be - enabling or not enabling
memory spreading on these inode and similar slab caches - I will
have to leave that up to others more expert than I am.

I am content to suggest that the default should reflect either the
current default (no spreading, since spreading was not an option until
now) or what is best for the majority of systems (whatever that is).

Probably best not to change the default, unless there is a clear
consensus that it's wrong in the majority of cases.

Whatever systems don't like the default can change it easily enough,
one way or the other.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-03-02 01:57:45

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Wednesday 01 March 2006 23:20, Christoph Lameter wrote:

> Interleave is only beneficial for special applications that use a common
> pool of data and that implement no other means of locality control. At
> that point we sacrifice the performance benefit that comes with node locality
> in order not to overload a single node.

My rationale is that locality is only important for cache lines that
are very frequently accessed. By far the most frequently accessed
items are user space data; system calls are still relatively rare
compared to that, and system calls that touch files are rarer
still.

(often in my measurements 40% and more of all syscalls are gettimeofday
actually)

Also we don't have very good balancing control on dcaches.

The problem is that when one node runs updatedb or something similar,
it will end up allocating a lot of these objects, and then later, when
a process ends up on that node, it can have trouble allocating
node-local memory (and a user process missing local memory suffers
much more than a kernel object would).

I guess your remote reclaim changes will help, but I'm not
convinced they are the best solution.

[hmm, actually didn't we discuss this once at length anyway?
Apparently I failed to convince you back then @:]

>
> Kernels before 2.6.16 suffer from special overload situations that are due
> to not having the ability to reclaim the pagecache and the slab cache.

Reclaiming is slow. Better not dig into this hole in the first place.

> > I would be in favour of it
>
> If you do not believe me, please run performance tests with
> single-threaded processes before doing any of this.

Sure. But the motivation is less the single-thread performance
anyway, and more the degradation under extreme loads.

-Andi

2006-03-02 14:38:37

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 01/02] cpuset memory spread slab cache filesys

On Thu, 2 Mar 2006, Andi Kleen wrote:

> (often in my measurements 40% and more of all syscalls are gettimeofday
> actually)

Yes that is why we excessively optimized gettimeofday in asm on ia64...

> Also we don't have very good balancing control on dcaches.

Right, but I hope we will get there if we can get the zoned VM
statistics in and rework the VM a bit.

> > If you do not believe me, please run performance tests with
> > single-threaded processes before doing any of this.
>
> Sure. But the motivation is less the single-thread performance
> anyway, and more the degradation under extreme loads.

The extreme loads may benefit from interleave. But note that the
performance gains in the NUMA slab allocator came from exploiting
locality. The support for policies and other off-node memory accesses in
the SLAB allocator is an afterthought. Only node-local accesses can be
served from per-cpu caches with a simple interrupt off/on. Off-node
accesses generated by policies etc. will require locking and working with
remote memory structures.
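
Schematically - an illustrative sketch only, not the real mm/slab.c,
with the struct and field names made up for the example - the
contrast looks like this:

    #include <linux/interrupt.h>
    #include <linux/nodemask.h>
    #include <linux/spinlock.h>

    /* The real allocator keeps the hot object array per cpu; it is
     * flattened here just to keep the sketch short. */
    struct sketch_cache {
            int percpu_avail;
            void **percpu_objects;
            spinlock_t node_list_lock[MAX_NUMNODES];
    };

    /* Node-local case: popping a hot object needs nothing more than
     * local interrupts off. */
    static void *sketch_alloc_node_local(struct sketch_cache *cachep)
    {
            unsigned long flags;
            void *obj = NULL;

            local_irq_save(flags);
            if (cachep->percpu_avail > 0)
                    obj = cachep->percpu_objects[--cachep->percpu_avail];
            local_irq_restore(flags);
            return obj;
    }

    /* Off-node case (policies, interleave, spreading): the remote
     * node's lists have to be locked and walked. */
    static void *sketch_alloc_off_node(struct sketch_cache *cachep, int node)
    {
            void *obj;

            spin_lock(&cachep->node_list_lock[node]);
            obj = NULL;     /* the remote list walk is elided in this sketch */
            spin_unlock(&cachep->node_list_lock[node]);
            return obj;
    }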

If these are indeed rare user space accesses that require slab elements,
then these performance issues do not matter and you may be able to spread
without performance penalties. However, our experience with the slab
allocator has been that intensive workloads can be influenced by slab
performance.