2005-12-01 13:17:39

by Dirk Henning Gerdes

Subject: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

Hi Jens!

For benchmarking the I/O schedulers, I thought it would be very
useful to be able to disable the pagecache.

I didn't want to make it too complicated, so I just mark pages as
not-uptodate so that they have to be read again. Another reason was that
I wanted to keep the conditions as close to reality as possible.

I also thought it would be useful to be able to turn the pagecache on
and off without rebooting the system.

I implemented a procfs entry, "/proc/benchmark/pagecache", for this.

This patch may be useful for anyone else who wants to run
benchmarks on block-layer code.
And if not, I would appreciate it if you could have a look at it.

Signed-off-by: Dirk Gerdes <[email protected]>




2005-12-01 13:29:32

by Arjan van de Ven

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

On Thu, 2005-12-01 at 14:17 +0100, Dirk Henning Gerdes wrote:
> Hi Jens!
>
> For doing benchmarks on the I/O-Schedulers, I thought it would be very
> useful to disable the pagecache.


For benchmarks this is not enough, though: you also need to clean the
inode and dentry caches, as well as any filesystem-specific caches
(possibly the buffer cache).
At that point it's probably nicer to just fake a limited umount, since
that has to do all of that anyway.

2005-12-01 13:43:50

by Dirk Henning Gerdes

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

I should probably have mentioned what my benchmark looks like:

I have written a small C program that opens several files for reading
and writing.
The dentry cache would only play a role the first time the files are
opened. I'm not quite sure about the inode cache.
I check whether a page has buffers and mark them as not uptodate, too,
so the buffer cache is disabled as well.

I'm using ext2/ext3. I don't think they use any additional caches.

But anyway: could you explain your fake-umount idea a little more?

On Thursday, 2005-12-01 at 14:29 +0100, Arjan van de Ven wrote:
> On Thu, 2005-12-01 at 14:17 +0100, Dirk Henning Gerdes wrote:
> > Hi Jens!
> >
> > For doing benchmarks on the I/O-Schedulers, I thought it would be very
> > useful to disable the pagecache.
>
>
> for benchmarks this is not enough though, you also need to clean the
> inode and dentry caches, as well as any filesystem specific caches
> (might be buffer cache).....
> at which point it's probably nicer to just fake a limited umount since
> that has to do all of that anyway
--
Dirk Henning Gerdes
Bönnersdyk 47
47803 Krefeld

Tel: 02151-755745
0174-7776640
Mail: [email protected]

2005-12-01 14:35:32

by Jens Axboe

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

On Thu, Dec 01 2005, Dirk Henning Gerdes wrote:
> Hi Jens!
>
> For doing benchmarks on the I/O-Schedulers, I thought it would be very
> useful to disable the pagecache.
>
> I didn't want to make it so complicated so I just mark pages as
> not-uptodate, so they have to be read again. Another reason was, that I
> wanted to keep the conditions as near to reality as possible.
>
> Further I thought it would be useful, if you could turn the pagecache on
> and off without rebooting the system.
>
> I implemented a proc-fs entry "/proc/benchmark/pagecache" for this.
>
> Probably this patch can be useful for anyone else, who wants to do some
> benchmarks on block-layer stuff.
> And if not, I would appreciate if you could have a look on it.

This is rather odd if you ask me; I don't like it. If you are doing
serious benchmarking, you do it on a separate disk / file system which
you can just umount/mount before starting over. Or you reboot the
machine in between.

--
Jens Axboe
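
The umount/mount cycle suggested above could be scripted from a benchmark harness roughly as follows. This is only a sketch: the helper name remount_cycle and its parameters are hypothetical, it assumes the benchmark file system lives on its own device, and it needs root privileges to succeed.

```c
#include <stddef.h>
#include <sys/mount.h>

/*
 * Unmount and remount a dedicated benchmark file system so that its
 * pagecache, dentries and inodes are discarded between runs.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int remount_cycle(const char *dev, const char *mnt, const char *fstype)
{
	/* Fails with EBUSY if files on the mount are still open. */
	if (umount(mnt) < 0)
		return -1;

	/* Default mount flags, no filesystem-specific options. */
	if (mount(dev, mnt, fstype, 0, NULL) < 0)
		return -1;

	return 0;
}
```

A harness would call this once before every run, with its own device, mount point and file system type, and abort the run if it returns -1.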

2005-12-01 22:46:42

by Bodo Eggert

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

Dirk Henning Gerdes <[email protected]> wrote:

> For doing benchmarks on the I/O-Schedulers, I thought it would be very
> useful to disable the pagecache.
>
> I didn't want to make it so complicated so I just mark pages as
> not-uptodate, so they have to be read again. Another reason was, that I
> wanted to keep the conditions as near to reality as possible.
>
> Further I thought it would be useful, if you could turn the pagecache on
> and off without rebooting the system.
>
> I implemented a proc-fs entry "/proc/benchmark/pagecache" for this.

1) This mail is the only documentation on how to operate your patch.
How do you expect your users to find out how to operate the switch?
(I assume it's really a switch; a toggle would be insane.)

Since it's very short and only for a special purpose, documenting it
in Kconfig might be enough.

2) You're separating your patches by file, not by function. ungood.

3) Your patches introduce a lot of whitespace.
--
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.

2005-12-02 01:25:38

by Andrew Morton

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

Dirk Henning Gerdes <[email protected]> wrote:
>
> For doing benchmarks on the I/O-Schedulers, I thought it would be very
> useful to disable the pagecache.

That's an FAQ. Something like this?


From: Andrew Morton <[email protected]>

Add /proc/sys/vm/drop-pagecache. When written to, this will cause the kernel
to discard as much pagecache and reclaimable slab objects as it can.

It won't drop dirty data, so the user should run `sync' first.

Caveats:

a) Holds inode_lock for exorbitant amounts of time.

b) Needs to be taught about NUMA nodes: propagate these all the way through
so the discarding can be controlled on a per-node basis.

c) The pagecache shrinking and slab shrinking should probably have separate
controls.


Signed-off-by: Andrew Morton <[email protected]>
---

fs/Makefile | 2 -
fs/drop-pagecache.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/mm.h | 5 +++
include/linux/sysctl.h | 1
kernel/sysctl.c | 9 +++++++
mm/truncate.c | 1
mm/vmscan.c | 3 --
7 files changed, 79 insertions(+), 4 deletions(-)

diff -puN /dev/null fs/drop-pagecache.c
--- /dev/null 2003-09-15 06:40:47.000000000 -0700
+++ devel-akpm/fs/drop-pagecache.c 2005-12-01 17:20:55.000000000 -0800
@@ -0,0 +1,62 @@
+/*
+ * Implement the manual drop-all-pagecache function
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/fs.h>
+#include <linux/writeback.h>
+#include <linux/sysctl.h>
+#include <linux/gfp.h>
+
+static void drop_pagecache_sb(struct super_block *sb)
+{
+	struct inode *inode;
+
+	spin_lock(&inode_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		if (inode->i_state & (I_FREEING|I_WILL_FREE))
+			continue;
+		invalidate_inode_pages(inode->i_mapping);
+	}
+	spin_unlock(&inode_lock);
+}
+
+static void drop_pagecache(void)
+{
+	struct super_block *sb;
+
+	spin_lock(&sb_lock);
+restart:
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+		down_read(&sb->s_umount);
+		if (sb->s_root)
+			drop_pagecache_sb(sb);
+		up_read(&sb->s_umount);
+		spin_lock(&sb_lock);
+		if (__put_super_and_need_restart(sb))
+			goto restart;
+	}
+	spin_unlock(&sb_lock);
+	printk("shrunk pagecache\n");
+}
+
+static void drop_slab(void)
+{
+	int nr_objects;
+
+	do {
+		nr_objects = shrink_slab(1000, GFP_KERNEL, 1000);
+		printk("shrunk %d cache objects\n", nr_objects);
+	} while (nr_objects > 10);
+}
+
+int drop_pagecache_sysctl_handler(ctl_table *table, int write,
+	struct file *file, void __user *buffer, size_t *length, loff_t *ppos)
+{
+	drop_pagecache();
+	drop_slab();
+	return 0;
+}
diff -puN fs/Makefile~drop-pagecache fs/Makefile
--- devel/fs/Makefile~drop-pagecache 2005-12-01 16:41:22.000000000 -0800
+++ devel-akpm/fs/Makefile 2005-12-01 16:41:22.000000000 -0800
@@ -10,7 +10,7 @@ obj-y := open.o read_write.o file_table.
ioctl.o readdir.o select.o fifo.o locks.o dcache.o inode.o \
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \
- ioprio.o pnode.o
+ ioprio.o pnode.o drop-pagecache.o

obj-$(CONFIG_INOTIFY) += inotify.o
obj-$(CONFIG_EPOLL) += eventpoll.o
diff -puN include/linux/mm.h~drop-pagecache include/linux/mm.h
--- devel/include/linux/mm.h~drop-pagecache 2005-12-01 16:41:22.000000000 -0800
+++ devel-akpm/include/linux/mm.h 2005-12-01 17:01:57.000000000 -0800
@@ -1078,5 +1078,10 @@ int in_gate_area_no_task(unsigned long a
/* /proc/<pid>/oom_adj set to -17 protects from the oom-killer */
#define OOM_DISABLE -17

+int drop_pagecache_sysctl_handler(struct ctl_table *, int, struct file *,
+ void __user *, size_t *, loff_t *);
+int shrink_slab(unsigned long scanned, gfp_t gfp_mask,
+ unsigned long lru_pages);
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
diff -puN include/linux/sysctl.h~drop-pagecache include/linux/sysctl.h
--- devel/include/linux/sysctl.h~drop-pagecache 2005-12-01 16:41:22.000000000 -0800
+++ devel-akpm/include/linux/sysctl.h 2005-12-01 16:41:22.000000000 -0800
@@ -182,6 +182,7 @@ enum
VM_LEGACY_VA_LAYOUT=27, /* legacy/compatibility virtual address space layout */
VM_SWAP_TOKEN_TIMEOUT=28, /* default time for token time out */
VM_SWAP_PREFETCH=29, /* int: amount to swap prefetch */
+ VM_DROP_PAGECACHE=30, /* int: nuke lots of pagecache */
};


diff -puN kernel/sysctl.c~drop-pagecache kernel/sysctl.c
--- devel/kernel/sysctl.c~drop-pagecache 2005-12-01 16:41:22.000000000 -0800
+++ devel-akpm/kernel/sysctl.c 2005-12-01 16:41:22.000000000 -0800
@@ -783,6 +783,15 @@ static ctl_table vm_table[] = {
.strategy = &sysctl_intvec,
},
{
+ .ctl_name = VM_DROP_PAGECACHE,
+ .procname = "drop-pagecache",
+ .data = NULL,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = drop_pagecache_sysctl_handler,
+ .strategy = &sysctl_intvec,
+ },
+ {
.ctl_name = VM_MIN_FREE_KBYTES,
.procname = "min_free_kbytes",
.data = &min_free_kbytes,
diff -puN mm/truncate.c~drop-pagecache mm/truncate.c
--- devel/mm/truncate.c~drop-pagecache 2005-12-01 16:49:06.000000000 -0800
+++ devel-akpm/mm/truncate.c 2005-12-01 16:49:13.000000000 -0800
@@ -256,7 +256,6 @@ unlock:
break;
}
pagevec_release(&pvec);
- cond_resched();
}
return ret;
}
diff -puN mm/vmscan.c~drop-pagecache mm/vmscan.c
--- devel/mm/vmscan.c~drop-pagecache 2005-12-01 16:58:30.000000000 -0800
+++ devel-akpm/mm/vmscan.c 2005-12-01 17:00:39.000000000 -0800
@@ -181,8 +181,7 @@ EXPORT_SYMBOL(remove_shrinker);
*
* Returns the number of slab objects which we shrunk.
*/
-static int shrink_slab(unsigned long scanned, gfp_t gfp_mask,
- unsigned long lru_pages)
+int shrink_slab(unsigned long scanned, gfp_t gfp_mask, unsigned long lru_pages)
{
struct shrinker *shrinker;
int ret = 0;
_
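
From user space, the proposed control could be driven roughly like this. It is a minimal sketch, assuming the patch above is applied and the file shows up as /proc/sys/vm/drop-pagecache. The helper name drop_caches_via is hypothetical, and the control path is passed in as a parameter rather than hard-coded. Since the posted handler ignores the written value, the "1\n" payload is arbitrary.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Write dirty data back, then poke the control file so the kernel
 * discards clean pagecache and reclaimable slab objects.
 * Returns 0 on success, -1 on failure.
 */
int drop_caches_via(const char *ctl_path)
{
	static const char msg[] = "1\n";	/* value is ignored by the posted handler */
	ssize_t n;
	int fd;

	sync();	/* dirty pages are not dropped, so flush them first */

	fd = open(ctl_path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, msg, sizeof(msg) - 1);
	if (close(fd) < 0 || n != (ssize_t)(sizeof(msg) - 1))
		return -1;

	return 0;
}
```

A benchmark harness would call drop_caches_via("/proc/sys/vm/drop-pagecache") as root between runs.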

2005-12-02 01:34:13

by Jeff Garzik

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

On Thu, Dec 01, 2005 at 05:25:20PM -0800, Andrew Morton wrote:
> Dirk Henning Gerdes <[email protected]> wrote:
> >
> > For doing benchmarks on the I/O-Schedulers, I thought it would be very
> > useful to disable the pagecache.
>
> That's an FAQ. Something like this?
>
>
> From: Andrew Morton <[email protected]>
>
> Add /proc/sys/vm/drop-pagecache. When written to, this will cause the kernel
> to discard as much pagecache and reclaimable slab objects as it can.
>
> It won't drop dirty data, so the user should run `sync' first.
>
> Caveats:
>
> a) Holds inode_lock for exorbitant amounts of time.
>
> b) Needs to be taught about NUMA nodes: propagate these all the way through
> so the discarding can be controlled on a per-node basis.
>
> c) The pagecache shrinking and slab shrinking should probably have separate
> controls.
>
>
> Signed-off-by: Andrew Morton <[email protected]>

ACK, I've wanted something like this for a while.

I really think it should be a config option, though, to discourage
people from building with it :)

Jeff



2005-12-02 19:17:28

by Badari Pulavarty

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

On Thu, 2005-12-01 at 17:25 -0800, Andrew Morton wrote:
> Dirk Henning Gerdes <[email protected]> wrote:
> >
> > For doing benchmarks on the I/O-Schedulers, I thought it would be very
> > useful to disable the pagecache.
>
> That's an FAQ. Something like this?
>
>
> From: Andrew Morton <[email protected]>
>
> Add /proc/sys/vm/drop-pagecache. When written to, this will cause the kernel
> to discard as much pagecache and reclaimable slab objects as it can.
>
> It won't drop dirty data, so the user should run `sync' first.
>
> Caveats:
>
> a) Holds inode_lock for exorbitant amounts of time.
>
> b) Needs to be taught about NUMA nodes: propagate these all the way through
> so the discarding can be controlled on a per-node basis.
>
> c) The pagecache shrinking and slab shrinking should probably have separate
> controls.
>
>
> Signed-off-by: Andrew Morton <[email protected]>

Yep. This is what I wanted, too :) It is similar in functionality to the
"cfree" module someone wrote a while ago.

Cool, this will get some of the database folks off my back for a
while :)


Thanks,
Badari

2005-12-02 19:19:14

by Badari Pulavarty

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

On Thu, 2005-12-01 at 20:34 -0500, Jeff Garzik wrote:
> On Thu, Dec 01, 2005 at 05:25:20PM -0800, Andrew Morton wrote:
> > Dirk Henning Gerdes <[email protected]> wrote:
> > >
> > > For doing benchmarks on the I/O-Schedulers, I thought it would be very
> > > useful to disable the pagecache.
> >
> > That's an FAQ. Something like this?
> >
> >
> > From: Andrew Morton <[email protected]>
> >
> > Add /proc/sys/vm/drop-pagecache. When written to, this will cause the kernel
> > to discard as much pagecache and reclaimable slab objects as it can.
> >
> > It won't drop dirty data, so the user should run `sync' first.
> >
> > Caveats:
> >
> > a) Holds inode_lock for exorbitant amounts of time.
> >
> > b) Needs to be taught about NUMA nodes: propagate these all the way through
> > so the discarding can be controlled on a per-node basis.
> >
> > c) The pagecache shrinking and slab shrinking should probably have separate
> > controls.
> >
> >
> > Signed-off-by: Andrew Morton <[email protected]>
>
> ACK, I've wanted something like this for a while.
>
> I really think it should be a config option, though, to discourage
> people from building with it :)

Why? Since it's controlled through /proc, if someone echoes stuff into
it, they might get crappy performance (as with other /proc tunables).
Isn't that expected?

Thanks,
Badari

2005-12-02 21:24:42

by Badari Pulavarty

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

On Thu, 2005-12-01 at 17:25 -0800, Andrew Morton wrote:
> Dirk Henning Gerdes <[email protected]> wrote:
> >
> > For doing benchmarks on the I/O-Schedulers, I thought it would be very
> > useful to disable the pagecache.
>
> That's an FAQ. Something like this?
>
>
> From: Andrew Morton <[email protected]>
>
> Add /proc/sys/vm/drop-pagecache. When written to, this will cause the kernel
> to discard as much pagecache and reclaimable slab objects as it can.
>
> It won't drop dirty data, so the user should run `sync' first.
>
> Caveats:
>
> a) Holds inode_lock for exorbitant amounts of time.
>
> b) Needs to be taught about NUMA nodes: propagate these all the way through
> so the discarding can be controlled on a per-node basis.
>
> c) The pagecache shrinking and slab shrinking should probably have separate
> controls.
>
>
> Signed-off-by: Andrew Morton <[email protected]>

I'm wondering whether this shrinks shared memory pages (since they are
backed by tmpfs), which is not what I want.

Thanks,
Badari

2005-12-02 21:43:22

by Andrew Morton

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

Badari Pulavarty <[email protected]> wrote:
>
> Wondering, if this shrinks shared memory pages (since they are backed by
> tmpfs) ? (which is not what I want).

It'll reclaim unused pagecache pages. What effect that has on
idioticfs^Wtmpfs pages depends on the state of the pages. If they're
attached to tmpfs inodes then they won't be reclaimed because they have no
backing store. If they're attached to swapcache then they won't be
reclaimed because they have no superblock.

So I guess you got lucky.

2005-12-02 22:33:37

by Badari Pulavarty

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

On Fri, 2005-12-02 at 13:44 -0800, Andrew Morton wrote:
> Badari Pulavarty <[email protected]> wrote:
> >
> > Wondering, if this shrinks shared memory pages (since they are backed by
> > tmpfs) ? (which is not what I want).
>
> It'll reclaim unused pagecache pages. What effect that has on
> idioticfs^Wtmpfs pages depends on the state of the pages. If they're
> attached to tmpfs inodes then they won't be reclaimed because they have no
> backing store. If they're attached to swapcache then they won't be
> reclaimed because they have no superblock.
>
> So I guess you got lucky.

Wow!! Thank you. It's not that often that I get lucky :)

Thanks,
Badari

2005-12-05 16:02:12

by Rob Landley

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

On Thursday 01 December 2005 19:25, Andrew Morton wrote:
> Dirk Henning Gerdes <[email protected]> wrote:
> > For doing benchmarks on the I/O-Schedulers, I thought it would be very
> > useful to disable the pagecache.
>
> That's an FAQ. Something like this?
>
>
> From: Andrew Morton <[email protected]>
>
> Add /proc/sys/vm/drop-pagecache. When written to, this will cause the
> kernel to discard as much pagecache and reclaimable slab objects as it can.
>
> It won't drop dirty data, so the user should run `sync' first.

This is deeply, deeply cool.

> Caveats:
>
> a) Holds inode_lock for exorbitant amounts of time.

Voluntary preemption point, maybe?

> b) Needs to be taught about NUMA nodes: propagate these all the way through
> so the discarding can be controlled on a per-node basis.
>
> c) The pagecache shrinking and slab shrinking should probably have separate
> controls.

It could care about _what_ you write to it, maybe? (The first byte,
anyway...)

>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> fs/Makefile | 2 -
> fs/drop-pagecache.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/mm.h | 5 +++
> include/linux/sysctl.h | 1
> kernel/sysctl.c | 9 +++++++
> mm/truncate.c | 1
> mm/vmscan.c | 3 --
> 7 files changed, 79 insertions(+), 4 deletions(-)
>
> diff -puN /dev/null fs/drop-pagecache.c
> --- /dev/null 2003-09-15 06:40:47.000000000 -0700
> +++ devel-akpm/fs/drop-pagecache.c 2005-12-01 17:20:55.000000000 -0800
> @@ -0,0 +1,62 @@
> +/*
> + * Implement the manual drop-all-pagecache function
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/fs.h>
> +#include <linux/writeback.h>
> +#include <linux/sysctl.h>
> +#include <linux/gfp.h>
> +
> +static void drop_pagecache_sb(struct super_block *sb)
> +{
> + struct inode *inode;
> +
> + spin_lock(&inode_lock);
> + list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> + if (inode->i_state & (I_FREEING|I_WILL_FREE))
> + continue;
> + invalidate_inode_pages(inode->i_mapping);
> + }
> + spin_unlock(&inode_lock);
> +}
> +
> +static void drop_pagecache(void)
> +{
> + struct super_block *sb;
> +
> + spin_lock(&sb_lock);
> +restart:
> + list_for_each_entry(sb, &super_blocks, s_list) {
> + sb->s_count++;
> + spin_unlock(&sb_lock);
> + down_read(&sb->s_umount);
> + if (sb->s_root)
> + drop_pagecache_sb(sb);
> + up_read(&sb->s_umount);
> + spin_lock(&sb_lock);
> + if (__put_super_and_need_restart(sb))
> + goto restart;
> + }
> + spin_unlock(&sb_lock);
> + printk("shrunk pagecache\n");
> +}
> +
> +static void drop_slab(void)
> +{
> + int nr_objects;
> +
> + do {
> + nr_objects = shrink_slab(1000, GFP_KERNEL, 1000);
> + printk("shrunk %d cache objects\n", nr_objects);
> + } while (nr_objects > 10);
> +}
> +
> +int drop_pagecache_sysctl_handler(ctl_table *table, int write,
> + struct file *file, void __user *buffer, size_t *length, loff_t *ppos)
> +{
> + drop_pagecache();
> + drop_slab();
> + return 0;
> +}
> diff -puN fs/Makefile~drop-pagecache fs/Makefile
> --- devel/fs/Makefile~drop-pagecache 2005-12-01 16:41:22.000000000 -0800
> +++ devel-akpm/fs/Makefile 2005-12-01 16:41:22.000000000 -0800
> @@ -10,7 +10,7 @@ obj-y := open.o read_write.o file_table.
> ioctl.o readdir.o select.o fifo.o locks.o dcache.o inode.o \
> attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
> seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \
> - ioprio.o pnode.o
> + ioprio.o pnode.o drop-pagecache.o
>
> obj-$(CONFIG_INOTIFY) += inotify.o
> obj-$(CONFIG_EPOLL) += eventpoll.o
> diff -puN include/linux/mm.h~drop-pagecache include/linux/mm.h
> --- devel/include/linux/mm.h~drop-pagecache 2005-12-01 16:41:22.000000000 -0800
> +++ devel-akpm/include/linux/mm.h 2005-12-01 17:01:57.000000000 -0800
> @@ -1078,5 +1078,10 @@ int in_gate_area_no_task(unsigned long a
> /* /proc/<pid>/oom_adj set to -17 protects from the oom-killer */
> #define OOM_DISABLE -17
>
> +int drop_pagecache_sysctl_handler(struct ctl_table *, int, struct file *,
> + void __user *, size_t *, loff_t *);
> +int shrink_slab(unsigned long scanned, gfp_t gfp_mask,
> + unsigned long lru_pages);
> +
> #endif /* __KERNEL__ */
> #endif /* _LINUX_MM_H */
> diff -puN include/linux/sysctl.h~drop-pagecache include/linux/sysctl.h
> --- devel/include/linux/sysctl.h~drop-pagecache 2005-12-01 16:41:22.000000000 -0800
> +++ devel-akpm/include/linux/sysctl.h 2005-12-01 16:41:22.000000000 -0800
> @@ -182,6 +182,7 @@ enum
> VM_LEGACY_VA_LAYOUT=27, /* legacy/compatibility virtual address space layout */
> VM_SWAP_TOKEN_TIMEOUT=28, /* default time for token time out */
> VM_SWAP_PREFETCH=29, /* int: amount to swap prefetch */
> + VM_DROP_PAGECACHE=30, /* int: nuke lots of pagecache */
> };
>
>
> diff -puN kernel/sysctl.c~drop-pagecache kernel/sysctl.c
> --- devel/kernel/sysctl.c~drop-pagecache 2005-12-01 16:41:22.000000000 -0800
> +++ devel-akpm/kernel/sysctl.c 2005-12-01 16:41:22.000000000 -0800
> @@ -783,6 +783,15 @@ static ctl_table vm_table[] = {
> .strategy = &sysctl_intvec,
> },
> {
> + .ctl_name = VM_DROP_PAGECACHE,
> + .procname = "drop-pagecache",
> + .data = NULL,
> + .maxlen = sizeof(int),
> + .mode = 0644,

So what _does_ it do when you read from it?

> + .proc_handler = drop_pagecache_sysctl_handler,
> + .strategy = &sysctl_intvec,
> + },
> + {
> .ctl_name = VM_MIN_FREE_KBYTES,
> .procname = "min_free_kbytes",
> .data = &min_free_kbytes,
> diff -puN mm/truncate.c~drop-pagecache mm/truncate.c
> --- devel/mm/truncate.c~drop-pagecache 2005-12-01 16:49:06.000000000 -0800
> +++ devel-akpm/mm/truncate.c 2005-12-01 16:49:13.000000000 -0800
> @@ -256,7 +256,6 @@ unlock:
> break;
> }
> pagevec_release(&pvec);
> - cond_resched();

Why drop that line? (I don't follow...)

Rob
--
Steve Ballmer: Innovation! Inigo Montoya: You keep using that word.
I do not think it means what you think it means.
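
The idea of letting the handler care about the first byte written, which would also give caveat c) its separate controls, can be sketched as a small parsing helper. This is plain C for illustration, not part of the posted patch, whose handler ignores the buffer entirely. The flag names, the bit assignment ('1' = pagecache, '2' = slab, '3' = both) and the function name parse_drop_mode are assumptions made for this sketch.

```c
#include <stddef.h>

/* Hypothetical selector bits; not defined anywhere in the posted patch. */
enum {
	DROP_PAGECACHE = 1 << 0,	/* would call drop_pagecache() */
	DROP_SLAB      = 1 << 1,	/* would call drop_slab() */
};

/*
 * Map the first byte written to the control file to a bitmask of
 * DROP_* flags. Anything after the first byte (such as the newline
 * that echo appends) is ignored.
 * Returns the bitmask, or -1 if the first byte is not '1', '2' or '3'.
 */
int parse_drop_mode(const char *buf, size_t len)
{
	if (len == 0 || buf[0] < '1' || buf[0] > '3')
		return -1;

	return buf[0] - '0';
}
```

A handler built on this would copy the first byte in from user space, reject -1 with an error, and call drop_pagecache() and drop_slab() only for the bits that are set.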

2005-12-05 16:20:36

by Lee Revell

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

On Sun, 2005-12-04 at 20:13 -0600, Rob Landley wrote:
> > Add /proc/sys/vm/drop-pagecache. When written to, this will cause the
> > kernel to discard as much pagecache and reclaimable slab objects as it can.
> >
> > It won't drop dirty data, so the user should run `sync' first.
>
> This is deeply, deeply cool.
>
> > Caveats:
> >
> > a) Holds inode_lock for exorbitant amounts of time.
>
> Voluntary preemption point, maybe?

I think it's a bad idea; that would just encourage people to use this
for things other than debugging. If you care about latency, don't
discard the page cache.

The GNOME people have been asking for this for a while: in order to
improve startup times, they would like a way to simulate a cold start
without rebooting.

Lee

2005-12-05 16:54:05

by Badari Pulavarty

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

On Thu, 2005-12-01 at 17:25 -0800, Andrew Morton wrote:
> Dirk Henning Gerdes <[email protected]> wrote:
> >
> > For doing benchmarks on the I/O-Schedulers, I thought it would be very
> > useful to disable the pagecache.
>
> That's an FAQ. Something like this?
>
>
> From: Andrew Morton <[email protected]>
>
> Add /proc/sys/vm/drop-pagecache. When written to, this will cause the kernel
> to discard as much pagecache and reclaimable slab objects as it can.
>
> It won't drop dirty data, so the user should run `sync' first.

BTW, a while ago I tried doing a similar thing from user space
using POSIX_FADV_DONTNEED on a file. While it worked great for
getting rid of the pagecache pages of a few files, I had to
run it on each and every file in the filesystem, and it ended
up bloating the inode and dentry slabs :( I really wanted to find out
which files are actually cached in the pagecache, so I could run it
only on those.

Thanks,
Badari
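
The per-file approach described above might look like the following sketch. drop_file_pagecache uses posix_fadvise() with POSIX_FADV_DONTNEED, as in the message. resident_pages goes beyond the message: it assumes mincore() on a read-only mapping is an acceptable way to ask which pages of a file are currently cached. Both function names are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* exposes mincore() and posix_fadvise() on glibc */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Ask the kernel to drop the cached pages of one file. 0 on success, -1 on error. */
int drop_file_pagecache(const char *path)
{
	int fd = open(path, O_RDONLY);
	int err;

	if (fd < 0)
		return -1;
	/* Dirty pages are not discarded, so write them back first. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	/* offset 0 with len 0 covers the whole file */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);
	return err ? -1 : 0;	/* posix_fadvise returns an error number, not -1 */
}

/* Count how many pages of a file are resident in memory; -1 on error. */
long resident_pages(const char *path)
{
	long page = sysconf(_SC_PAGESIZE);
	long resident = 0;
	unsigned char *vec;
	size_t npages, i;
	struct stat st;
	void *map;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	if (st.st_size == 0) {
		close(fd);
		return 0;
	}
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close */
	if (map == MAP_FAILED)
		return -1;

	npages = ((size_t)st.st_size + (size_t)page - 1) / (size_t)page;
	vec = malloc(npages);
	if (vec == NULL) {
		munmap(map, (size_t)st.st_size);
		return -1;
	}
	if (mincore(map, (size_t)st.st_size, vec) == 0) {
		for (i = 0; i < npages; i++)
			if (vec[i] & 1)	/* low bit set: page is resident */
				resident++;
	} else {
		resident = -1;
	}
	free(vec);
	munmap(map, (size_t)st.st_size);
	return resident;
}
```

Calling resident_pages first and skipping files that report 0 would avoid advising files that are not cached at all, although opening each file to check still touches its inode and dentry.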

2005-12-05 17:29:04

by Rob Landley

Subject: Re: [PATCH 0/4] linux-2.6-block: deactivating pagecache for benchmarks

On Monday 05 December 2005 10:20, Lee Revell wrote:

> > > Caveats:
> > >
> > > a) Holds inode_lock for exorbitant amounts of time.
> >
> > Voluntary preemption point, maybe?
>
> I thin it's a bad idea, that would just encourage people to use this for
> anything other than debugging. If you care about latency don't discard
> the page cache.
>
> The GNOME people have been asking for this for a while, in order to
> improve startup times, they would like a way to simulate a cold start
> without rebooting.

I was thinking that virtual environments (namely, User Mode Linux) could use
this in conjunction with sys_punch to free up memory back to the host system.

> Lee

Rob
--
Steve Ballmer: Innovation! Inigo Montoya: You keep using that word.
I do not think it means what you think it means.