2014-06-24 16:15:14

by Maksym Planeta

Subject: [PATCH] sysctl: Add a feature to drop caches selectively

To clean the page cache one can use /proc/sys/vm/drop_caches, but this
drops the whole page cache. In contrast, sdrop_caches makes it possible
to drop the page cache selectively, by path string.

Suggested-by: Thomas Knauth <[email protected]>
Signed-off-by: Maksym Planeta <[email protected]>
---
Documentation/sysctl/vm.txt | 15 ++++++
fs/Makefile | 2 +-
fs/sdrop_caches.c | 124 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 140 insertions(+), 1 deletion(-)
create mode 100644 fs/sdrop_caches.c

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index bd4b34c..faad01d 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -28,6 +28,7 @@ Currently, these files are in /proc/sys/vm:
- dirty_ratio
- dirty_writeback_centisecs
- drop_caches
+- sdrop_caches
- extfrag_threshold
- hugepages_treat_as_movable
- hugetlb_shm_group
@@ -211,6 +212,20 @@ with your system. To disable them, echo 4 (bit 3) into drop_caches.

==============================================================

+sdrop_caches
+
+Writing to this will cause the kernel to drop clean caches starting from
+the specified path.
+
+To free pagecache of a file:
+ echo /home/user/file > /proc/sys/vm/sdrop_caches
+To free pagecache of a directory and all files in it:
+ echo /home/user/directory > /proc/sys/vm/sdrop_caches
+
+Restrictions are the same as for drop_caches.
+
+==============================================================
+
extfrag_threshold

This parameter affects whether the kernel will compact memory or direct
diff --git a/fs/Makefile b/fs/Makefile
index 4030cbf..366c7b9 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -44,7 +44,7 @@ obj-$(CONFIG_FS_MBCACHE) += mbcache.o
obj-$(CONFIG_FS_POSIX_ACL) += posix_acl.o
obj-$(CONFIG_NFS_COMMON) += nfs_common/
obj-$(CONFIG_COREDUMP) += coredump.o
-obj-$(CONFIG_SYSCTL) += drop_caches.o
+obj-$(CONFIG_SYSCTL) += drop_caches.o sdrop_caches.o

obj-$(CONFIG_FHANDLE) += fhandle.o

diff --git a/fs/sdrop_caches.c b/fs/sdrop_caches.c
new file mode 100644
index 0000000..c193655
--- /dev/null
+++ b/fs/sdrop_caches.c
@@ -0,0 +1,124 @@
+/*
+ * Implement the manual selective drop pagecache function
+ */
+
+#include <linux/module.h>
+
+
+#include <linux/kernel.h>
+#include <linux/proc_fs.h>
+#include <linux/string.h>
+#include <linux/vmalloc.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <linux/fs.h>
+#include <linux/writeback.h>
+#include <linux/sysctl.h>
+#include <linux/gfp.h>
+#include <linux/limits.h>
+#include <linux/namei.h>
+
+static void clean_mapping(struct dentry *dentry)
+{
+ struct inode *inode = dentry->d_inode;
+
+ if (!inode)
+ return;
+
+ if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
+ (inode->i_mapping->nrpages == 0)) {
+ return;
+ }
+
+ invalidate_mapping_pages(inode->i_mapping, 0, -1);
+}
+
+static void clean_all_dentries_locked(struct dentry *dentry)
+{
+ struct dentry *child;
+
+ list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
+ clean_all_dentries_locked(child);
+ }
+
+ clean_mapping(dentry);
+}
+
+static void clean_all_dentries(struct dentry *dentry)
+{
+ spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ clean_all_dentries_locked(dentry);
+ spin_unlock(&dentry->d_lock);
+}
+
+static int drop_pagecache(const char * __user filename)
+{
+ unsigned int lookup_flags = LOOKUP_FOLLOW;
+ struct path path;
+ int error;
+
+retry:
+ error = user_path_at(AT_FDCWD, filename, lookup_flags, &path);
+ if (!error) {
+ /* clean */
+ clean_all_dentries(path.dentry);
+ }
+ if (retry_estale(error, lookup_flags)) {
+ lookup_flags |= LOOKUP_REVAL;
+ goto retry;
+ }
+ return error;
+}
+
+static int sdrop_ctl_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ char __user *pathname = buffer + *lenp - 1;
+
+ put_user('\0', pathname);
+
+ if (!write)
+ return 0;
+
+ return drop_pagecache(buffer);
+}
+
+static struct ctl_path vm_path[] = { { .procname = "vm", }, { } };
+static struct ctl_table sdrop_ctl_table[] = {
+ {
+ .procname = "sdrop_caches",
+ .mode = 0644,
+ .proc_handler = sdrop_ctl_handler,
+ },
+ { }
+};
+
+static struct ctl_table_header *sdrop_proc_entry;
+
+/* Init function called on module entry */
+int sdrop_init(void)
+{
+ int ret = 0;
+
+ sdrop_proc_entry = register_sysctl_paths(vm_path, sdrop_ctl_table);
+
+ if (sdrop_proc_entry == NULL) {
+ ret = -ENOMEM;
+ pr_err("sdrop_caches: Couldn't create proc entry\n");
+ }
+
+ return ret;
+}
+
+/* Cleanup function called on module exit */
+void sdrop_cleanup(void)
+{
+ unregister_sysctl_table(sdrop_proc_entry);
+}
+
+module_init(sdrop_init);
+module_exit(sdrop_cleanup);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Selective pagecache drop module");
+MODULE_AUTHOR("Maksym Planeta");
--
2.0.0


2014-06-24 21:59:41

by David Rientjes

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Tue, 24 Jun 2014, Maksym Planeta wrote:

> To clean the page cache one can use /proc/sys/vm/drop_caches. But this
> drops the whole page cache. In contrast to that sdrop_caches enables
> ability to drop the page cache selectively by path string.
>
> Suggested-by: Thomas Knauth <[email protected]>
> Signed-off-by: Maksym Planeta <[email protected]>

Could you include some information in the commit message about why this is
useful? Specifically, why you want to drop pagecache only from a specific
path.

The name of the sysctl is also quite non-descriptive.

2014-06-25 06:25:50

by Artem Bityutskiy

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Tue, 2014-06-24 at 14:59 -0700, David Rientjes wrote:
> On Tue, 24 Jun 2014, Maksym Planeta wrote:
>
> > To clean the page cache one can use /proc/sys/vm/drop_caches. But this
> > drops the whole page cache. In contrast to that sdrop_caches enables
> > ability to drop the page cache selectively by path string.
> >
> > Suggested-by: Thomas Knauth <[email protected]>
> > Signed-off-by: Maksym Planeta <[email protected]>
>
> Could you include some information in the commit message about why this is
> useful? Specifically, why you want to drop pagecache only from a specific
> path.
>
> The name of the sysctl is also quite non-descriptive.

Plus some explanations WRT why proc-based interface and what would be
the alternatives, what if tomorrow we want to extend the functionality
and drop caches only for certain file range, is this only for regular
files or also for directories, why posix_fadvise(DONTNEED) is not
sufficient.

--
Best Regards,
Artem Bityutskiy

2014-06-25 08:25:11

by Thomas Knauth

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Wed, Jun 25, 2014 at 8:25 AM, Artem Bityutskiy <[email protected]> wrote:
> Plus some explanations WRT why proc-based interface and what would be
> the alternatives, what if tomorrow we want to extend the functionality
> and drop caches only for certain file range, is this only for regular
> files or also for directories, why posix_fadvise(DONTNEED) is not
> sufficient.

I suggested the idea originally. Let me address each of your questions in turn:

Why a selective drop? To have a middle ground between echo 2 >
drop_caches and echo 3 > drop_caches. When is this interesting? My
particular use case was benchmarking. I wanted to repeatedly measure
the timing when things were read from disk. Dropping everything from
the cache, also drops useful things, not just the few files your
benchmark intends to measure.

Why /proc? Because this is where the current drop_caches mechanism is
located. If it should go somewhere else, please do suggest so.

The string is a path, i.e., can be either a file or a directory. In
case of a directory, we recursively drop all its contents.

Why not use posix_fadvise()? Because it is exactly that: advice.
The kernel is free to do whatever it likes, i.e., also ignore the request.
We want a mechanism that reliably drops select content from the cache.
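
For reference, the posix_fadvise() approach being contrasted here can be sketched in userspace as follows. This is a minimal sketch and not part of the patch: drop_file_cache() is a hypothetical helper name, and the fsync() call is there because Linux does not discard dirty pages on POSIX_FADV_DONTNEED. The call takes an offset and a length, so a byte range can be expressed as well; a length of 0 means "up to the end of the file".

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to drop cached pages for the byte
 * range [offset, offset + len) of the file at `path`; len == 0 means
 * "to the end of the file".
 * Returns 0 on success or a positive errno value on failure.
 */
int drop_file_cache(const char *path, off_t offset, off_t len)
{
    int err;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return errno;

    /* Dirty pages are not discarded, so write them back first. */
    if (fsync(fd) != 0) {
        err = errno;
        close(fd);
        return err;
    }

    /* posix_fadvise() returns the error number directly, not via errno. */
    err = posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
    close(fd);
    return err;
}
```

A return value of 0 only means the request was accepted. It does not say whether the pages were actually evicted, which is the limitation raised above.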

2014-06-25 09:56:38

by Artem Bityutskiy

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Wed, 2014-06-25 at 10:25 +0200, Thomas Knauth wrote:
> On Wed, Jun 25, 2014 at 8:25 AM, Artem Bityutskiy <[email protected]> wrote:
> > Plus some explanations WRT why proc-based interface and what would be
> > the alternatives, what if tomorrow we want to extend the functionality
> > and drop caches only for certain file range, is this only for regular
> > files or also for directories, why posix_fadvise(DONTNEED) is not
> > sufficient.
>
> I suggested the idea originally. Let me address each of your questions in turn:

Thanks for the answer, although you forgot to comment on the question
about possibly extending the new interface to work with file ranges in
the future. For example, I have a 2 TiB file, and I am only interested
in dropping caches for the first couple of gigabytes. Would I extend
your interface, or would I come up with another one?

> Why a selective drop? To have a middle ground between echo 2 >
> drop_caches and echo 3 > drop_caches. When is this interesting? My
> particular use case was benchmarking. I wanted to repeatedly measure
> the timing when things were read from disk. Dropping everything from
> the cache, also drops useful things, not just the few files your
> benchmark intends to measure.

Sounds like a reasonable motivation for me.

> Why /proc? Because this is where the current drop_caches mechanism is
> located. If it should go somewhere else, please do suggest so.

I do not have particular suggestions, I am just trying to find out
how much effort was put into choosing the interface.

> Why not use posix_fadvise()? Because it is exactly this, an advice.
> The kernel is free to do whatever, i.e., also ignore the request. We
> want a mechanism that reliably drops select content from the cache.

OK, thanks.

--
Best Regards,
Artem Bityutskiy

2014-06-25 10:03:28

by Artem Bityutskiy

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Wed, 2014-06-25 at 10:25 +0200, Thomas Knauth wrote:
> On Wed, Jun 25, 2014 at 8:25 AM, Artem Bityutskiy <[email protected]> wrote:
> > Plus some explanations WRT why proc-based interface and what would be
> > the alternatives, what if tomorrow we want to extend the functionality
> > and drop caches only for certain file range, is this only for regular
> > files or also for directories, why posix_fadvise(DONTNEED) is not
> > sufficient.
>
> I suggested the idea originally. Let me address each of your questions in turn:

I'd also be interested to see some analysis about path-based interface
vs. file descriptor-based interface. What are the pros and cons? E.g. if my
path is a symlink, with path-based interface it is not obvious whether I
drop caches for the symlink itself or caches of the target.

Note, if there are no answers, fine with me, I am asking just out of
curiosity.

--
Best Regards,
Artem Bityutskiy

2014-06-25 11:21:22

by Alexey Dobriyan

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Wed, Jun 25, 2014 at 2:20 PM, Alexey Dobriyan <[email protected]> wrote:
>> +static void clean_all_dentries_locked(struct dentry *dentry)
>> +{
>> + struct dentry *child;
>> +
>> + list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
>> + clean_all_dentries_locked(child);
>> + }
>> +
>> + clean_mapping(dentry);
>> +}
>
> unbounded recursion = kernel stack overflow
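
One common way to avoid this problem is to walk the tree iteratively, following parent and sibling links instead of using the call stack, so that stack usage stays constant regardless of tree depth. The sketch below shows the idea in plain userspace C. The names struct node, walk_subtree() and mark_visited() are illustrative assumptions, not the kernel's dentry API, and locking is ignored entirely.

```c
#include <stddef.h>

/* Illustrative tree node; a stand-in for a dentry, not the real structure. */
struct node {
    struct node *parent;
    struct node *first_child;
    struct node *next_sibling;
    int visited;
};

/* Example callback: just mark the node. */
void mark_visited(struct node *n)
{
    n->visited = 1;
}

/*
 * Post-order walk of the subtree rooted at `root` without recursion.
 * Children are visited before their parent, as in the patch.
 * Returns the number of nodes visited.
 */
size_t walk_subtree(struct node *root, void (*visit)(struct node *))
{
    struct node *n = root;
    size_t count = 0;

    if (root == NULL)
        return 0;

    for (;;) {
        /* Descend to the leftmost leaf below n. */
        while (n->first_child != NULL)
            n = n->first_child;

        /* Visit n, then move to a sibling or climb back up. */
        for (;;) {
            visit(n);
            count++;
            if (n == root)
                return count;
            if (n->next_sibling != NULL) {
                n = n->next_sibling;
                break;      /* descend into the sibling's subtree */
            }
            n = n->parent;  /* all children done: visit the parent next */
        }
    }
}
```

The walk needs no memory beyond a few local variables, at the cost of requiring a parent link in every node.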

2014-06-25 13:19:27

by Thomas Knauth

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Wed, Jun 25, 2014 at 12:03 PM, Artem Bityutskiy <[email protected]> wrote:
> On Wed, 2014-06-25 at 10:25 +0200, Thomas Knauth wrote:
>> On Wed, Jun 25, 2014 at 8:25 AM, Artem Bityutskiy <[email protected]> wrote:
>> > Plus some explanations WRT why proc-based interface and what would be
>> > the alternatives, what if tomorrow we want to extend the functionality
>> > and drop caches only for certain file range, is this only for regular
>> > files or also for directories, why posix_fadvise(DONTNEED) is not
>> > sufficient.
>>
>> I suggested the idea originally. Let me address each of your questions in turn:
>
> I'd also be interested to see some analysis about path-based interface
> vs. file descriptor-based interface. What are the pros and cons? E.g. if my
> path is a symlink, with path-based interface it is not obvious whether I
> drop caches for the symlink itself or caches of the target.

Haven't considered this case. It feels like the sensible thing to do
here is dereference the link and drop whatever it is pointing to.

2014-06-25 13:23:47

by Thomas Knauth

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Wed, Jun 25, 2014 at 11:56 AM, Artem Bityutskiy <[email protected]> wrote:
> Thanks for the answer, although you forgot to comment on the question
> about possibly extending the new interface to work with file ranges in
> the future. For example, I have a 2 TiB file, and I am only interested
> in dropping caches for the first couple of gigabytes. Would I extend
> your interface, or would I come up with another one?

Ah, didn't quite understand what was meant with file ranges. Again, we
had not considered this so far. I guess you could make a distinction
between directories and files here. If the path points to a file, you
can have an optional argument indicating the range of bytes you would
like to drop. Something like

echo "my-file 0-1000,8000-1000" > /proc/sys/vm/sdrop_caches

If this is desirable, we can add it to the patch.

2014-06-25 13:30:28

by Artem Bityutskiy

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Wed, 2014-06-25 at 15:23 +0200, Thomas Knauth wrote:
> On Wed, Jun 25, 2014 at 11:56 AM, Artem Bityutskiy <[email protected]> wrote:
> > Thanks for the answer, although you forgot to comment on the question
> > about possibly extending the new interface to work with file ranges in
> > the future. For example, I have a 2 TiB file, and I am only interested
> > in dropping caches for the first couple of gigabytes. Would I extend
> > your interface, or would I come up with another one?
>
> Ah, didn't quite understand what was meant with file ranges. Again, we
> had not considered this so far. I guess you could make a distinction
> between directories and files here. If the path points to a file, you
> can have an optional argument indicating the range of bytes you would
> like to drop. Something like
>
> echo "my-file 0-1000,8000-1000" > /proc/sys/vm/sdrop_caches
>
> If this is desirable, we can add it to the patch.

No, I do not ask to implement this, just trying to understand how the
interface could possibly be extended.

--
Best Regards,
Artem Bityutskiy

2014-06-25 13:42:26

by Artem Bityutskiy

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Wed, 2014-06-25 at 15:23 +0200, Thomas Knauth wrote:
> On Wed, Jun 25, 2014 at 11:56 AM, Artem Bityutskiy <[email protected]> wrote:
> > Thanks for the answer, although you forgot to comment on the question
> > about possibly extending the new interface to work with file ranges in
> > the future. For example, I have a 2 TiB file, and I am only interested
> > in dropping caches for the first couple of gigabytes. Would I extend
> > your interface, or would I come up with another one?
>
> Ah, didn't quite understand what was meant with file ranges. Again, we
> had not considered this so far. I guess you could make a distinction
> between directories and files here. If the path points to a file, you
> can have an optional argument indicating the range of bytes you would
> like to drop. Something like
>
> echo "my-file 0-1000,8000-1000" > /proc/sys/vm/sdrop_caches
>
> If this is desirable, we can add it to the patch.

With a binary interface like an ioctl I can see how you could have extra
unused fields which you can ignore now and let people start adding extra
options like the range in the future.

With this kind of interface I am not sure how to do this.

Other questions I'd ask would be - how about the access control model?
Will only root be able to drop caches? Why can't I drop caches for my
own file?

I did not put much thinking into this, but it looks like ioctl could be
a better interface for the task you are trying to solve...

Sorry if I am a bit vague, I am mostly trying to make you guys give this
more thought, and come up with a deeper analysis. Interfaces are very
important to get right, or as right as possible...

--
Best Regards,
Artem Bityutskiy
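
The extensibility argument for a binary interface can be illustrated with a small sketch. Everything in it is hypothetical: struct drop_cache_args, its fields and drop_cache_args_check() are not taken from the patch or from any existing kernel API. The usual convention is that fields without a meaning yet must be zero and are rejected otherwise, so that they can be given a meaning later without breaking older callers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block for an fd-based "drop cache" request. */
struct drop_cache_args {
    uint64_t offset;       /* start of the byte range */
    uint64_t len;          /* length of the range; 0 = to end of file */
    uint32_t flags;        /* no flags defined yet: must be 0 */
    uint32_t reserved[3];  /* room for future extensions: must be 0 */
};

/*
 * Validate a request the way a handler would before acting on it.
 * Returns 0 if the request is acceptable, -EINVAL otherwise.
 */
int drop_cache_args_check(const struct drop_cache_args *args)
{
    size_t i;

    /* Unknown flags are rejected so they can be assigned meaning later. */
    if (args->flags != 0)
        return -EINVAL;

    for (i = 0; i < sizeof(args->reserved) / sizeof(args->reserved[0]); i++)
        if (args->reserved[i] != 0)
            return -EINVAL;

    /* Reject ranges whose end would wrap around. */
    if (args->len != 0 && args->offset + args->len < args->offset)
        return -EINVAL;

    return 0;
}
```

A plain text write to a sysctl file has no equivalent of the reserved fields, which is the difficulty described above.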

2014-06-25 22:16:18

by Pavel Machek

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Wed 2014-06-25 10:25:05, Thomas Knauth wrote:
> On Wed, Jun 25, 2014 at 8:25 AM, Artem Bityutskiy <[email protected]> wrote:
> > Plus some explanations WRT why proc-based interface and what would be
> > the alternatives, what if tomorrow we want to extend the functionality
> > and drop caches only for certain file range, is this only for regular
> > files or also for directories, why posix_fadvise(DONTNEED) is not
> > sufficient.
>
> I suggested the idea originally. Let me address each of your questions in turn:
>
> Why a selective drop? To have a middle ground between echo 2 >
> drop_caches and echo 3 > drop_caches. When is this interesting? My
> particular use case was benchmarking. I wanted to repeatedly measure
> the timing when things were read from disk. Dropping everything from
> the cache, also drops useful things, not just the few files your
> benchmark intends to measure.
>
> Why /proc? Because this is where the current drop_caches mechanism is
> located. If it should go somewhere else, please do suggest so.

It sounds like this should be a new syscall.

echoing filenames in files is strange/ugly.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-06-26 01:06:24

by Dave Chinner

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Wed, Jun 25, 2014 at 10:25:05AM +0200, Thomas Knauth wrote:
> On Wed, Jun 25, 2014 at 8:25 AM, Artem Bityutskiy <[email protected]> wrote:
> > Plus some explanations WRT why proc-based interface and what would be
> > the alternatives, what if tomorrow we want to extend the functionality
> > and drop caches only for certain file range, is this only for regular
> > files or also for directories, why posix_fadvise(DONTNEED) is not
> > sufficient.
>
> I suggested the idea originally. Let me address each of your questions in turn:
>
> Why a selective drop? To have a middle ground between echo 2 >
> drop_caches and echo 3 > drop_caches. When is this interesting? My
> particular use case was benchmarking. I wanted to repeatedly measure
> the timing when things were read from disk. Dropping everything from
> the cache, also drops useful things, not just the few files your
> benchmark intends to measure.

We're not likely to ever extend the drop_caches functionality. This
is brought up semi-regularly by people that have some slightly
narrower use-case for dropping caches.

Your particular use case can be handled by directing your benchmark
at a filesystem mount point and unmounting the filesystem in between
benchmark runs. There is no need to add kernel functionality for
something that can be so easily achieved by other means, especially
in benchmark environments where *everything* is tightly controlled.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-06-26 06:13:52

by Artem Bityutskiy

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Thu, 2014-06-26 at 11:06 +1000, Dave Chinner wrote:
> Your particular use case can be handled by directing your benchmark
> at a filesystem mount point and unmounting the filesystem in between
> benchmark runs. There is no need to add kernel functionality for
> something that can be so easily achieved by other means, especially
> in benchmark environments where *everything* is tightly controlled.

If I were a benchmark writer, I would not be willing to run it as root
to be able to mount/unmount, and I would not be willing to require the
customer to create special dedicated partitions for the benchmark,
because this is too user-unfriendly. Or do I make incorrect assumptions?

Not that I need this syscall or am trying to sell the idea to anyone, I am
just trying to understand the alternative you suggested.

--
Best Regards,
Artem Bityutskiy

2014-06-26 09:30:53

by Maksym Planeta

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

> With a binary interface like an ioctl I can see how you could have extra
> unused fields which you can ignore now and let people start adding extra
> options like the range in the future.

Yes, ioctl is another possibility. But I would argue that sysctl is the
more convenient interface, because the idea of sdrop_caches is similar to
that of drop_caches and it is convenient to have these interfaces in the
same place. However, if sdrop_caches uses procfs, there seems to be no
easy way to pass parameters of different types in one write operation.

> Other questions I'd ask would be - how about the access control model?
> Will only root be able to drop caches? Why can't I drop caches for my
> own file?

Access control model is the same as for drop_caches. This means that
only root can write to this file. But it is easy to add a feature that
allows any user to clean page cache of inodes that this user owns.

2014-06-25 15:42 GMT+02:00 Artem Bityutskiy <[email protected]>:
> On Wed, 2014-06-25 at 15:23 +0200, Thomas Knauth wrote:
>> On Wed, Jun 25, 2014 at 11:56 AM, Artem Bityutskiy <[email protected]> wrote:
>> > Thanks for the answer, although you forgot to comment on the question
>> > about possibly extending the new interface to work with file ranges in
>> > the future. For example, I have a 2 TiB file, and I am only interested
>> > in dropping caches for the first couple of gigabytes. Would I extend
>> > your interface, or would I come up with another one?
>>
>> Ah, didn't quite understand what was meant with file ranges. Again, we
>> had not considered this so far. I guess you could make a distinction
>> between directories and files here. If the path points to a file, you
>> can have an optional argument indicating the range of bytes you would
>> like to drop. Something like
>>
>> echo "my-file 0-1000,8000-1000" > /proc/sys/vm/sdrop_caches
>>
>> If this is desirable, we can add it to the patch.
>
> With a binary interface like an ioctl I can see how you could have extra
> unused fields which you can ignore now and let people start adding extra
> options like the range in the future.
>
> With this kind of interface I am not sure how to do this.
>
> Other questions I'd ask would be - how about the access control model?
> Will only root be able to drop caches? Why can't I drop caches for my
> own file?
>
> I did not put much thinking into this, but it looks like ioctl could be
> a better interface for the task you are trying to solve...
>
> Sorry if I am a bit vague, I am mostly trying to make you guys give this
> more thought, and come up with a deeper analysis. Interfaces are very
> important to get right, or as right as possible...
>
> --
> Best Regards,
> Artem Bityutskiy
>



--
Regards,
Maksym Planeta.

2014-06-26 10:37:00

by Bernd Schubert

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On 06/26/2014 08:13 AM, Artem Bityutskiy wrote:
> On Thu, 2014-06-26 at 11:06 +1000, Dave Chinner wrote:
>> Your particular use case can be handled by directing your benchmark
>> at a filesystem mount point and unmounting the filesystem in between
>> benchmark runs. There is no need to add kernel functionality for
>> something that can be so easily achieved by other means, especially
>> in benchmark environments where *everything* is tightly controlled.
>
> If I was a benchmark writer, I would not be willing running it as root
> to be able to mount/unmount, I would not be willing to require the
> customer creating special dedicated partitions for the benchmark,
> because this is too user-unfriendly. Or do I make incorrect assumptions?

But why a sysctl then? And I also don't see the point of that at all: why
can't the benchmark use posix_fadvise(POSIX_FADV_DONTNEED)?


Cheers,
Bernd

2014-06-26 11:31:11

by Artem Bityutskiy

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Thu, 2014-06-26 at 12:36 +0200, Bernd Schubert wrote:
> On 06/26/2014 08:13 AM, Artem Bityutskiy wrote:
> > On Thu, 2014-06-26 at 11:06 +1000, Dave Chinner wrote:
> >> Your particular use case can be handled by directing your benchmark
> >> at a filesystem mount point and unmounting the filesystem in between
> >> benchmark runs. There is no need to add kernel functionality for
> >> something that can be so easily achieved by other means, especially
> >> in benchmark environments where *everything* is tightly controlled.
> >
> > If I was a benchmark writer, I would not be willing running it as root
> > to be able to mount/unmount, I would not be willing to require the
> > customer creating special dedicated partitions for the benchmark,
> > because this is too user-unfriendly. Or do I make incorrect assumptions?
>
> But why a sysctl then? And also don't see a point for that at all, why
> can't the benchmark use posix_fadvise(POSIX_FADV_DONTNEED)?

The latter question was answered - people want a way to drop caches for
a file. They need a method which guarantees that the caches are dropped.
They do not need an advisory method which does not give any guarantees.

As for the first question - this is what I was asking too, but
without suggesting alternatives. I challenged the authors with the
following:

1. Why would the interface only allow the superuser to drop the
caches? How about allowing the file owner or, generally speaking, the
person who is allowed to modify the file, to drop the caches?

I alluded that this may be doable with an fd-based interface.

2. What about symlinks? Can I have a choice whether I drop caches
(struct inode, I suppose) for the symlink itself or for the destination
file? Again, fd-based interface would probably naturally allow for this.

3. What about leaving some room for future extensions? E.g., someone may
want to drop only part of a file in the future, who knows. Can we invent
an interface which could be extended in the future without
breaking older software?

My intention was to encourage the submitter to take some time and come
back with deeper analysis.

And finally, and most importantly, Dave stated that any per-file cache
dropping interface is unlikely to be accepted at all, because
there is mount/unmount.

So far this is the main concern the submitter should address.

But I just answered that what Dave suggested is probably not the nicest
way to do this from the user-space perspective, because it requires
superuser privileges, and probably a separate "benchmark-only"
partition.

So if the authors want to sell this new interface (in whatever form) to
the kernel community, they should start with providing a solid use-case,
with some more details, explore alternatives and show how the
alternatives do not work for them.

--
Best Regards,
Artem Bityutskiy

2014-06-26 11:58:10

by Lukas Czerner

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Thu, 26 Jun 2014, Artem Bityutskiy wrote:

> Date: Thu, 26 Jun 2014 14:31:03 +0300
> From: Artem Bityutskiy <[email protected]>
> To: Bernd Schubert <[email protected]>
> Cc: Dave Chinner <[email protected]>, Thomas Knauth <[email protected]>,
> David Rientjes <[email protected]>,
> Maksym Planeta <[email protected]>,
> Alexander Viro <[email protected]>, [email protected],
> [email protected]
> Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively
>
> On Thu, 2014-06-26 at 12:36 +0200, Bernd Schubert wrote:
> > On 06/26/2014 08:13 AM, Artem Bityutskiy wrote:
> > > On Thu, 2014-06-26 at 11:06 +1000, Dave Chinner wrote:
> > >> Your particular use case can be handled by directing your benchmark
> > >> at a filesystem mount point and unmounting the filesystem in between
> > >> benchmark runs. There is no need to add kernel functionality for
> > >> something that can be so easily achieved by other means, especially
> > >> in benchmark environments where *everything* is tightly controlled.
> > >
> > > If I was a benchmark writer, I would not be willing running it as root
> > > to be able to mount/unmount, I would not be willing to require the
> > > customer creating special dedicated partitions for the benchmark,
> > > because this is too user-unfriendly. Or do I make incorrect assumptions?
> >
> > But why a sysctl then? And also don't see a point for that at all, why
> > can't the benchmark use posix_fadvise(POSIX_FADV_DONTNEED)?
>
> The latter question was answered - people want a way to drop caches for
> a file. They need a method which guarantees that the caches are dropped.
> They do not need an advisory method which does not give any guarantees.
>
> As for the first question - this was what I was also asking too, but
> without suggesting alternatives. I challenged the authors with the
> following:
>
> 1. Why the interface would only allow the super user dropping the
> caches? How about allowing the file owner or, generally speaking, the
> person who is allowed to modify the file, drop the caches?
>
> I alluded that this may be doable with an fd-based interface.
>
> 2. What about symlinks? Can I have a choice whether I drop caches
> (struct inode, I suppose) for the symlink itself or for the destination
> file? Again, fd-based interface would probably naturally allow for this.
>
> 3. What about leaving some room for future extensions? E.g., someone may
> want to drop only part of a file in the future, who knows. Can we invent
> an interface which would allow to be extended in the future, without
> breaking older software?
>
> My intention was to encourage the submitter to take some time and come
> back with deeper analysis.
>
> And finally, and most importantly, Dave stated that any per-file cache
> dropping interface is unlikely going to be accepted at all, because
> there is mount/unmount.
>
> So far this is the main concern the submitter should address.
>
> But I just answered that what Dave suggested is probably not the nicest
> way to do this from the user-space perspective, because it requires
> superuser privileges, and probably a separate "benchmark-only"
> partition.

I think that Dave is right in that if it's just for "benchmarking"
purposes, then there is no need for a new special interface for
dropping caches. There is mount/umount and drop_caches, which should
be more than enough for any benchmark. And while it's true that
you'd likely need superuser privileges for mount/umount, the same is
true of drop_caches, isn't it?

>
> So if the authors want to sell this new interface (in whatever form) to
> the kernel community, they should start with providing a solid use-case,
> with some more details, explore alternatives and show how the
> alternatives do not work for them.

Yes please, let's see some solid use-case for this.

-Lukas

2014-06-26 12:10:37

by Bernd Schubert

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On 06/26/2014 01:57 PM, Lukáš Czerner wrote:
> On Thu, 26 Jun 2014, Artem Bityutskiy wrote:
>> On Thu, 2014-06-26 at 12:36 +0200, Bernd Schubert wrote:
>>> On 06/26/2014 08:13 AM, Artem Bityutskiy wrote:
>>>> On Thu, 2014-06-26 at 11:06 +1000, Dave Chinner wrote:
>>>>> Your particular use case can be handled by directing your benchmark
>>>>> at a filesystem mount point and unmounting the filesystem in between
>>>>> benchmark runs. There is no need to add kernel functionality for
>>>>> something that can be so easily achieved by other means, especially
>>>>> in benchmark environments where *everything* is tightly controlled.
>>>>
>>>> If I was a benchmark writer, I would not be willing running it as root
>>>> to be able to mount/unmount, I would not be willing to require the
>>>> customer creating special dedicated partitions for the benchmark,
>>>> because this is too user-unfriendly. Or do I make incorrect assumptions?
>>>
>>> But why a sysctl then? And also don't see a point for that at all, why
>>> can't the benchmark use posix_fadvise(POSIX_FADV_DONTNEED)?
>>
>> The latter question was answered - people want a way to drop caches for
>> a file. They need a method which guarantees that the caches are dropped.
>> They do not need an advisory method which does not give any guarantees.

I'm not sure if a benchmark really needs that so much that FADV_DONTNEED
isn't sufficient.
Personally I would also like to know if FADV_DONTNEED succeeded. E.g.
'ql-fstest' checks whether the written pattern went to the block device,
and currently it does not know if the data really has been dropped from
the page cache. As it reads files several times this is not critical; it
would only be nice to have, nothing worth adding a new syscall for.
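
For illustration, here is a minimal C sketch of how a test could check
from user space whether FADV_DONTNEED actually emptied a file's page
cache, assuming Linux and a file the caller owns. The helper name
drop_and_count_resident() is hypothetical and is not part of the patch
or of ql-fstest. It writes back dirty pages, issues the advice, and then
uses mincore() on a mapping of the file to count pages still resident.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush a file, ask the kernel to drop its cached
 * pages, then count how many pages are still resident in the page cache.
 * Returns the resident page count, or -1 on error.
 */
long drop_and_count_resident(const char *path)
{
    struct stat st;
    unsigned char *vec = NULL;
    void *map;
    size_t len, page, npages, i;
    long resident = -1;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0)
        goto out_close;
    if (st.st_size == 0) {
        resident = 0;                 /* empty file: nothing is cached */
        goto out_close;
    }

    /* Dirty pages cannot be dropped, so write them back first. */
    if (fsync(fd) < 0)
        goto out_close;

    /* Advisory only: a return of 0 does not prove the pages are gone. */
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
        goto out_close;

    len = (size_t)st.st_size;
    page = (size_t)sysconf(_SC_PAGESIZE);
    npages = (len + page - 1) / page;

    /* Mapping the file does not fault pages in; mincore() only inspects. */
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    vec = malloc(npages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set means resident */
    }

    free(vec);
    munmap(map, len);
out_close:
    close(fd);
    return resident;
}
```

A result of 0 suggests the data pages were dropped; a positive result
means some pages stayed cached (for example because they are mapped
elsewhere or the filesystem keeps them in memory). Recent kernels may
report every page as resident when the caller lacks write access to
the file, so the count is only meaningful for files the caller owns.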


Cheers,
Bernd

2014-06-27 02:48:39

by Dave Chinner

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Thu, Jun 26, 2014 at 09:13:19AM +0300, Artem Bityutskiy wrote:
> On Thu, 2014-06-26 at 11:06 +1000, Dave Chinner wrote:
> > Your particular use case can be handled by directing your benchmark
> > at a filesystem mount point and unmounting the filesystem in between
> > benchmark runs. There is no need to add kernel functionality for
> > something that can be so easily achieved by other means, especially
> > in benchmark environments where *everything* is tightly controlled.
>
> If I was a benchmark writer, I would not be willing running it as root
> to be able to mount/unmount, I would not be willing to require the
> customer creating special dedicated partitions for the benchmark,
> because this is too user-unfriendly. Or do I make incorrect assumptions?

Just add the dev/mntpt to /etc/fstab and add "user" to the
configuration and the need for root goes away.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-06-27 02:55:08

by Dave Chinner

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Thu, Jun 26, 2014 at 02:10:28PM +0200, Bernd Schubert wrote:
> On 06/26/2014 01:57 PM, Lukáš Czerner wrote:
> >On Thu, 26 Jun 2014, Artem Bityutskiy wrote:
> >>On Thu, 2014-06-26 at 12:36 +0200, Bernd Schubert wrote:
> >>>On 06/26/2014 08:13 AM, Artem Bityutskiy wrote:
> >>>>On Thu, 2014-06-26 at 11:06 +1000, Dave Chinner wrote:
> >>>>>Your particular use case can be handled by directing your benchmark
> >>>>>at a filesystem mount point and unmounting the filesystem in between
> >>>>>benchmark runs. There is no need to add kernel functionality for
> >>>>>something that can be so easily achieved by other means, especially
> >>>>>in benchmark environments where *everything* is tightly controlled.
> >>>>
> >>>>If I was a benchmark writer, I would not be willing running it as root
> >>>>to be able to mount/unmount, I would not be willing to require the
> >>>>customer creating special dedicated partitions for the benchmark,
> >>>>because this is too user-unfriendly. Or do I make incorrect assumptions?
> >>>
> >>>But why a sysctl then? And also don't see a point for that at all, why
> >>>can't the benchmark use posix_fadvise(POSIX_FADV_DONTNEED)?
> >>
> >>The latter question was answered - people want a way to drop caches for
> >>a file. They need a method which guarantees that the caches are dropped.
> >>They do not need an advisory method which does not give any guarantees.
>
> I'm not sure if a benchmark really needs that so much that
> FADV_DONTNEED isn't sufficient.
> Personally I would also like to know if FADV_DONTNEED succeeded.
> I.e. 'ql-fstest' is to check if the written pattern went to the
> block device and currently it does not know if data really had been
> dropped from the page cache. As it reads files several times this is
> not critical, but only would be a nice to have - nothing worth to
> add a new syscall.

ql-fstest is not a benchmark, it's a data integrity test. The re-read
verification problem is easily solved by using direct IO to read the
files directly without going through the page cache. Indeed, direct
IO will invalidate cached pages over the range it reads before it
does the read, so the guarantee that you are after - no cached pages
when the read is done - is also fulfilled by the direct IO read...

I really don't understand why people keep trying to make cached IO
behave like uncached IO when we already have uncached IO
interfaces....
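
As a rough sketch of the direct IO re-read described above, assuming
Linux and a filesystem that supports O_DIRECT: the helper name
sum_file_direct() and the 4096-byte alignment are assumptions made for
illustration, and real code should query the device's logical block
size. The buffer, the file offset and the request length all need to
be suitably aligned for O_DIRECT.

```c
#define _GNU_SOURCE              /* exposes O_DIRECT in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define DIO_ALIGN 4096           /* assumed alignment, not queried */
#define DIO_CHUNK (64 * 1024)    /* request size, a multiple of DIO_ALIGN */

/*
 * Hypothetical helper: read a whole file with O_DIRECT, bypassing the
 * page cache, and accumulate a simple byte sum for verification.
 * Returns the number of bytes read, or -1 on error (including the case
 * where the filesystem rejects O_DIRECT).
 */
long long sum_file_direct(const char *path, uint64_t *sum)
{
    struct stat st;
    void *buf = NULL;
    long long total = 0;
    int fd;

    assert(path != NULL && sum != NULL);
    *sum = 0;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) < 0 ||
        posix_memalign(&buf, DIO_ALIGN, DIO_CHUNK) != 0) {
        close(fd);
        return -1;
    }

    /* Stop at the known size so no read is issued at an unaligned EOF. */
    while (total < (long long)st.st_size) {
        ssize_t n = read(fd, buf, DIO_CHUNK);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break;
        for (ssize_t i = 0; i < n; i++)
            *sum += ((unsigned char *)buf)[i];
        total += n;
    }

    free(buf);
    close(fd);
    return total;
}
```

A verifier would compare the returned sum (or, more realistically, the
data itself) against the pattern it wrote earlier.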

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-06-27 08:49:15

by Matthias Schniedermeyer

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On 26.06.2014 13:57, Lukáš Czerner wrote:

> > So if the authors want to sell this new interface (in whatever form) to
> > the kernel community, they should start with providing a solid use-case,
> > with some more details, explore alternatives and show how the
> > alternatives do not work for them.
>
> Yes please, let's see some solid use-case for this.

Personally I would want it to verify files after copying them,
especially while moving files:
- Copy a file
- <drop cache>
- Verify that it really is correct on stable storage
- Remove original file

Currently I choose one of three ways:
- drop_caches
- umount/mount
- Write more data than memory in machine (Which is only an
approximation and you have to verify in the same order the files were
written, so it is likely that any cache was thrashed in the meantime)

But having a way to selectively "destroy" the cache of a file would make
this task easier.
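
The copy/drop/verify sequence above can be approximated today with
existing interfaces. The following C sketch is only an illustration
under stated assumptions: verify_copy() and read_full() are
hypothetical names, and the drop step uses POSIX_FADV_DONTNEED, which
is advisory, so it does not give the guarantee asked for here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CMP_CHUNK (64 * 1024)

/* Read up to len bytes, retrying on short reads; returns bytes read or -1. */
static ssize_t read_full(int fd, char *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n < 0)
            return -1;
        if (n == 0)
            break;
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/*
 * Hypothetical helper: flush the copy, ask the kernel to drop its cached
 * pages, then compare it with the original.
 * Returns 1 if the files match, 0 if they differ, -1 on error.
 */
int verify_copy(const char *src, const char *dst)
{
    char *a = NULL, *b = NULL;
    int sfd, dfd, result = -1;

    assert(src != NULL && dst != NULL);

    sfd = open(src, O_RDONLY);
    dfd = open(dst, O_RDONLY);
    a = malloc(CMP_CHUNK);
    b = malloc(CMP_CHUNK);
    if (sfd < 0 || dfd < 0 || a == NULL || b == NULL)
        goto out;

    /* Write the copy back to storage, then request that its cache be dropped. */
    if (fsync(dfd) < 0)
        goto out;
    if (posix_fadvise(dfd, 0, 0, POSIX_FADV_DONTNEED) != 0)
        goto out;

    for (;;) {
        ssize_t na = read_full(sfd, a, CMP_CHUNK);
        ssize_t nb = read_full(dfd, b, CMP_CHUNK);

        if (na < 0 || nb < 0)
            goto out;
        if (na != nb || memcmp(a, b, (size_t)na) != 0) {
            result = 0;
            goto out;
        }
        if (na == 0) {                /* both files ended together */
            result = 1;
            goto out;
        }
    }
out:
    free(a);
    free(b);
    if (sfd >= 0)
        close(sfd);
    if (dfd >= 0)
        close(dfd);
    return result;
}
```

Only when verify_copy() returns 1 would the caller go on to remove the
original file. Whether the compared data really came from stable
storage still depends on the kernel having honoured the advice.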




--

Matthias

2014-06-27 08:58:33

by Bernd Schubert

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On 06/27/2014 04:55 AM, Dave Chinner wrote:
> On Thu, Jun 26, 2014 at 02:10:28PM +0200, Bernd Schubert wrote:
>> On 06/26/2014 01:57 PM, Lukáš Czerner wrote:
>>> On Thu, 26 Jun 2014, Artem Bityutskiy wrote:
>>>> On Thu, 2014-06-26 at 12:36 +0200, Bernd Schubert wrote:
>>>>> On 06/26/2014 08:13 AM, Artem Bityutskiy wrote:
>>>>>> On Thu, 2014-06-26 at 11:06 +1000, Dave Chinner wrote:
>>>>>>> Your particular use case can be handled by directing your benchmark
>>>>>>> at a filesystem mount point and unmounting the filesystem in between
>>>>>>> benchmark runs. There is no need to add kernel functionality for
>>>>>>> something that can be so easily achieved by other means, especially
>>>>>>> in benchmark environments where *everything* is tightly controlled.
>>>>>>
>>>>>> If I was a benchmark writer, I would not be willing running it as root
>>>>>> to be able to mount/unmount, I would not be willing to require the
>>>>>> customer creating special dedicated partitions for the benchmark,
>>>>>> because this is too user-unfriendly. Or do I make incorrect assumptions?
>>>>>
>>>>> But why a sysctl then? And also don't see a point for that at all, why
>>>>> can't the benchmark use posix_fadvise(POSIX_FADV_DONTNEED)?
>>>>
>>>> The latter question was answered - people want a way to drop caches for
>>>> a file. They need a method which guarantees that the caches are dropped.
>>>> They do not need an advisory method which does not give any guarantees.
>>
>> I'm not sure if a benchmark really needs that so much that
>> FADV_DONTNEED isn't sufficient.
>> Personally I would also like to know if FADV_DONTNEED succeeded.
>> I.e. 'ql-fstest' is to check if the written pattern went to the
>> block device and currently it does not know if data really had been
>> dropped from the page cache. As it reads files several times this is
>> not critical, but only would be a nice to have - nothing worth to
>> add a new syscall.
>
> ql-test is not a benchmark, it's a data integrity test. The re-read
> verification problem is easily solved by using direct IO to read the
> files directly without going through the page cache. Indeed, direct
> IO will invalidate cached pages over the range it reads before it
> does the read, so the guarantee that you are after - no cached pages
> when the read is done - is also fulfilled by the direct IO read...
>
> I really don't understand why people keep trying to make cached IO
> behave like uncached IO when we already have uncached IO
> interfaces....


Firstly, direct IO has an entirely different IO pattern, usually much
simpler than buffered IO through the page cache. Secondly, going through
the page cache ensures that page cache buffering is also tested.
I'm not at all opposed to opening files randomly with direct IO to also
test that path, and I'm going to add that soon, but only using direct IO
would limit the use case of ql-fstest.


Bernd

2014-06-27 09:05:07

by Lukas Czerner

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Fri, 27 Jun 2014, Matthias Schniedermeyer wrote:

> Date: Fri, 27 Jun 2014 10:41:39 +0200
> From: Matthias Schniedermeyer <[email protected]>
> To: Lukáš Czerner <[email protected]>
> Cc: Artem Bityutskiy <[email protected]>,
> Bernd Schubert <[email protected]>,
> Dave Chinner <[email protected]>, Thomas Knauth <[email protected]>,
> David Rientjes <[email protected]>,
> Maksym Planeta <[email protected]>,
> Alexander Viro <[email protected]>, [email protected],
> [email protected]
> Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively
>
> On 26.06.2014 13:57, Lukáš Czerner wrote:
>
> > > So if the authors want to sell this new interface (in whatever form) to
> > > the kernel community, they should start with providing a solid use-case,
> > > with some more details, explore alternatives and show how the
> > > alternatives do not work for them.
> >
> > Yes please, let's see some solid use-case for this.
>
> Personally i would want it to verify files after copying them:
> Especially while moving files:
> - Copy a file
> - <drop cache>
> - Verify that it really is correct on stable storage
> - Remove original file

I assume you're using cp to copy a file, not your own program. In
that case, can we make cp optionally use direct IO? It seems that it
would solve your problem in a very elegant way.

-Lukas

>
> Currently i choose either of the 3 ways:
> - drop_caches
> - umount/mount
> - Write more data than memory in machine (Which is only an
> approximation and you have to verify in the same order the files were
> written, so it is likely that any cache was thrashed in the meantime)
>
> But having a way to selectively "destroy" the cache of a file would make
> this task easier.
>
>
>
>
>

2014-06-27 09:08:31

by Artem Bityutskiy

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Fri, 2014-06-27 at 10:41 +0200, Matthias Schniedermeyer wrote:
> On 26.06.2014 13:57, Lukáš Czerner wrote:
>
> > > So if the authors want to sell this new interface (in whatever form) to
> > > the kernel community, they should start with providing a solid use-case,
> > > with some more details, explore alternatives and show how the
> > > alternatives do not work for them.
> >
> > Yes please, let's see some solid use-case for this.
>
> Personally i would want it to verify files after copying them:
> Especially while moving files:
> - Copy a file
> - <drop cache>
> - Verify that it really is correct on stable storage
> - Remove original file

To make 100% sure you'd not only need to drop VFS-level caches but also
file-system-level caches. Indeed, file-systems have their own rather
buffers for different indexing data-structures, etc. The unmount/mount
sequence takes care of that.

--
Best Regards,
Artem Bityutskiy

2014-06-27 09:09:59

by Bityutskiy, Artem

Subject: Re: [PATCH] sysctl: Add a feature to drop caches selectively

On Fri, 2014-06-27 at 12:08 +0300, Artem Bityutskiy wrote:
> To make 100% sure you'd not only need to drop VFS-level caches but also
> file-system-level caches. Indeed, file-systems have their own rather
Sorry, I wanted to say "rather complex" here
> buffers for different indexing data-structures, etc. The unmount/mount
> sequence takes care of that.
>

--
Best Regards,
Artem Bityutskiy