Hi,
I would like to resurrect the following patch from Dave. It was last
posted here https://lkml.org/lkml/2010/9/16/250 and there didn't seem
to be any strong opposition.
Kosaki was worried about possible excessive logging when somebody drops
caches too often (though he said he didn't have a strong opinion on
that), but I would say the opposite. If somebody does that then I would
really like to know it from the log when supporting a system, because
it almost certainly means that something fishy is going on. It is also
worth mentioning that only root can write to drop_caches, so this is
not a flooding attack vector.
I am bringing this up again because it can be really helpful when
chasing strange performance issues which (surprise, surprise) turn out
to be related to caches being dropped artificially because the admin
thinks it will help...
I have just refreshed the original patch on top of the current mm tree
but I could live with KERN_INFO as well if people think that KERN_NOTICE
is too hysterical.
---
From 1f4058be9b089bc9d43d71bc63989335d7637d8d Mon Sep 17 00:00:00 2001
From: Dave Hansen <[email protected]>
Date: Fri, 12 Oct 2012 14:30:54 +0200
Subject: [PATCH] add some drop_caches documentation and info message
There is plenty of anecdotal evidence and a load of blog posts
suggesting that using "drop_caches" periodically keeps your system
running in "tip top shape". Perhaps adding some kernel
documentation will increase the amount of accurate data on its use.
If we are not shrinking caches effectively, then we have real bugs.
Using drop_caches will simply mask the bugs and make them harder
to find, but certainly does not fix them, nor is it an appropriate
"workaround" to limit the size of the caches.
It's a great debugging tool, and is really handy for doing things
like repeatable benchmark runs. So, add a bit more documentation
about it, and add a little KERN_NOTICE. It should help developers
who are chasing down reclaim-related bugs.
[[email protected]: refreshed to current -mm tree]
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
Documentation/sysctl/vm.txt | 33 +++++++++++++++++++++++++++------
fs/drop_caches.c | 2 ++
2 files changed, 29 insertions(+), 6 deletions(-)
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 078701f..21ad181 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -138,18 +138,39 @@ Setting this to zero disables periodic writeback altogether.
drop_caches
-Writing to this will cause the kernel to drop clean caches, dentries and
-inodes from memory, causing that memory to become free.
+Writing to this will cause the kernel to drop clean caches, as well as
+reclaimable slab objects like dentries and inodes. Once dropped, their
+memory becomes free.
To free pagecache:
echo 1 > /proc/sys/vm/drop_caches
-To free dentries and inodes:
+To free reclaimable slab objects (includes dentries and inodes):
echo 2 > /proc/sys/vm/drop_caches
-To free pagecache, dentries and inodes:
+To free slab objects and pagecache:
echo 3 > /proc/sys/vm/drop_caches
-As this is a non-destructive operation and dirty objects are not freeable, the
-user should run `sync' first.
+This is a non-destructive operation and will not free any dirty objects.
+To increase the number of objects freed by this operation, the user may run
+`sync' prior to writing to /proc/sys/vm/drop_caches. This will minimize the
+number of dirty objects on the system and create more candidates to be
+dropped.
+
+This file is not a means to control the growth of the various kernel caches
+(inodes, dentries, pagecache, etc...) These objects are automatically
+reclaimed by the kernel when memory is needed elsewhere on the system.
+
+Use of this file can cause performance problems. Since it discards cached
+objects, it may cost a significant amount of I/O and CPU to recreate the
+dropped objects, especially if they were under heavy use. Because of this,
+use outside of a testing or debugging environment is not recommended.
+
+You may see informational messages in your kernel log when this file is
+used:
+
+ cat (1234): dropped kernel caches: 3
+
+These are informational only. They do not mean that anything is wrong
+with your system.
==============================================================
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index c00e055..f72395e 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -58,6 +58,8 @@ int drop_caches_sysctl_handler(ctl_table *table, int write,
if (ret)
return ret;
if (write) {
+ printk(KERN_NOTICE "%s (%d): dropped kernel caches: %d\n",
+ current->comm, task_pid_nr(current), sysctl_drop_caches);
if (sysctl_drop_caches & 1)
iterate_supers(drop_pagecache_sb, NULL);
if (sysctl_drop_caches & 2)
--
1.7.10.4
--
Michal Hocko
SUSE Labs
On Fri, Oct 12, 2012 at 8:57 AM, Michal Hocko <[email protected]> wrote:
> Hi,
> I would like to resurrect the following patch from Dave. It was last
> posted here https://lkml.org/lkml/2010/9/16/250 and there didn't seem
> to be any strong opposition.
> Kosaki was worried about possible excessive logging when somebody drops
> caches too often (though he said he didn't have a strong opinion on
> that), but I would say the opposite. If somebody does that then I would
> really like to know it from the log when supporting a system, because
> it almost certainly means that something fishy is going on. It is also
> worth mentioning that only root can write to drop_caches, so this is
> not a flooding attack vector.
> I am bringing this up again because it can be really helpful when
> chasing strange performance issues which (surprise, surprise) turn out
> to be related to caches being dropped artificially because the admin
> thinks it will help...
>
> I have just refreshed the original patch on top of the current mm tree
> but I could live with KERN_INFO as well if people think that KERN_NOTICE
> is too hysterical.
> ---
> From 1f4058be9b089bc9d43d71bc63989335d7637d8d Mon Sep 17 00:00:00 2001
> From: Dave Hansen <[email protected]>
> Date: Fri, 12 Oct 2012 14:30:54 +0200
> Subject: [PATCH] add some drop_caches documentation and info message
>
> There is plenty of anecdotal evidence and a load of blog posts
> suggesting that using "drop_caches" periodically keeps your system
> running in "tip top shape". Perhaps adding some kernel
> documentation will increase the amount of accurate data on its use.
>
> If we are not shrinking caches effectively, then we have real bugs.
> Using drop_caches will simply mask the bugs and make them harder
> to find, but certainly does not fix them, nor is it an appropriate
> "workaround" to limit the size of the caches.
>
> It's a great debugging tool, and is really handy for doing things
> like repeatable benchmark runs. So, add a bit more documentation
> about it, and add a little KERN_NOTICE. It should help developers
> who are chasing down reclaim-related bugs.
>
> [[email protected]: refreshed to current -mm tree]
> Signed-off-by: Dave Hansen <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
Looks fine.
Acked-by: KOSAKI Motohiro <[email protected]>
(2012/10/12 21:57), Michal Hocko wrote:
> Hi,
> I would like to resurrect the following patch from Dave. It was last
> posted here https://lkml.org/lkml/2010/9/16/250 and there didn't seem
> to be any strong opposition.
> Kosaki was worried about possible excessive logging when somebody drops
> caches too often (though he said he didn't have a strong opinion on
> that), but I would say the opposite. If somebody does that then I would
> really like to know it from the log when supporting a system, because
> it almost certainly means that something fishy is going on. It is also
> worth mentioning that only root can write to drop_caches, so this is
> not a flooding attack vector.
> I am bringing this up again because it can be really helpful when
> chasing strange performance issues which (surprise, surprise) turn out
> to be related to caches being dropped artificially because the admin
> thinks it will help...
>
> I have just refreshed the original patch on top of the current mm tree
> but I could live with KERN_INFO as well if people think that KERN_NOTICE
> is too hysterical.
> ---
> From 1f4058be9b089bc9d43d71bc63989335d7637d8d Mon Sep 17 00:00:00 2001
> From: Dave Hansen <[email protected]>
> Date: Fri, 12 Oct 2012 14:30:54 +0200
> Subject: [PATCH] add some drop_caches documentation and info message
>
> There is plenty of anecdotal evidence and a load of blog posts
> suggesting that using "drop_caches" periodically keeps your system
> running in "tip top shape". Perhaps adding some kernel
> documentation will increase the amount of accurate data on its use.
>
> If we are not shrinking caches effectively, then we have real bugs.
> Using drop_caches will simply mask the bugs and make them harder
> to find, but certainly does not fix them, nor is it an appropriate
> "workaround" to limit the size of the caches.
>
> It's a great debugging tool, and is really handy for doing things
> like repeatable benchmark runs. So, add a bit more documentation
> about it, and add a little KERN_NOTICE. It should help developers
> who are chasing down reclaim-related bugs.
>
> [[email protected]: refreshed to current -mm tree]
> Signed-off-by: Dave Hansen <[email protected]>
> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
On 10/12/2012 05:57 AM, Michal Hocko wrote:
> I would like to resurrect the following patch from Dave. It was last
> posted here https://lkml.org/lkml/2010/9/16/250 and there didn't seem
> to be any strong opposition.
> Kosaki was worried about possible excessive logging when somebody drops
> caches too often (though he said he didn't have a strong opinion on
> that), but I would say the opposite. If somebody does that then I would
> really like to know it from the log when supporting a system, because
> it almost certainly means that something fishy is going on. It is also
> worth mentioning that only root can write to drop_caches, so this is
> not a flooding attack vector.
Just read through the patch again. Still looks great to me.
Thanks for bringing it up again, Michal!
On Fri, 12 Oct 2012 14:57:08 +0200
Michal Hocko <[email protected]> wrote:
> Hi,
> I would like to resurrect the following patch from Dave. It was last
> posted here https://lkml.org/lkml/2010/9/16/250 and there didn't seem
> to be any strong opposition.
> Kosaki was worried about possible excessive logging when somebody drops
> caches too often (though he said he didn't have a strong opinion on
> that), but I would say the opposite. If somebody does that then I would
> really like to know it from the log when supporting a system, because
> it almost certainly means that something fishy is going on. It is also
> worth mentioning that only root can write to drop_caches, so this is
> not a flooding attack vector.
> I am bringing this up again because it can be really helpful when
> chasing strange performance issues which (surprise, surprise) turn out
> to be related to caches being dropped artificially because the admin
> thinks it will help...
>
> I have just refreshed the original patch on top of the current mm tree
> but I could live with KERN_INFO as well if people think that KERN_NOTICE
> is too hysterical.
> ---
> From 1f4058be9b089bc9d43d71bc63989335d7637d8d Mon Sep 17 00:00:00 2001
> From: Dave Hansen <[email protected]>
> Date: Fri, 12 Oct 2012 14:30:54 +0200
> Subject: [PATCH] add some drop_caches documentation and info message
>
> There is plenty of anecdotal evidence and a load of blog posts
> suggesting that using "drop_caches" periodically keeps your system
> running in "tip top shape". Perhaps adding some kernel
> documentation will increase the amount of accurate data on its use.
>
> If we are not shrinking caches effectively, then we have real bugs.
> Using drop_caches will simply mask the bugs and make them harder
> to find, but certainly does not fix them, nor is it an appropriate
> "workaround" to limit the size of the caches.
>
> It's a great debugging tool, and is really handy for doing things
> like repeatable benchmark runs. So, add a bit more documentation
> about it, and add a little KERN_NOTICE. It should help developers
> who are chasing down reclaim-related bugs.
>
> ...
>
> + printk(KERN_NOTICE "%s (%d): dropped kernel caches: %d\n",
> + current->comm, task_pid_nr(current), sysctl_drop_caches);
urgh. Are we really sure we want to do this? The system operators who
are actually using this thing will hate us :(
More friendly alternatives might be:
- Taint the kernel. But that will only become apparent with an oops
trace or similar.
- Add a drop_caches counter and make that available in /proc/vmstat,
show_mem() output and perhaps other places.
I suspect the /proc/vmstat counter will suffice - if someone is having
vm issues, we'll be seeing their /proc/vmstat at some stage and if the
drop_caches counter is high, that's enough to get suspicious?
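Such a counter could presumably piggyback on the existing vm_event
machinery. A rough sketch of what I mean (DROP_CACHES here is a made-up
vm_event_item that would also need an entry in vmstat_text[]; nothing
like it exists in the tree today):
	if (write) {
		count_vm_event(DROP_CACHES);	/* hypothetical counter, one line in /proc/vmstat */
		if (sysctl_drop_caches & 1)
			iterate_supers(drop_pagecache_sb, NULL);
		if (sysctl_drop_caches & 2)
			drop_slab();
	}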
On Tue 23-10-12 16:45:46, Andrew Morton wrote:
> On Fri, 12 Oct 2012 14:57:08 +0200
> Michal Hocko <[email protected]> wrote:
>
> > Hi,
> > I would like to resurrect the following patch from Dave. It was last
> > posted here https://lkml.org/lkml/2010/9/16/250 and there didn't seem
> > to be any strong opposition.
> > Kosaki was worried about possible excessive logging when somebody drops
> > caches too often (though he said he didn't have a strong opinion on
> > that), but I would say the opposite. If somebody does that then I would
> > really like to know it from the log when supporting a system, because
> > it almost certainly means that something fishy is going on. It is also
> > worth mentioning that only root can write to drop_caches, so this is
> > not a flooding attack vector.
> > I am bringing this up again because it can be really helpful when
> > chasing strange performance issues which (surprise, surprise) turn out
> > to be related to caches being dropped artificially because the admin
> > thinks it will help...
> >
> > I have just refreshed the original patch on top of the current mm tree
> > but I could live with KERN_INFO as well if people think that KERN_NOTICE
> > is too hysterical.
> > ---
> > From 1f4058be9b089bc9d43d71bc63989335d7637d8d Mon Sep 17 00:00:00 2001
> > From: Dave Hansen <[email protected]>
> > Date: Fri, 12 Oct 2012 14:30:54 +0200
> > Subject: [PATCH] add some drop_caches documentation and info message
> >
> > There is plenty of anecdotal evidence and a load of blog posts
> > suggesting that using "drop_caches" periodically keeps your system
> > running in "tip top shape". Perhaps adding some kernel
> > documentation will increase the amount of accurate data on its use.
> >
> > If we are not shrinking caches effectively, then we have real bugs.
> > Using drop_caches will simply mask the bugs and make them harder
> > to find, but certainly does not fix them, nor is it an appropriate
> > "workaround" to limit the size of the caches.
> >
> > It's a great debugging tool, and is really handy for doing things
> > like repeatable benchmark runs. So, add a bit more documentation
> > about it, and add a little KERN_NOTICE. It should help developers
> > who are chasing down reclaim-related bugs.
> >
> > ...
> >
> > + printk(KERN_NOTICE "%s (%d): dropped kernel caches: %d\n",
> > + current->comm, task_pid_nr(current), sysctl_drop_caches);
>
> urgh. Are we really sure we want to do this? The system operators who
> are actually using this thing will hate us :(
I have no problem with lowering the priority (how does KERN_INFO
sound?), but shouldn't this message give them a kick that they are
doing something wrong? And if somebody uses it for "benchmarking", to
have a clean slate before starting, is it really that invasive?
> More friendly alternatives might be:
>
> - Taint the kernel. But that will only become apparent with an oops
> trace or similar.
>
> - Add a drop_caches counter and make that available in /proc/vmstat,
> show_mem() output and perhaps other places.
We would lose the timing and the originating process name in both cases,
which can be really helpful while debugging. It is fair to say that we
could deduce the timing if we are already collecting /proc/meminfo or
/proc/vmstat, and we do collect them often, but that is not the case all
of the time, and sometimes it is important to know _who_ is doing all this.
--
Michal Hocko
SUSE Labs
On Wed, 24 Oct 2012 08:29:45 +0200
Michal Hocko <[email protected]> wrote:
> > >
> > > + printk(KERN_NOTICE "%s (%d): dropped kernel caches: %d\n",
> > > + current->comm, task_pid_nr(current), sysctl_drop_caches);
> >
> > urgh. Are we really sure we want to do this? The system operators who
> > are actually using this thing will hate us :(
>
> I have no problem with lowering the priority (how does KERN_INFO
> sound?), but shouldn't this message give them a kick that they are
> doing something wrong? And if somebody uses it for "benchmarking", to
> have a clean slate before starting, is it really that invasive?
hmpf. This patch worries me. If there are people out there who are
regularly using drop_caches because the VM sucks, it seems pretty
obnoxious of us to go dumping stuff into their syslog. What are they
supposed to do? Stop using drop_caches? But that would unfix the
problem which they fixed with drop_caches in the first place.
And they might not even have control over the code - they need to go
back to their supplier and say "please send me a new version", along
with all the additional costs and risks involved in an update.
> > More friendly alternatives might be:
> >
> > - Taint the kernel. But that will only become apparent with an oops
> > trace or similar.
> >
> > - Add a drop_caches counter and make that available in /proc/vmstat,
> > show_mem() output and perhaps other places.
>
> We would lose the timing and the originating process name in both cases,
> which can be really helpful while debugging. It is fair to say that we
> could deduce the timing if we are already collecting /proc/meminfo or
> /proc/vmstat, and we do collect them often, but that is not the case all
> of the time, and sometimes it is important to know _who_ is doing all this.
But how important is all that? The main piece of information the
kernel developer wants is "this guy is using drop_caches a lot". All
the other info is peripheral and can be gathered by other means if so
desired.
On 10/24/2012 12:54 PM, Andrew Morton wrote:
> hmpf. This patch worries me. If there are people out there who are
> regularly using drop_caches because the VM sucks, it seems pretty
> obnoxious of us to go dumping stuff into their syslog. What are they
> supposed to do? Stop using drop_caches?
People use drop_caches because they _think_ the VM sucks, or they
_think_ they're "tuning" their system. _They_ are supposed to stop
using drop_caches. :)
What kind of interface _is_ it in the first place? Is it really a
production-level thing that we expect users to be poking at? Or, is it
a rarely-used debugging and benchmarking knob which is fair game for us
to tweak like this?
Do we have any valid uses of drop_caches where the printk() would truly
_be_ disruptive? Are those cases where we _also_ have real kernel bugs
or issues that we should be working? If it disrupts them and they go to
their vendor or the community directly, it gives us at least a shot at
fixing the real problems (or fixing the "invalid" use).
Adding taint, making this a single-shot printk, or adding vmstat
counters are all good ideas. I guess I think the disruption is a
feature because I hope it will draw some folks out of the woodwork.
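For what it's worth, the single-shot variant really would be a one-liner
in the handler - just printk_once() instead of printk():
	/* warn only on the first write per boot instead of every time */
	printk_once(KERN_NOTICE "%s (%d): dropped kernel caches: %d\n",
		    current->comm, task_pid_nr(current), sysctl_drop_caches);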
On Wed, 24 Oct 2012 13:28:19 -0700
Dave Hansen <[email protected]> wrote:
> On 10/24/2012 12:54 PM, Andrew Morton wrote:
> > hmpf. This patch worries me. If there are people out there who are
> > regularly using drop_caches because the VM sucks, it seems pretty
> > obnoxious of us to go dumping stuff into their syslog. What are they
> > supposed to do? Stop using drop_caches?
>
> People use drop_caches because they _think_ the VM sucks, or they
> _think_ they're "tuning" their system. _They_ are supposed to stop
> using drop_caches. :)
Well who knows. Could be that people's vm *does* suck. Or they have
some particularly peculiar workload or requirement[*]. Or their VM
*used* to suck, and the drop_caches is not really needed any more but
it's there in vendor-provided code and they can't practically prevent
it.
[*] If your workload consists of having to handle large bursts of data
with minimum latency and then waiting around for another burst, it
makes sense to drop all your cached data between bursts.
> What kind of interface _is_ it in the first place? Is it really a
> production-level thing that we expect users to be poking at? Or, is it
> a rarely-used debugging and benchmarking knob which is fair game for us
> to tweak like this?
It was a rarely-used mainly-developer-only thing which, apparently, real
people found useful at some point in the past. Perhaps we should never
have offered it.
> Do we have any valid uses of drop_caches where the printk() would truly
> _be_ disruptive? Are those cases where we _also_ have real kernel bugs
> or issues that we should be working? If it disrupts them and they go to
> their vendor or the community directly, it gives us at least a shot at
> fixing the real problems (or fixing the "invalid" use).
Heaven knows - I'm just going from what Michal has told me and various
rumors which keep surfacing on the internet ;)
> Adding taint, making this a single-shot printk, or adding vmstat
> counters are all good ideas. I guess I think the disruption is a
> feature because I hope it will draw some folks out of the woodwork.
I had a "send mail to [email protected]" printk in 3c59x.c many years
ago. For about two months. It took *years* before I stopped getting
emails ;)
Gee, I dunno. I have issues with it :( We could do
printk_ratelimited(one-hour) but I suspect that would defeat Michal's
purpose.
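printk_ratelimited() is hard-wired to DEFAULT_RATELIMIT_INTERVAL (5s
IIRC), so a one-hour limit would need its own ratelimit state. Roughly
this, as a sketch of what I mean:
	static DEFINE_RATELIMIT_STATE(drop_caches_rs, 60 * 60 * HZ, 1);

	/* log at most once per hour */
	if (write && __ratelimit(&drop_caches_rs))
		printk(KERN_NOTICE "%s (%d): dropped kernel caches: %d\n",
		       current->comm, task_pid_nr(current),
		       sysctl_drop_caches);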
On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> Well who knows. Could be that people's vm *does* suck. Or they have
> some particularly peculiar workload or requirement[*]. Or their VM
> *used* to suck, and the drop_caches is not really needed any more but
> it's there in vendor-provided code and they can't practically prevent
> it.
I have drop_caches in my suspend-to-disk script so that the hibernation
image is kept at minimum and suspend times are as small as possible.
Would that be a valid use-case?
--
Regards/Gruss,
Boris.
On Wed, 24 Oct 2012 23:06:00 +0200
Borislav Petkov <[email protected]> wrote:
> On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> > Well who knows. Could be that people's vm *does* suck. Or they have
> > some particularly peculiar workload or requirement[*]. Or their VM
> > *used* to suck, and the drop_caches is not really needed any more but
> > it's there in vendor-provided code and they can't practically prevent
> > it.
>
> I have drop_caches in my suspend-to-disk script so that the hibernation
> image is kept at minimum and suspend times are as small as possible.
hm, that sounds smart.
> Would that be a valid use-case?
I'd say so, unless we change the kernel to do that internally. We do
have the hibernation-specific shrink_all_memory() in the vmscan code.
We didn't see fit to document _why_ that exists, but IIRC it's there to
create enough free memory for hibernation to be able to successfully
complete, but no more.
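For reference, it lives in mm/vmscan.c under CONFIG_HIBERNATION and
IIRC the interface is just:
	/* try to reclaim nr_to_reclaim pages system-wide and return the
	 * number actually freed */
	unsigned long shrink_all_memory(unsigned long nr_to_reclaim);
with the snapshot code asking for roughly "how much more do I need",
not "everything".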
Who owns hibernation nowadays? Rafael, I guess?
On 10/24/2012 02:06 PM, Borislav Petkov wrote:
> On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
>> Well who knows. Could be that people's vm *does* suck. Or they have
>> some particularly peculiar workload or requirement[*]. Or their VM
>> *used* to suck, and the drop_caches is not really needed any more but
>> it's there in vendor-provided code and they can't practically prevent
>> it.
>
> I have drop_caches in my suspend-to-disk script so that the hibernation
> image is kept at minimum and suspend times are as small as possible.
>
> Would that be a valid use-case?
Sounds fairly valid to me. But, it's also one that would not be harmed
or disrupted in any way because of a single additional printk() during
each suspend-to-disk operation.
On Wednesday 24 of October 2012 14:13:03 Andrew Morton wrote:
> On Wed, 24 Oct 2012 23:06:00 +0200
> Borislav Petkov <[email protected]> wrote:
>
> > On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> > > Well who knows. Could be that people's vm *does* suck. Or they have
> > > some particularly peculiar workload or requirement[*]. Or their VM
> > > *used* to suck, and the drop_caches is not really needed any more but
> > > it's there in vendor-provided code and they can't practically prevent
> > > it.
> >
> > I have drop_caches in my suspend-to-disk script so that the hibernation
> > image is kept at minimum and suspend times are as small as possible.
>
> hm, that sounds smart.
>
> > Would that be a valid use-case?
>
> I'd say so, unless we change the kernel to do that internally. We do
> have the hibernation-specific shrink_all_memory() in the vmscan code.
> We didn't see fit to document _why_ that exists, but IIRC it's there to
> create enough free memory for hibernation to be able to successfully
> complete, but no more.
That's correct.
> Who owns hibernation nowadays? Rafael, I guess?
I'm still maintaining it.
Thanks,
Rafael
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
>> I have drop_caches in my suspend-to-disk script so that the hibernation
>> image is kept at minimum and suspend times are as small as possible.
>
> hm, that sounds smart.
>
>> Would that be a valid use-case?
>
> I'd say so, unless we change the kernel to do that internally. We do
> have the hibernation-specific shrink_all_memory() in the vmscan code.
> We didn't see fit to document _why_ that exists, but IIRC it's there to
> create enough free memory for hibernation to be able to successfully
> complete, but no more.
shrink_all_memory() drops only the minimum amount of memory needed for
hibernation. It's a trade-off:
- drop all page cache
  pros: faster hibernation
  cons: after resuming from hibernation, the system works very slowly
  for a while, until it has built up enough file cache again.
- drop the minimum page cache
  pros: the system works quickly after resuming from hibernation.
  cons: relatively long hibernation time
So I'm not keen on changing the hibernation default. Hmmm... Does adding
a tracepoint instead of the printk make sense?
On Wed, Oct 24, 2012 at 02:18:38PM -0700, Dave Hansen wrote:
> Sounds fairly valid to me. But, it's also one that would not be harmed
> or disrupted in any way because of a single additional printk() during
> each suspend-to-disk operation.
Btw,
back to the drop_caches patch. How about we hide the drop_caches
interface behind some mm debugging option in "Kernel Hacking"? Assuming
we don't need it otherwise on production kernels. Probably make it
depend on CONFIG_DEBUG_VM like CONFIG_DEBUG_VM_RB or so.
And then also add it to /proc/vmstat, in addition.
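Concretely, I'm thinking of something like wrapping the sysctl entry in
kernel/sysctl.c (just a sketch, the field values are from memory and the
min/max checks are omitted):
	#ifdef CONFIG_DEBUG_VM	/* compile drop_caches out of production configs */
		{
			.procname	= "drop_caches",
			.data		= &sysctl_drop_caches,
			.maxlen		= sizeof(int),
			.mode		= 0644,
			.proc_handler	= drop_caches_sysctl_handler,
		},
	#endif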
--
Regards/Gruss,
Boris.
On 10/24/2012 03:48 PM, Borislav Petkov wrote:
> On Wed, Oct 24, 2012 at 02:18:38PM -0700, Dave Hansen wrote:
>> Sounds fairly valid to me. But, it's also one that would not be harmed
>> or disrupted in any way because of a single additional printk() during
>> each suspend-to-disk operation.
>
> back to the drop_caches patch. How about we hide the drop_caches
> interface behind some mm debugging option in "Kernel Hacking"? Assuming
> we don't need it otherwise on production kernels. Probably make it
> depend on CONFIG_DEBUG_VM like CONFIG_DEBUG_VM_RB or so.
>
> And then also add it to /proc/vmstat, in addition.
That effectively means removing it from the kernel since distros ship
with those config options off. We don't want to do that since there
_are_ valid, occasional uses like benchmarking that we want to be
consistent.
On Wed, Oct 24, 2012 at 6:57 PM, Dave Hansen <[email protected]> wrote:
> On 10/24/2012 03:48 PM, Borislav Petkov wrote:
>> On Wed, Oct 24, 2012 at 02:18:38PM -0700, Dave Hansen wrote:
>>> Sounds fairly valid to me. But, it's also one that would not be harmed
>>> or disrupted in any way because of a single additional printk() during
>>> each suspend-to-disk operation.
>>
>> back to the drop_caches patch. How about we hide the drop_caches
>> interface behind some mm debugging option in "Kernel Hacking"? Assuming
>> we don't need it otherwise on production kernels. Probably make it
>> depend on CONFIG_DEBUG_VM like CONFIG_DEBUG_VM_RB or so.
>>
>> And then also add it to /proc/vmstat, in addition.
>
> That effectively means removing it from the kernel since distros ship
> with those config options off. We don't want to do that since there
> _are_ valid, occasional uses like benchmarking that we want to be
> consistent.
Agreed. We don't want to remove a valid interface.
On Thu, 25 Oct 2012 00:04:46 +0200 "Rafael J. Wysocki" <[email protected]> wrote:
> On Wednesday 24 of October 2012 14:13:03 Andrew Morton wrote:
> > On Wed, 24 Oct 2012 23:06:00 +0200
> > Borislav Petkov <[email protected]> wrote:
> >
> > > On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> > > > Well who knows. Could be that people's vm *does* suck. Or they have
> > > > some particularly peculiar workload or requirement[*]. Or their VM
> > > > *used* to suck, and the drop_caches is not really needed any more but
> > > > it's there in vendor-provided code and they can't practically prevent
> > > > it.
> > >
> > > I have drop_caches in my suspend-to-disk script so that the hibernation
> > > image is kept at minimum and suspend times are as small as possible.
> >
> > hm, that sounds smart.
> >
> > > Would that be a valid use-case?
> >
> > I'd say so, unless we change the kernel to do that internally. We do
> > have the hibernation-specific shrink_all_memory() in the vmscan code.
> > We didn't see fit to document _why_ that exists, but IIRC it's there to
> > create enough free memory for hibernation to be able to successfully
> > complete, but no more.
>
> That's correct.
Well, my point was: how about the idea of reclaiming clean pagecache
(and inodes, dentries, etc) before hibernation so we read/write less
disk data?
Given that it's so easy to do from the hibernation script, I guess
there's not much point...
On Wed, Oct 24, 2012 at 08:56:45PM -0400, KOSAKI Motohiro wrote:
> > That effectively means removing it from the kernel since distros ship
> > with those config options off. We don't want to do that since there
> > _are_ valid, occasional uses like benchmarking that we want to be
> > consistent.
>
> Agreed. We don't want to remove a valid interface.
Ok, duly noted.
But let's discuss this a bit further. So, for the benchmarking aspect,
you're either going to have to always require dmesg along with
benchmarking results or /proc/vmstat, depending on where the drop_caches
stats end up.
Is this how you envision it?
And then there are the VM bug cases, where you might not always get
full dmesg from a panicked system. In that case, you'd want the kernel
tainting thing too, so that it at least appears in the oops backtrace.
Although the tainting thing might not be enough - a user could
drop_caches at some point in time and the oops happening much later
could be unrelated but that can't be expressed in taint flags.
So you'd need some sort of drop_caches counter, I'd guess. Or a
timestamp of the last drop_caches, or something.
Am I understanding the intent correctly?
Thanks.
--
Regards/Gruss,
Boris.
On 10/25/2012 02:24 AM, Borislav Petkov wrote:
> But let's discuss this a bit further. So, for the benchmarking aspect,
> you're either going to have to always require dmesg along with
> benchmarking results or /proc/vmstat, depending on where the drop_caches
> stats end up.
>
> Is this how you envision it?
>
> And then there are the VM bug cases, where you might not always get
> full dmesg from a panicked system. In that case, you'd want the kernel
> tainting thing too, so that it at least appears in the oops backtrace.
>
> Although the tainting thing might not be enough - a user could
> drop_caches at some point in time and the oops happening much later
> could be unrelated but that can't be expressed in taint flags.
Here's the problem: Joe Kernel Developer gets a bug report, usually
something like "the kernel is slow", or "the kernel is eating up all my
memory". We then start going and digging into the problem with the
usual tools. We almost *ALWAYS* get dmesg, and it's reasonably common,
but less likely, that we get things like vmstat along with such a bug
report.
Joe Kernel Developer digs into the statistics or the dmesg and tries to
figure out what happened. I've run into a couple of cases in practice
(and I assume Michal has too) where the bug reporter was using
drop_caches _heavily_ and did not realize the implications. It was
quite hard to track down exactly how the page cache and dentries/inodes
were getting purged.
There are rarely oopses involved in these scenarios.
The primary goal of this patch is to make debugging those scenarios
easier so that we can quickly realize that drop_caches is the reason our
caches went away, not some anomalous VM activity. A secondary goal is
to tell the user: "Hey, maybe this isn't something you want to be doing
all the time."
On Wed 24-10-12 12:54:39, Andrew Morton wrote:
> On Wed, 24 Oct 2012 08:29:45 +0200
> Michal Hocko <[email protected]> wrote:
[...]
> hmpf. This patch worries me. If there are people out there who are
> regularly using drop_caches because the VM sucks, it seems pretty
> obnoxious of us to go dumping stuff into their syslog. What are they
> supposed to do? Stop using drop_caches? But that would unfix the
> problem which they fixed with drop_caches in the first place.
>
> And they might not even have control over the code - they need to go
> back to their supplier and say "please send me a new version", along
> with all the additional costs and risks involved in an update.
I understand your worries, and that's why I suggested a higher log
level, which is under the admin's control. Does even that sound too
excessive?
> > > More friendly alternatives might be:
> > >
> > > - Taint the kernel. But that will only become apparent with an oops
> > > trace or similar.
> > >
> > > - Add a drop_caches counter and make that available in /proc/vmstat,
> > > show_mem() output and perhaps other places.
> >
> > We would lose the timing and the originating process name in both cases,
> > which can be really helpful while debugging. It is fair to say that we
> > could deduce the timing if we are already collecting /proc/meminfo or
> > /proc/vmstat, and we do collect them often, but that is not the case all
> > of the time, and sometimes it is important to know _who_ is doing all this.
>
> But how important is all that? The main piece of information the
> kernel developer wants is "this guy is using drop_caches a lot". All
> the other info is peripheral and can be gathered by other means if so
> desired.
Well, I have experienced a debugging session where I suspected that
excessive drop_caches usage was going on, but I had a hard time proving
who was doing it (the customer, of course, claimed they were not doing
anything like that), so we went through many loops until we could point
the finger.
--
Michal Hocko
SUSE Labs
On Wed 24-10-12 18:35:43, KOSAKI Motohiro wrote:
> >> I have drop_caches in my suspend-to-disk script so that the hibernation
> >> image is kept at minimum and suspend times are as small as possible.
> >
> > hm, that sounds smart.
> >
> >> Would that be a valid use-case?
> >
> > I'd say so, unless we change the kernel to do that internally. We do
> > have the hibernation-specific shrink_all_memory() in the vmscan code.
> > We didn't see fit to document _why_ that exists, but IIRC it's there to
> > create enough free memory for hibernation to be able to successfully
> > complete, but no more.
>
> shrink_all_memory() drops only the minimum amount of memory needed for
> hibernation. It's a trade-off:
>
> - drop all page cache
>   pros: faster hibernation
>   cons: after resuming from hibernation, the system works very slowly
>   for a while, until it has built up enough file cache again.
>
> - drop the minimum page cache
>   pros: the system works quickly after resuming from hibernation.
>   cons: relatively long hibernation time
>
>
> So I'm not keen on changing the hibernation default. Hmmm... Does adding
> a tracepoint instead of the printk make sense?
I guess you mean trace_printk. I have only seen that one used for
debugging purposes, but it seems like it could be used here.
CONFIG_TRACING seems to be enabled on most distribution kernels.
I am just worried that it needs debugfs mounted, and my recollection is
that this has some security implications, so there might be some
pushback on mounting it on production systems, which would defeat the
primary motivation.
Maybe this concern is not that important compared to excessive logging,
though. I can live with this solution as well if people really hate the
logging approach.
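For completeness, the trace_printk() variant would be a trivial change
in the handler - basically just:
	/* goes to the ftrace ring buffer (debugfs) instead of the syslog */
	trace_printk("%s (%d): dropped kernel caches: %d\n",
		     current->comm, task_pid_nr(current),
		     sysctl_drop_caches);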
--
Michal Hocko
SUSE Labs
On Thu, Oct 25, 2012 at 04:57:11AM -0700, Dave Hansen wrote:
> On 10/25/2012 02:24 AM, Borislav Petkov wrote:
> > But let's discuss this a bit further. So, for the benchmarking aspect,
> > you're either going to have to always require dmesg along with
> > benchmarking results or /proc/vmstat, depending on where the drop_caches
> > stats end up.
> >
> > Is this how you envision it?
> >
> > And then there are the VM bug cases, where you might not always get
> > full dmesg from a panicked system. In that case, you'd want the kernel
> > tainting thing too, so that it at least appears in the oops backtrace.
> >
> > Although the tainting thing might not be enough - a user could
> > drop_caches at some point in time and the oops happening much later
> > could be unrelated but that can't be expressed in taint flags.
>
> Here's the problem: Joe Kernel Developer gets a bug report, usually
> something like "the kernel is slow", or "the kernel is eating up all my
> memory". We then start going and digging into the problem with the
> usual tools. We almost *ALWAYS* get dmesg, and it's reasonably common,
> but less likely, that we get things like vmstat along with such a bug
> report.
>
> Joe Kernel Developer digs into the statistics or the dmesg and tries to
> figure out what happened. I've run into a couple of cases in practice
> (and I assume Michal has too) where the bug reporter was using
> drop_caches _heavily_ and did not realize the implications. It was
> quite hard to track down exactly how the page cache and dentries/inodes
> were getting purged.
>
> There are rarely oopses involved in these scenarios.
>
> The primary goal of this patch is to make debugging those scenarios
> easier so that we can quickly realize that drop_caches is the reason our
> caches went away, not some anomalous VM activity. A secondary goal is
> to tell the user: "Hey, maybe this isn't something you want to be doing
> all the time."
Ok, understood. So you will be requiring dmesg, ok, then it makes sense.
This way you're also getting timestamps of when exactly and how many
times drop_caches was used. For that, though, you'll need to add the
timestamp explicitly to the printk because CONFIG_PRINTK_TIME is not
always enabled.
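Something along these lines, I guess (local_clock() picked arbitrarily
here, just to show the idea):
	u64 ts = local_clock();	/* ns; survives even without CONFIG_PRINTK_TIME */

	printk(KERN_NOTICE "%s (%d): dropped kernel caches: %d at %llu.%06llu\n",
	       current->comm, task_pid_nr(current), sysctl_drop_caches,
	       (unsigned long long)(ts / NSEC_PER_SEC),
	       (unsigned long long)((ts % NSEC_PER_SEC) / NSEC_PER_USEC));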
Thanks.
--
Regards/Gruss,
Boris.
On Thu 25-10-12 04:57:11, Dave Hansen wrote:
[...]
> Here's the problem: Joe Kernel Developer gets a bug report, usually
> something like "the kernel is slow", or "the kernel is eating up all my
> memory". We then start going and digging into the problem with the
> usual tools. We almost *ALWAYS* get dmesg, and it's reasonably common,
> but less likely, that we get things like vmstat along with such a bug
> report.
>
> Joe Kernel Developer digs into the statistics or the dmesg and tries to
> figure out what happened. I've run into a couple of cases in practice
> (and I assume Michal has too) where the bug reporter was using
> drop_caches _heavily_ and did not realize the implications. It was
> quite hard to track down exactly how the page cache and dentries/inodes
> were getting purged.
Yes, very much the same here. Not that I run into issues like that
often, but it has happened a few times in the past and it was always a
lot of burnt time.
> There are rarely oopses involved in these scenarios.
>
> The primary goal of this patch is to make debugging those scenarios
> easier so that we can quickly realize that drop_caches is the reason our
> caches went away, not some anomalous VM activity. A secondary goal is
> to tell the user: "Hey, maybe this isn't something you want to be doing
> all the time."
--
Michal Hocko
SUSE Labs
On Wednesday, October 24, 2012 06:17:52 PM Andrew Morton wrote:
> On Thu, 25 Oct 2012 00:04:46 +0200 "Rafael J. Wysocki" <[email protected]> wrote:
>
> > On Wednesday 24 of October 2012 14:13:03 Andrew Morton wrote:
> > > On Wed, 24 Oct 2012 23:06:00 +0200
> > > Borislav Petkov <[email protected]> wrote:
> > >
> > > > On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> > > > > Well who knows. Could be that people's vm *does* suck. Or they have
> > > > > some particularly peculiar workload or requirement[*]. Or their VM
> > > > > *used* to suck, and the drop_caches is not really needed any more but
> > > > > it's there in vendor-provided code and they can't practically prevent
> > > > > it.
> > > >
> > > > I have drop_caches in my suspend-to-disk script so that the hibernation
> > > > image is kept at minimum and suspend times are as small as possible.
> > >
> > > hm, that sounds smart.
> > >
> > > > Would that be a valid use-case?
> > >
> > > I'd say so, unless we change the kernel to do that internally. We do
> > > have the hibernation-specific shrink_all_memory() in the vmscan code.
> > > We didn't see fit to document _why_ that exists, but IIRC it's there to
> > > create enough free memory for hibernation to be able to successfully
> > > complete, but no more.
> >
> > That's correct.
>
> Well, my point was: how about the idea of reclaiming clean pagecache
> (and inodes, dentries, etc) before hibernation so we read/write less
> disk data?
We may actually want to write more into the image to improve post-resume
responsiveness.
> Given that it's so easy to do from the hibernation script, I guess
> there's not much point...
Well, I'd say so. :-)
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> Dave Hansen <[email protected]> wrote:
> > What kind of interface _is_ it in the first place? Is it really a
> > production-level thing that we expect users to be poking at? Or, is it
> > a rarely-used debugging and benchmarking knob which is fair game for us
> > to tweak like this?
>
> It was a rarely-used mainly-developer-only thing which, apparently, real
> people found useful at some point in the past. Perhaps we should never
> have offered it.
I've found it useful on occasion when generating large public keys.
When key generation hangs due to not-enough-entropy, dropping all
caches (followed by an intensive read) has allowed the system to
collect enough entropy to let the key generation finish.
The usefulness of the trick is probably going the way of the dodo,
thanks to SSDs becoming more common.
--
Mika Boström Individualisti, eksistentialisti,
http://www.iki.fi/bostik rationalisti ja mulkvisti
GPG: 0x2AED22CC; 6FC9 8375 31B7 3BA2 B5DC 484E F19F 8AD6 2AED 22CC
On Wed, 24 Oct 2012, Andrew Morton wrote:
> > > > I have drop_caches in my suspend-to-disk script so that the hibernation
> > > > image is kept at minimum and suspend times are as small as possible.
> > >
> > > hm, that sounds smart.
> > >
> > > > Would that be a valid use-case?
> > >
> > > I'd say so, unless we change the kernel to do that internally. We do
> > > have the hibernation-specific shrink_all_memory() in the vmscan code.
> > > We didn't see fit to document _why_ that exists, but IIRC it's there to
> > > create enough free memory for hibernation to be able to successfully
> > > complete, but no more.
> >
> > That's correct.
>
> Well, my point was: how about the idea of reclaiming clean pagecache
> (and inodes, dentries, etc) before hibernation so we read/write less
> disk data?
You might or might not want to do that. Dropping caches around suspend
makes the hibernation process itself faster, but the realtime response of
the applications afterwards is worse, as everything touched by the user has to be
be paged in again.
--
Jiri Kosina
SUSE Labs
On Mon, Oct 29, 2012 at 09:59:59AM +0100, Jiri Kosina wrote:
> You might or might not want to do that. Dropping caches around suspend
> makes the hibernation process itself faster, but the realtime response
> of the applications afterwards is worse, as everything touched by the user
> has to be paged in again.
Right, do you know of a real use-case where people hibernate, then
resume and still care about application response time right afterwards?
Besides, once everything is swapped back in, perf. is back to normal,
i.e. like before suspending.
Thanks.
--
Regards/Gruss,
Boris.
On Mon, 29 Oct 2012, Borislav Petkov wrote:
> > You might or might not want to do that. Dropping caches around suspend
> > makes the hibernation process itself faster, but the realtime response
> > of the applications afterwards is worse, as everything touched by the user
> > has to be paged in again.
>
> Right, do you know of a real use-case where people hibernate, then
> resume and still care about application response time right afterwards?
Well if the point of dropping caches is lowering the resume time, then the
point is rendered moot as soon as you switch to your browser and have to
wait a noticeable amount of time until it starts reacting.
--
Jiri Kosina
SUSE Labs
On Mon, Oct 29, 2012 at 11:01:59AM +0100, Jiri Kosina wrote:
> Well if the point of dropping caches is lowering the resume time, then
> the point is rendered moot as soon as you switch to your browser and
> have to wait a noticeable amount of time until it starts reacting.
Not the resume time - the suspend time. If, say, one has 8GB of memory
and Linux has nicely filled all of it with caches, you don't want to wait
too long for the suspend image creation.
And nowadays, since you can have 8GB in a laptop, you really want to
keep that image minimal so that suspend-to-disk is quick.
The penalty of faulting everything back in is a cost we'd be willing to
pay, I guess.
Thanks.
--
Regards/Gruss,
Boris.
On Mon 2012-10-29 10:58:19, Borislav Petkov wrote:
> On Mon, Oct 29, 2012 at 09:59:59AM +0100, Jiri Kosina wrote:
> > You might or might not want to do that. Dropping caches around suspend
> > makes the hibernation process itself faster, but the realtime response
> > of the applications afterwards is worse, as everything touched by the user
> > has to be paged in again.
Also note that paging in is slower than reading the hibernation image,
because it is not compressed and involves seeking.
> Right, do you know of a real use-case where people hibernate, then
> resume and still care about application response time right afterwards?
Hmm? When I resume from hibernate, I want to use my
machine. *Everyone* cares about resume time afterwards. You move your
mouse, and you don't want to wait for X to be paged-in.
> Besides, once everything is swapped back in, perf. is back to normal,
> i.e. like before suspending.
The kernel will not normally swap anything back in automatically. Some
people do swapoff -a; swapon -a to work around that. (And yes, maybe
some automatic swap-in-when-there's-plenty-of-RAM would be useful.)
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Wed, Oct 31, 2012 at 06:31:54PM +0100, Pavel Machek wrote:
> Hmm? When I resume from hibernate, I want to use my machine.
Well, in my case, on a workstation with 8 GB, the only time the swap-in
is noticeable is when I try to use firefox with a couple of dozen tabs
open. Once that thing is swapped in, system performance is back to normal.
I'll bet that even this slowdown would disappear if I used an SSD.
But I can imagine some workloads where swapping everything back in could
be uncomfortable.
> The kernel will not normally swap anything back in automatically. Some
> people do swapoff -a; swapon -a to work around that. (And yes, maybe
> some automatic swap-in-when-there's-plenty-of-RAM would be useful.)
That's a good idea, actually.
So, in any case, the current situation is fine as it is, I'd say: people
can decide whether they want to drop caches before suspending or not.
Problem solved.
Thanks.
--
Regards/Gruss,
Boris.
Hi!
> > > hmpf. This patch worries me. If there are people out there who are
> > > regularly using drop_caches because the VM sucks, it seems pretty
> > > obnoxious of us to go dumping stuff into their syslog. What are they
> > > supposed to do? Stop using drop_caches?
> >
> > People use drop_caches because they _think_ the VM sucks, or they
> > _think_ they're "tuning" their system. _They_ are supposed to stop
> > using drop_caches. :)
>
> Well who knows. Could be that people's vm *does* suck. Or they have
> some particularly peculiar workload or requirement[*]. Or their VM
> *used* to suck, and the drop_caches is not really needed any more but
> it's there in vendor-provided code and they can't practically prevent
> it.
Or they have ipw wifi that does an order-5 allocation :-).
I've seen drop_caches used in some Android code, as part of SD card
handling, IIRC.
> > What kind of interface _is_ it in the first place? Is it really a
> > production-level thing that we expect users to be poking at? Or, is it
> > a rarely-used debugging and benchmarking knob which is fair game for us
> > to tweak like this?
>
> It was a rarely-used mainly-developer-only thing which, apparently, real
> people found useful at some point in the past. Perhaps we should never
> have offered it.
And yes, documentation would be good. IIRC you claimed a year or so ago
that drop_caches is not safe to use; is that still true?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html