Force a user visible low bound of 5% for the vm.dirty_ratio interface.
Currently global_dirty_limits() applies a low bound of 5% for
vm_dirty_ratio. This is not very user visible -- if the user sets
vm.dirty_ratio=1, the operation seems to succeed but will be rounded up
to 5% when used.
Another problem is inconsistency: calc_period_shift() uses the plain
vm_dirty_ratio value, which may be a problem when vm.dirty_ratio is set
to < 5 by the user.
CC: Peter Zijlstra <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
kernel/sysctl.c | 3 ++-
mm/page-writeback.c | 10 ++--------
2 files changed, 4 insertions(+), 9 deletions(-)
--- linux-next.orig/kernel/sysctl.c 2010-08-05 22:48:34.000000000 +0800
+++ linux-next/kernel/sysctl.c 2010-08-05 22:48:47.000000000 +0800
@@ -126,6 +126,7 @@ static int ten_thousand = 10000;
/* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
+static int dirty_ratio_min = 5;
/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
static int maxolduid = 65535;
@@ -1031,7 +1032,7 @@ static struct ctl_table vm_table[] = {
.maxlen = sizeof(vm_dirty_ratio),
.mode = 0644,
.proc_handler = dirty_ratio_handler,
- .extra1 = &zero,
+ .extra1 = &dirty_ratio_min,
.extra2 = &one_hundred,
},
{
--- linux-next.orig/mm/page-writeback.c 2010-08-05 22:48:42.000000000 +0800
+++ linux-next/mm/page-writeback.c 2010-08-05 22:48:47.000000000 +0800
@@ -415,14 +415,8 @@ void global_dirty_limits(unsigned long *
if (vm_dirty_bytes)
dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
- else {
- int dirty_ratio;
-
- dirty_ratio = vm_dirty_ratio;
- if (dirty_ratio < 5)
- dirty_ratio = 5;
- dirty = (dirty_ratio * available_memory) / 100;
- }
+ else
+ dirty = (vm_dirty_ratio * available_memory) / 100;
if (dirty_background_bytes)
background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);
On Fri, 06 Aug 2010 00:10:58 +0800
Wu Fengguang <[email protected]> wrote:
> Force a user visible low bound of 5% for the vm.dirty_ratio interface.
>
> Currently global_dirty_limits() applies a low bound of 5% for
> vm_dirty_ratio. This is not very user visible -- if the user sets
> vm.dirty_ratio=1, the operation seems to succeed but will be rounded up
> to 5% when used.
>
> Another problem is inconsistency: calc_period_shift() uses the plain
> vm_dirty_ratio value, which may be a problem when vm.dirty_ratio is set
> to < 5 by the user.
The changelog describes the old behaviour but doesn't describe the
proposed new behaviour.
> --- linux-next.orig/kernel/sysctl.c 2010-08-05 22:48:34.000000000 +0800
> +++ linux-next/kernel/sysctl.c 2010-08-05 22:48:47.000000000 +0800
> @@ -126,6 +126,7 @@ static int ten_thousand = 10000;
>
> /* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
> static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
> +static int dirty_ratio_min = 5;
>
> /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
> static int maxolduid = 65535;
> @@ -1031,7 +1032,7 @@ static struct ctl_table vm_table[] = {
> .maxlen = sizeof(vm_dirty_ratio),
> .mode = 0644,
> .proc_handler = dirty_ratio_handler,
> - .extra1 = &zero,
> + .extra1 = &dirty_ratio_min,
> .extra2 = &one_hundred,
> },
I forget how the procfs core handles this. Presumably the write will
now fail with -EINVAL or something? So people's scripts will now
error out and their space shuttles will crash?
All of which illustrates why it's important to fully describe changes
in the changelog! So people can consider and discuss the end-user
implications of a change.
On Fri, Aug 06, 2010 at 07:34:01AM +0800, Andrew Morton wrote:
> On Fri, 06 Aug 2010 00:10:58 +0800
> Wu Fengguang <[email protected]> wrote:
>
> > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> >
> > Currently global_dirty_limits() applies a low bound of 5% for
> > vm_dirty_ratio. This is not very user visible -- if the user sets
> > vm.dirty_ratio=1, the operation seems to succeed but will be rounded up
> > to 5% when used.
> >
> > Another problem is inconsistency: calc_period_shift() uses the plain
> > vm_dirty_ratio value, which may be a problem when vm.dirty_ratio is set
> > to < 5 by the user.
>
> The changelog describes the old behaviour but doesn't describe the
> proposed new behaviour.
Yeah, fixed below.
> > --- linux-next.orig/kernel/sysctl.c 2010-08-05 22:48:34.000000000 +0800
> > +++ linux-next/kernel/sysctl.c 2010-08-05 22:48:47.000000000 +0800
> > @@ -126,6 +126,7 @@ static int ten_thousand = 10000;
> >
> > /* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
> > static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
> > +static int dirty_ratio_min = 5;
> >
> > /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
> > static int maxolduid = 65535;
> > @@ -1031,7 +1032,7 @@ static struct ctl_table vm_table[] = {
> > .maxlen = sizeof(vm_dirty_ratio),
> > .mode = 0644,
> > .proc_handler = dirty_ratio_handler,
> > - .extra1 = &zero,
> > + .extra1 = &dirty_ratio_min,
> > .extra2 = &one_hundred,
> > },
>
> I forget how the procfs core handles this. Presumably the write will
> now fail with -EINVAL or something?
Right.
# echo 111 > /proc/sys/vm/dirty_ratio
echo: write error: invalid argument
> So people's scripts will now error out and their space shuttles will
> crash?
Looks like a serious problem. I'm now much more reserved about pushing
this patch :)
> All of which illustrates why it's important to fully describe changes
> in the changelog! So people can consider and discuss the end-user
> implications of a change.
Good point. Here is the patch with updated changelog.
Thanks,
Fengguang
---
Subject: writeback: explicit low bound for vm.dirty_ratio
From: Wu Fengguang <[email protected]>
Date: Thu Jul 15 10:28:57 CST 2010
Force a user visible low bound of 5% for the vm.dirty_ratio interface.
This is an interface change. When doing
echo N > /proc/sys/vm/dirty_ratio
where N < 5, the old behavior is to pretend to accept the value, while
the new behavior is to reject it explicitly with -EINVAL. This may
break user space if it checks the return value.
Currently global_dirty_limits() applies a low bound of 5% for
vm_dirty_ratio. This is not very user visible -- if the user sets
vm.dirty_ratio=1, the operation seems to succeed but will be rounded up
to 5% when used.
Another problem is inconsistency: calc_period_shift() uses the plain
vm_dirty_ratio value, which may be a problem when vm.dirty_ratio is set
to < 5 by the user.
CC: Peter Zijlstra <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
kernel/sysctl.c | 3 ++-
mm/page-writeback.c | 10 ++--------
2 files changed, 4 insertions(+), 9 deletions(-)
--- linux-next.orig/kernel/sysctl.c 2010-08-05 22:48:34.000000000 +0800
+++ linux-next/kernel/sysctl.c 2010-08-05 22:48:47.000000000 +0800
@@ -126,6 +126,7 @@ static int ten_thousand = 10000;
/* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
+static int dirty_ratio_min = 5;
/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
static int maxolduid = 65535;
@@ -1031,7 +1032,7 @@ static struct ctl_table vm_table[] = {
.maxlen = sizeof(vm_dirty_ratio),
.mode = 0644,
.proc_handler = dirty_ratio_handler,
- .extra1 = &zero,
+ .extra1 = &dirty_ratio_min,
.extra2 = &one_hundred,
},
{
--- linux-next.orig/mm/page-writeback.c 2010-08-05 22:48:42.000000000 +0800
+++ linux-next/mm/page-writeback.c 2010-08-05 22:48:47.000000000 +0800
@@ -415,14 +415,8 @@ void global_dirty_limits(unsigned long *
if (vm_dirty_bytes)
dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
- else {
- int dirty_ratio;
-
- dirty_ratio = vm_dirty_ratio;
- if (dirty_ratio < 5)
- dirty_ratio = 5;
- dirty = (dirty_ratio * available_memory) / 100;
- }
+ else
+ dirty = (vm_dirty_ratio * available_memory) / 100;
if (dirty_background_bytes)
background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);
> Subject: writeback: explicit low bound for vm.dirty_ratio
> From: Wu Fengguang <[email protected]>
> Date: Thu Jul 15 10:28:57 CST 2010
>
> Force a user visible low bound of 5% for the vm.dirty_ratio interface.
>
> This is an interface change. When doing
>
> echo N > /proc/sys/vm/dirty_ratio
>
> where N < 5, the old behavior is to pretend to accept the value, while
> the new behavior is to reject it explicitly with -EINVAL. This may
> break user space if it checks the return value.
Umm.. I dislike this change. Is there any good reason to refuse the
admin's explicit will? Why is 1-4% so bad? Internal clipping can be
changed later, but explicit error behavior is hard to change later.
Personally I prefer to
- accept all values, or
- clip the value in dirty_ratio_handler
Neither involves an explicit ABI change.
Thanks.
On Tue, 10 Aug 2010 12:12:06 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:
> > Subject: writeback: explicit low bound for vm.dirty_ratio
> > From: Wu Fengguang <[email protected]>
> > Date: Thu Jul 15 10:28:57 CST 2010
> >
> > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> >
> > This is an interface change. When doing
> >
> > echo N > /proc/sys/vm/dirty_ratio
> >
> > where N < 5, the old behavior is to pretend to accept the value, while
> > the new behavior is to reject it explicitly with -EINVAL. This may
> > break user space if it checks the return value.
>
> Umm.. I dislike this change. Is there any good reason to refuse the
> admin's explicit will? Why is 1-4% so bad? Internal clipping can be
> changed later, but explicit error behavior is hard to change later.
As a data point, I had a situation a while back where I needed a value below
1 to get the desired behaviour. The system had lots of RAM and fairly slow
write-back (over NFS), so a 'sync' could take minutes.
So I would much prefer allowing not only 1-4, but also fractional values!!!
I can see no justification at all for setting a lower bound of 5. Even zero
can be useful - for testing purposes mostly.
NeilBrown
> Personally I prefer to
> - accept all values, or
> - clip the value in dirty_ratio_handler
>
> Neither involves an explicit ABI change.
>
> Thanks.
On Tue 10-08-10 13:57:12, Neil Brown wrote:
> On Tue, 10 Aug 2010 12:12:06 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > > Subject: writeback: explicit low bound for vm.dirty_ratio
> > > From: Wu Fengguang <[email protected]>
> > > Date: Thu Jul 15 10:28:57 CST 2010
> > >
> > > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> > >
> > > This is an interface change. When doing
> > >
> > > echo N > /proc/sys/vm/dirty_ratio
> > >
> > > where N < 5, the old behavior is to pretend to accept the value, while
> > > the new behavior is to reject it explicitly with -EINVAL. This may
> > > break user space if it checks the return value.
> >
> > Umm.. I dislike this change. Is there any good reason to refuse the
> > admin's explicit will? Why is 1-4% so bad? Internal clipping can be
> > changed later, but explicit error behavior is hard to change later.
>
> As a data point, I had a situation a while back where I needed a value below
> 1 to get the desired behaviour. The system had lots of RAM and fairly slow
> write-back (over NFS), so a 'sync' could take minutes.
>
> So I would much prefer allowing not only 1-4, but also fractional values!!!
>
> I can see no justification at all for setting a lower bound of 5. Even zero
> can be useful - for testing purposes mostly.
If you run on a recent kernel, /proc/sys/vm/dirty_background_bytes and
dirty_bytes were introduced exactly for these purposes. Not that I
think the magic clipping at 5% is a good thing...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Tue, Aug 10, 2010 at 11:12:06AM +0800, KOSAKI Motohiro wrote:
> > Subject: writeback: explicit low bound for vm.dirty_ratio
> > From: Wu Fengguang <[email protected]>
> > Date: Thu Jul 15 10:28:57 CST 2010
> >
> > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> >
> > This is an interface change. When doing
> >
> > echo N > /proc/sys/vm/dirty_ratio
> >
> > where N < 5, the old behavior is to pretend to accept the value, while
> > the new behavior is to reject it explicitly with -EINVAL. This may
> > break user space if it checks the return value.
>
> Umm.. I dislike this change. Is there any good reason to refuse the
> admin's explicit will? Why is 1-4% so bad? Internal clipping can be
> changed later, but explicit error behavior is hard to change later.
>
> Personally I prefer to
> - accept all values, or
> - clip the value in dirty_ratio_handler
>
> Neither involves an explicit ABI change.
Good point. Sorry for my ignorance. Neil is right that there is no
reason to impose a low bound, so the first option looks good.
Thanks,
Fengguang
On Tue, Aug 10, 2010 at 11:57:12AM +0800, Neil Brown wrote:
> On Tue, 10 Aug 2010 12:12:06 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > > Subject: writeback: explicit low bound for vm.dirty_ratio
> > > From: Wu Fengguang <[email protected]>
> > > Date: Thu Jul 15 10:28:57 CST 2010
> > >
> > > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> > >
> > > This is an interface change. When doing
> > >
> > > echo N > /proc/sys/vm/dirty_ratio
> > >
> > > where N < 5, the old behavior is to pretend to accept the value, while
> > > the new behavior is to reject it explicitly with -EINVAL. This may
> > > break user space if it checks the return value.
> >
> > Umm.. I dislike this change. Is there any good reason to refuse the
> > admin's explicit will? Why is 1-4% so bad? Internal clipping can be
> > changed later, but explicit error behavior is hard to change later.
>
> As a data point, I had a situation a while back where I needed a value below
> 1 to get the desired behaviour. The system had lots of RAM and fairly slow
> write-back (over NFS), so a 'sync' could take minutes.
Jan, here is a use case for limiting dirty pages on slow devices :)
> So I would much prefer allowing not only 1-4, but also fractional values!!!
>
> I can see no justification at all for setting a lower bound of 5. Even zero
> can be useful - for testing purposes mostly.
Neil, that's a perfectly legitimate need, which I overlooked.
It seems the vm.dirty_bytes parameter will work for your case.
Thanks,
Fengguang