2010-08-05 16:33:02

by Fengguang Wu

Subject: [PATCH 07/13] writeback: explicit low bound for vm.dirty_ratio

Force a user visible low bound of 5% for the vm.dirty_ratio interface.

Currently global_dirty_limits() applies a low bound of 5% for
vm_dirty_ratio. This is not very user visible -- if the user sets
vm.dirty_ratio=1, the operation seems to succeed but will be rounded up
to 5% when used.

Another problem is inconsistency: calc_period_shift() uses the plain
vm_dirty_ratio value, which may be a problem when vm.dirty_ratio is set
to < 5 by the user.

CC: Peter Zijlstra <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
kernel/sysctl.c | 3 ++-
mm/page-writeback.c | 10 ++--------
2 files changed, 4 insertions(+), 9 deletions(-)

--- linux-next.orig/kernel/sysctl.c 2010-08-05 22:48:34.000000000 +0800
+++ linux-next/kernel/sysctl.c 2010-08-05 22:48:47.000000000 +0800
@@ -126,6 +126,7 @@ static int ten_thousand = 10000;

/* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
+static int dirty_ratio_min = 5;

/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
static int maxolduid = 65535;
@@ -1031,7 +1032,7 @@ static struct ctl_table vm_table[] = {
.maxlen = sizeof(vm_dirty_ratio),
.mode = 0644,
.proc_handler = dirty_ratio_handler,
- .extra1 = &zero,
+ .extra1 = &dirty_ratio_min,
.extra2 = &one_hundred,
},
{
--- linux-next.orig/mm/page-writeback.c 2010-08-05 22:48:42.000000000 +0800
+++ linux-next/mm/page-writeback.c 2010-08-05 22:48:47.000000000 +0800
@@ -415,14 +415,8 @@ void global_dirty_limits(unsigned long *

if (vm_dirty_bytes)
dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
- else {
- int dirty_ratio;
-
- dirty_ratio = vm_dirty_ratio;
- if (dirty_ratio < 5)
- dirty_ratio = 5;
- dirty = (dirty_ratio * available_memory) / 100;
- }
+ else
+ dirty = (vm_dirty_ratio * available_memory) / 100;

if (dirty_background_bytes)
background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);
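To make the inconsistency described in the changelog concrete, here is a simplified user-space model (not kernel code) of the two computations: the clamp that the old global_dirty_limits() applied, and the unclamped ratio * memory / 100 that calc_period_shift() is described as using. The function names are hypothetical and memory is measured in pages.

```c
#include <assert.h>

/* Floor that global_dirty_limits() applied silently before this patch. */
#define OLD_MIN_DIRTY_RATIO 5

/* Old behaviour: a ratio below 5 is rounded up to 5 when used, even
 * though /proc/sys/vm/dirty_ratio still reports the smaller value. */
static unsigned long old_dirty_pages(int vm_dirty_ratio,
				     unsigned long available_pages)
{
	int ratio = vm_dirty_ratio;

	if (ratio < OLD_MIN_DIRTY_RATIO)
		ratio = OLD_MIN_DIRTY_RATIO;
	return (ratio * available_pages) / 100;
}

/* Raw ratio, no clamping: the value calc_period_shift() is described
 * as using, and what the patched global_dirty_limits() uses once the
 * sysctl itself refuses values below 5. */
static unsigned long raw_dirty_pages(int vm_dirty_ratio,
				     unsigned long available_pages)
{
	assert(vm_dirty_ratio >= 0 && vm_dirty_ratio <= 100);
	return (vm_dirty_ratio * available_pages) / 100;
}
```

With vm.dirty_ratio=1 and 1,000,000 available pages, the old clamp yields 50,000 pages while the raw calculation yields 10,000, which is the mismatch the changelog refers to. Once values below 5 are rejected at write time, the two computations agree.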


2010-08-05 23:35:24

by Andrew Morton

Subject: Re: [PATCH 07/13] writeback: explicit low bound for vm.dirty_ratio

On Fri, 06 Aug 2010 00:10:58 +0800
Wu Fengguang <[email protected]> wrote:

> Force a user visible low bound of 5% for the vm.dirty_ratio interface.
>
> Currently global_dirty_limits() applies a low bound of 5% for
> vm_dirty_ratio. This is not very user visible -- if the user sets
> vm.dirty_ratio=1, the operation seems to succeed but will be rounded up
> to 5% when used.
>
> Another problem is inconsistency: calc_period_shift() uses the plain
> vm_dirty_ratio value, which may be a problem when vm.dirty_ratio is set
> to < 5 by the user.

The changelog describes the old behaviour but doesn't describe the
proposed new behaviour.

> --- linux-next.orig/kernel/sysctl.c 2010-08-05 22:48:34.000000000 +0800
> +++ linux-next/kernel/sysctl.c 2010-08-05 22:48:47.000000000 +0800
> @@ -126,6 +126,7 @@ static int ten_thousand = 10000;
>
> /* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
> static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
> +static int dirty_ratio_min = 5;
>
> /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
> static int maxolduid = 65535;
> @@ -1031,7 +1032,7 @@ static struct ctl_table vm_table[] = {
> .maxlen = sizeof(vm_dirty_ratio),
> .mode = 0644,
> .proc_handler = dirty_ratio_handler,
> - .extra1 = &zero,
> + .extra1 = &dirty_ratio_min,
> .extra2 = &one_hundred,
> },

I forget how the procfs core handles this. Presumably the write will
now fail with -EINVAL or something? So people's scripts will now
error out and their space shuttles will crash?

All of which illustrates why it's important to fully describe changes
in the changelog! So people can consider and discuss the end-user
implications of a change.

2010-08-07 16:21:18

by Fengguang Wu

Subject: Re: [PATCH 07/13] writeback: explicit low bound for vm.dirty_ratio

On Fri, Aug 06, 2010 at 07:34:01AM +0800, Andrew Morton wrote:
> On Fri, 06 Aug 2010 00:10:58 +0800
> Wu Fengguang <[email protected]> wrote:
>
> > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> >
> > Currently global_dirty_limits() applies a low bound of 5% for
> > vm_dirty_ratio. This is not very user visible -- if the user sets
> > vm.dirty_ratio=1, the operation seems to succeed but will be rounded up
> > to 5% when used.
> >
> > Another problem is inconsistency: calc_period_shift() uses the plain
> > vm_dirty_ratio value, which may be a problem when vm.dirty_ratio is set
> > to < 5 by the user.
>
> The changelog describes the old behaviour but doesn't describe the
> proposed new behaviour.

Yeah, fixed below.

> > --- linux-next.orig/kernel/sysctl.c 2010-08-05 22:48:34.000000000 +0800
> > +++ linux-next/kernel/sysctl.c 2010-08-05 22:48:47.000000000 +0800
> > @@ -126,6 +126,7 @@ static int ten_thousand = 10000;
> >
> > /* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
> > static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
> > +static int dirty_ratio_min = 5;
> >
> > /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
> > static int maxolduid = 65535;
> > @@ -1031,7 +1032,7 @@ static struct ctl_table vm_table[] = {
> > .maxlen = sizeof(vm_dirty_ratio),
> > .mode = 0644,
> > .proc_handler = dirty_ratio_handler,
> > - .extra1 = &zero,
> > + .extra1 = &dirty_ratio_min,
> > .extra2 = &one_hundred,
> > },
>
> I forget how the procfs core handles this. Presumably the write will
> now fail with -EINVAL or something?

Right.
# echo 111 > /proc/sys/vm/dirty_ratio
echo: write error: invalid argument
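The rejection comes from the generic min/max check: dirty_ratio_handler() is expected to pass the write through to proc_dointvec_minmax(), which compares the parsed value against the ctl_table's .extra1/.extra2 bounds and fails the write with -EINVAL when it falls outside them. The following is a pared-down user-space model of that check under this assumption; struct ctl_model and minmax_write() are hypothetical names, and the real handler also parses vectors of integers and handles signs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, pared-down model of a ctl_table entry: only the
 * fields the range check needs. */
struct ctl_model {
	int *data;		/* the tunable, e.g. vm_dirty_ratio */
	const int *min;		/* plays the role of .extra1 (may be NULL) */
	const int *max;		/* plays the role of .extra2 (may be NULL) */
};

/* Store val if it lies within [*min, *max]; otherwise leave the
 * tunable untouched and return -EINVAL, which a write(2) caller
 * sees as errno == EINVAL. */
static int minmax_write(struct ctl_model *t, int val)
{
	assert(t != NULL && t->data != NULL);
	if ((t->min && val < *t->min) || (t->max && val > *t->max))
		return -EINVAL;
	*t->data = val;
	return 0;
}
```

With .extra1 pointing at 5 and .extra2 at 100, both 1 and 111 are refused while 10 is stored, matching the shell output above for the upper bound.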

> So people's scripts will now error out and their space shuttles will
> crash?

Looks like a serious problem. I'm now much more hesitant about pushing
this patch :)

> All of which illustrates why it's important to fully describe changes
> in the changelog! So people can consider and discuss the end-user
> implications of a change.

Good point. Here is the patch with updated changelog.

Thanks,
Fengguang
---
Subject: writeback: explicit low bound for vm.dirty_ratio
From: Wu Fengguang <[email protected]>
Date: Thu Jul 15 10:28:57 CST 2010

Force a user visible low bound of 5% for the vm.dirty_ratio interface.

This is an interface change. When doing

echo N > /proc/sys/vm/dirty_ratio

where N < 5, the old behavior is to pretend to accept the value, while
the new behavior is to reject it explicitly with -EINVAL. This may
break user space programs that check the return value.
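Programs that do check for errors would notice the change on the write(2) call itself. The sketch below shows such a check; write_sysctl_int() is a hypothetical helper, and the path is a parameter so that it can target /proc/sys/vm/dirty_ratio (which requires root) or any other writable file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write an integer to a sysctl-style file.  Returns 0 on success or
 * -errno on failure (-EIO for a short write). */
static int write_sysctl_int(const char *path, int value)
{
	char buf[32];
	int len, fd, err = 0;
	ssize_t n;

	assert(path != NULL);
	len = snprintf(buf, sizeof(buf), "%d\n", value);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	n = write(fd, buf, (size_t)len);
	if (n < 0)
		err = -errno;		/* e.g. -EINVAL from the sysctl range check */
	else if (n != len)
		err = -EIO;		/* treat a short write as a failure */
	close(fd);
	return err;
}
```

With the patch applied, write_sysctl_int("/proc/sys/vm/dirty_ratio", 3) would be expected to return -EINVAL, whereas before it returned 0 and the kernel quietly used 5% instead.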

Currently global_dirty_limits() applies a low bound of 5% for
vm_dirty_ratio. This is not very user visible -- if the user sets
vm.dirty_ratio=1, the operation seems to succeed but will be rounded up
to 5% when used.

Another problem is inconsistency: calc_period_shift() uses the plain
vm_dirty_ratio value, which may be a problem when vm.dirty_ratio is set
to < 5 by the user.

CC: Peter Zijlstra <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
kernel/sysctl.c | 3 ++-
mm/page-writeback.c | 10 ++--------
2 files changed, 4 insertions(+), 9 deletions(-)

--- linux-next.orig/kernel/sysctl.c 2010-08-05 22:48:34.000000000 +0800
+++ linux-next/kernel/sysctl.c 2010-08-05 22:48:47.000000000 +0800
@@ -126,6 +126,7 @@ static int ten_thousand = 10000;

/* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
+static int dirty_ratio_min = 5;

/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
static int maxolduid = 65535;
@@ -1031,7 +1032,7 @@ static struct ctl_table vm_table[] = {
.maxlen = sizeof(vm_dirty_ratio),
.mode = 0644,
.proc_handler = dirty_ratio_handler,
- .extra1 = &zero,
+ .extra1 = &dirty_ratio_min,
.extra2 = &one_hundred,
},
{
--- linux-next.orig/mm/page-writeback.c 2010-08-05 22:48:42.000000000 +0800
+++ linux-next/mm/page-writeback.c 2010-08-05 22:48:47.000000000 +0800
@@ -415,14 +415,8 @@ void global_dirty_limits(unsigned long *

if (vm_dirty_bytes)
dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);
- else {
- int dirty_ratio;
-
- dirty_ratio = vm_dirty_ratio;
- if (dirty_ratio < 5)
- dirty_ratio = 5;
- dirty = (dirty_ratio * available_memory) / 100;
- }
+ else
+ dirty = (vm_dirty_ratio * available_memory) / 100;

if (dirty_background_bytes)
background = DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE);

2010-08-10 03:13:58

by KOSAKI Motohiro

Subject: Re: [PATCH 07/13] writeback: explicit low bound for vm.dirty_ratio

> Subject: writeback: explicit low bound for vm.dirty_ratio
> From: Wu Fengguang <[email protected]>
> Date: Thu Jul 15 10:28:57 CST 2010
>
> Force a user visible low bound of 5% for the vm.dirty_ratio interface.
>
> This is an interface change. When doing
>
> echo N > /proc/sys/vm/dirty_ratio
>
> where N < 5, the old behavior is to pretend to accept the value, while
> the new behavior is to reject it explicitly with -EINVAL. This may
> break user space programs that check the return value.

Umm.. I dislike this change. Is there any good reason to refuse the admin's
explicit choice? Why is 1-4% so bad? Internal clipping can be changed later,
but explicit error behavior is hard to change later.

Personally I prefer to either
- accept all values, or
- clip the value in dirty_ratio_handler

Neither involves an explicit ABI change.

Thanks.


2010-08-10 03:57:36

by NeilBrown

Subject: Re: [PATCH 07/13] writeback: explicit low bound for vm.dirty_ratio

On Tue, 10 Aug 2010 12:12:06 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> > Subject: writeback: explicit low bound for vm.dirty_ratio
> > From: Wu Fengguang <[email protected]>
> > Date: Thu Jul 15 10:28:57 CST 2010
> >
> > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> >
> > This is an interface change. When doing
> >
> > echo N > /proc/sys/vm/dirty_ratio
> >
> > where N < 5, the old behavior is to pretend to accept the value, while
> > the new behavior is to reject it explicitly with -EINVAL. This may
> > break user space programs that check the return value.
>
> Umm.. I dislike this change. Is there any good reason to refuse the admin's
> explicit choice? Why is 1-4% so bad? Internal clipping can be changed later,
> but explicit error behavior is hard to change later.

As a data point, I had a situation a while back where I needed a value below
1 to get the desired behaviour. The system had lots of RAM and fairly slow
write-back (over NFS), so a 'sync' could take minutes.

So I would much prefer allowing not only 1-4, but also fractional values!!!

I can see no justification at all for setting a lower bound of 5. Even zero
can be useful - for testing purposes mostly.
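Some rough arithmetic shows why whole percentages can be too coarse on large machines. The figures used here (64 GiB of dirtyable memory, 20 MB/s of NFS write-back) are hypothetical, chosen only for illustration. vm.dirty_ratio is an integer percentage, so 1% is already about 687 MB, which takes roughly 34 seconds to flush, and the old 5% floor takes nearly three minutes.

```c
#include <assert.h>

/* Bytes allowed to be dirty for an integer vm.dirty_ratio, using the
 * same ratio * memory / 100 formula as global_dirty_limits(), but in
 * bytes rather than pages. */
static unsigned long long dirty_limit_bytes(unsigned int ratio,
					    unsigned long long dirtyable)
{
	assert(ratio <= 100);
	return dirtyable * ratio / 100;
}

/* Whole seconds needed to write back 'bytes' at 'bw' bytes/second. */
static unsigned long long flush_seconds(unsigned long long bytes,
					unsigned long long bw)
{
	assert(bw > 0);
	return bytes / bw;
}
```

For example, dirty_limit_bytes(1, 64ULL << 30) is 687194767 bytes, and flushing that at 20000000 bytes/second takes 34 whole seconds; at 5% it is 171 seconds.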

NeilBrown

> Personally I prefer to either
> - accept all values, or
> - clip the value in dirty_ratio_handler
>
> Neither involves an explicit ABI change.
>
> Thanks.

2010-08-10 13:30:15

by Jan Kara

Subject: Re: [PATCH 07/13] writeback: explicit low bound for vm.dirty_ratio

On Tue 10-08-10 13:57:12, Neil Brown wrote:
> On Tue, 10 Aug 2010 12:12:06 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > > Subject: writeback: explicit low bound for vm.dirty_ratio
> > > From: Wu Fengguang <[email protected]>
> > > Date: Thu Jul 15 10:28:57 CST 2010
> > >
> > > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> > >
> > > This is an interface change. When doing
> > >
> > > echo N > /proc/sys/vm/dirty_ratio
> > >
> > > where N < 5, the old behavior is to pretend to accept the value, while
> > > the new behavior is to reject it explicitly with -EINVAL. This may
> > > break user space programs that check the return value.
> >
> > Umm.. I dislike this change. Is there any good reason to refuse the admin's
> > explicit choice? Why is 1-4% so bad? Internal clipping can be changed later,
> > but explicit error behavior is hard to change later.
>
> As a data point, I had a situation a while back where I needed a value below
> 1 to get the desired behaviour. The system had lots of RAM and fairly slow
> write-back (over NFS), so a 'sync' could take minutes.
>
> So I would much prefer allowing not only 1-4, but also fractional values!!!
>
> I can see no justification at all for setting a lower bound of 5. Even zero
> can be useful - for testing purposes mostly.

If you run a recent kernel, /proc/sys/vm/dirty_background_bytes and
dirty_bytes were introduced exactly for these purposes. Not that I
think the magic clipping at 5% is a good thing...
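The byte limit works for this because global_dirty_limits() gives vm_dirty_bytes precedence over vm_dirty_ratio, as the patch's context lines show (`if (vm_dirty_bytes) dirty = DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE);`). Below is a simplified user-space model of that selection; dirty_threshold_pages() is a hypothetical name, MODEL_PAGE_SIZE assumes 4 KiB pages, and DIV_ROUND_UP is a local copy of the kernel macro.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Dirty threshold in pages: a non-zero byte limit wins; otherwise a
 * percentage of the available memory is used. */
static unsigned long dirty_threshold_pages(unsigned long vm_dirty_bytes,
					   int vm_dirty_ratio,
					   unsigned long available_pages)
{
	assert(vm_dirty_ratio >= 0 && vm_dirty_ratio <= 100);
	if (vm_dirty_bytes)
		return DIV_ROUND_UP(vm_dirty_bytes, MODEL_PAGE_SIZE);
	return (vm_dirty_ratio * available_pages) / 100;
}
```

With 16777216 available pages (64 GiB), the smallest non-zero ratio gives 167772 pages, while a 64 MiB byte limit gives 16384 pages, about 0.1% of memory, a setting vm.dirty_ratio cannot express.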

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2010-08-10 18:11:47

by Fengguang Wu

Subject: Re: [PATCH 07/13] writeback: explicit low bound for vm.dirty_ratio

On Tue, Aug 10, 2010 at 11:12:06AM +0800, KOSAKI Motohiro wrote:
> > Subject: writeback: explicit low bound for vm.dirty_ratio
> > From: Wu Fengguang <[email protected]>
> > Date: Thu Jul 15 10:28:57 CST 2010
> >
> > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> >
> > This is an interface change. When doing
> >
> > echo N > /proc/sys/vm/dirty_ratio
> >
> > where N < 5, the old behavior is to pretend to accept the value, while
> > the new behavior is to reject it explicitly with -EINVAL. This may
> > break user space programs that check the return value.
>
> Umm.. I dislike this change. Is there any good reason to refuse the admin's
> explicit choice? Why is 1-4% so bad? Internal clipping can be changed later,
> but explicit error behavior is hard to change later.
>
> Personally I prefer to either
> - accept all values, or
> - clip the value in dirty_ratio_handler
>
> Neither involves an explicit ABI change.

Good point. Sorry for my ignorance. Neil is right that there is no
reason to impose any lower bound, so the first option looks good.

Thanks,
Fengguang

2010-08-10 18:12:30

by Fengguang Wu

Subject: Re: [PATCH 07/13] writeback: explicit low bound for vm.dirty_ratio

On Tue, Aug 10, 2010 at 11:57:12AM +0800, Neil Brown wrote:
> On Tue, 10 Aug 2010 12:12:06 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > > Subject: writeback: explicit low bound for vm.dirty_ratio
> > > From: Wu Fengguang <[email protected]>
> > > Date: Thu Jul 15 10:28:57 CST 2010
> > >
> > > Force a user visible low bound of 5% for the vm.dirty_ratio interface.
> > >
> > > This is an interface change. When doing
> > >
> > > echo N > /proc/sys/vm/dirty_ratio
> > >
> > > where N < 5, the old behavior is to pretend to accept the value, while
> > > the new behavior is to reject it explicitly with -EINVAL. This may
> > > break user space programs that check the return value.
> >
> > Umm.. I dislike this change. Is there any good reason to refuse the admin's
> > explicit choice? Why is 1-4% so bad? Internal clipping can be changed later,
> > but explicit error behavior is hard to change later.
>
> As a data point, I had a situation a while back where I needed a value below
> 1 to get the desired behaviour. The system had lots of RAM and fairly slow
> write-back (over NFS), so a 'sync' could take minutes.

Jan, here is a use case to limit dirty pages on slow devices :)

> So I would much prefer allowing not only 1-4, but also fractional values!!!
>
> I can see no justification at all for setting a lower bound of 5. Even zero
> can be useful - for testing purposes mostly.

Neil, that's a perfectly legitimate need, which I overlooked.
It seems that the vm.dirty_bytes parameter would work for your case.

Thanks,
Fengguang