2017-09-18 15:07:00

by Yafang Shao

[permalink] [raw]
Subject: [PATCH v2] mm: introduce validity check on vm dirtiness settings

we can find the logic in domain_dirty_limits() that
when dirty bg_thresh is bigger than dirty thresh,
bg_thresh will be set as thresh * 1 / 2.
if (bg_thresh >= thresh)
bg_thresh = thresh / 2;

But actually we can set vm background dirtiness bigger than
vm dirtiness successfully. This behavior may mislead us.
We'd better do this validity check at the beginning.

Signed-off-by: Yafang Shao <[email protected]>
---
Documentation/sysctl/vm.txt | 5 +++
mm/page-writeback.c | 86 ++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 83 insertions(+), 8 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9baf66a..5de02f6 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -156,6 +156,8 @@ read.
Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
value lower than this limit will be ignored and the old configuration will be
retained.
+dirty_bytes can't less than dirty_background_bytes or
+available_memory / 100 * dirty_ratio.

==============================================================

@@ -176,6 +178,9 @@ generating disk writes will itself start writing out dirty data.

The total available memory is not equal to total system memory.

+Note: dirty_ratio can't less than dirty_background_ratio or
+dirty_background_bytes / available_memory * 100.
+
==============================================================

dirty_writeback_centisecs
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0b9c5cb..4a2d229 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -511,15 +511,68 @@ bool node_dirty_ok(struct pglist_data *pgdat)
return nr_pages <= limit;
}

+static bool vm_dirty_settings_valid(void)
+{
+ bool ret = true;
+ unsigned long bytes;
+
+ if (vm_dirty_ratio > 0) {
+ if (dirty_background_ratio >= vm_dirty_ratio) {
+ ret = false;
+ goto out;
+ }
+
+ bytes = global_dirtyable_memory() * PAGE_SIZE / 100 *
+ vm_dirty_ratio;
+ if (dirty_background_bytes >= bytes) {
+ ret = false;
+ goto out;
+ }
+ }
+
+ if (vm_dirty_bytes > 0) {
+ if (dirty_background_bytes >= vm_dirty_bytes) {
+ ret = false;
+ goto out;
+ }
+
+ bytes = global_dirtyable_memory() * PAGE_SIZE / 100 *
+ dirty_background_ratio;
+
+ if (bytes >= vm_dirty_bytes) {
+ ret = false;
+ goto out;
+ }
+ }
+
+ if (vm_dirty_bytes == 0 && vm_dirty_ratio == 0 &&
+ (dirty_background_bytes != 0 || dirty_background_ratio != 0))
+ ret = false;
+
+out:
+ if (!ret)
+ pr_err("vm dirtiness can't less than vm background dirtiness\n");
+
+ return ret;
+}
+
int dirty_background_ratio_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
int ret;
+ int old_ratio = dirty_background_ratio;

ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
- if (ret == 0 && write)
- dirty_background_bytes = 0;
+ if (ret == 0 && write) {
+ if (dirty_background_ratio != old_ratio &&
+ !vm_dirty_settings_valid()) {
+ dirty_background_ratio = old_ratio;
+ ret = -EINVAL;
+ } else
+ dirty_background_bytes = 0;
+ }
+
return ret;
}

@@ -528,10 +581,17 @@ int dirty_background_bytes_handler(struct ctl_table *table, int write,
loff_t *ppos)
{
int ret;
+ unsigned long old_bytes = dirty_background_bytes;

ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
- if (ret == 0 && write)
- dirty_background_ratio = 0;
+ if (ret == 0 && write) {
+ if (dirty_background_bytes != old_bytes &&
+ !vm_dirty_settings_valid()) {
+ dirty_background_bytes = old_bytes;
+ ret = -EINVAL;
+ } else
+ dirty_background_ratio = 0;
+ }
return ret;
}

@@ -544,8 +604,13 @@ int dirty_ratio_handler(struct ctl_table *table, int write,

ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (ret == 0 && write && vm_dirty_ratio != old_ratio) {
- writeback_set_ratelimit();
- vm_dirty_bytes = 0;
+ if (vm_dirty_settings_valid()) {
+ writeback_set_ratelimit();
+ vm_dirty_bytes = 0;
+ } else {
+ vm_dirty_ratio = old_ratio;
+ ret = -EINVAL;
+ }
}
return ret;
}
@@ -559,8 +624,13 @@ int dirty_bytes_handler(struct ctl_table *table, int write,

ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
if (ret == 0 && write && vm_dirty_bytes != old_bytes) {
- writeback_set_ratelimit();
- vm_dirty_ratio = 0;
+ if (vm_dirty_settings_valid()) {
+ writeback_set_ratelimit();
+ vm_dirty_ratio = 0;
+ } else {
+ vm_dirty_bytes = old_bytes;
+ ret = -EINVAL;
+ }
}
return ret;
}
--
1.8.3.1


2017-09-19 08:35:59

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2] mm: introduce validity check on vm dirtiness settings

On Tue 19-09-17 06:53:00, Yafang Shao wrote:
> we can find the logic in domain_dirty_limits() that
> when dirty bg_thresh is bigger than dirty thresh,
> bg_thresh will be set as thresh * 1 / 2.
> if (bg_thresh >= thresh)
> bg_thresh = thresh / 2;
>
> But actually we can set vm background dirtiness bigger than
> vm dirtiness successfully. This behavior may mislead us.
> We'd better do this validity check at the beginning.
>
> Signed-off-by: Yafang Shao <[email protected]>

The patch looks mostly good now. Just some small comments below.

> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 9baf66a..5de02f6 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -156,6 +156,8 @@ read.
> Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
> value lower than this limit will be ignored and the old configuration will be
> retained.
> +dirty_bytes can't less than dirty_background_bytes or
> +available_memory / 100 * dirty_ratio.

I would phrase this like:

Note: the value of dirty_bytes also cannot be set lower than
dirty_background_bytes or the amount of memory corresponding to
dirty_background_ratio.

> ==============================================================
>
> @@ -176,6 +178,9 @@ generating disk writes will itself start writing out dirty data.
>
> The total available memory is not equal to total system memory.
>
> +Note: dirty_ratio can't less than dirty_background_ratio or
> +dirty_background_bytes / available_memory * 100.
> +

And similarly here:

Note: dirty_ratio cannot be set lower than dirty_background_ratio or
ratio corresponding to dirty_background_bytes.


> @@ -511,15 +511,68 @@ bool node_dirty_ok(struct pglist_data *pgdat)
> return nr_pages <= limit;
> }
>
> +static bool vm_dirty_settings_valid(void)
> +{
> + bool ret = true;
> + unsigned long bytes;
> +
> + if (vm_dirty_ratio > 0) {
> + if (dirty_background_ratio >= vm_dirty_ratio) {
> + ret = false;
> + goto out;
> + }
> +
> + bytes = global_dirtyable_memory() * PAGE_SIZE / 100 *
> + vm_dirty_ratio;
> + if (dirty_background_bytes >= bytes) {
> + ret = false;
> + goto out;
> + }
> + }
> +
> + if (vm_dirty_bytes > 0) {
> + if (dirty_background_bytes >= vm_dirty_bytes) {
> + ret = false;
> + goto out;
> + }
> +
> + bytes = global_dirtyable_memory() * PAGE_SIZE / 100 *
> + dirty_background_ratio;
> +
> + if (bytes >= vm_dirty_bytes) {
> + ret = false;
> + goto out;
> + }
> + }
> +
> + if (vm_dirty_bytes == 0 && vm_dirty_ratio == 0 &&
> + (dirty_background_bytes != 0 || dirty_background_ratio != 0))
> + ret = false;

Hum, why not just:
if ((vm_dirty_bytes == 0 && vm_dirty_ratio) ||
(dirty_background_bytes == 0 && dirty_background_ratio == 0))
ret = false;

IMHO setting either tunable to 0 is just wrong and actively dangerous...

> +out:
> + if (!ret)
> + pr_err("vm dirtiness can't less than vm background dirtiness\n");

I would refrain from spamming logs with the error message. In my opinion it
is not needed.

> int dirty_background_ratio_handler(struct ctl_table *table, int write,
> void __user *buffer, size_t *lenp,
> loff_t *ppos)
> {
> int ret;
> + int old_ratio = dirty_background_ratio;
>
> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> - if (ret == 0 && write)
> - dirty_background_bytes = 0;
> + if (ret == 0 && write) {
> + if (dirty_background_ratio != old_ratio &&
> + !vm_dirty_settings_valid()) {

Why do you check whether new ratio is different here? If it is really
needed, it would deserve a comment.

> + dirty_background_ratio = old_ratio;
> + ret = -EINVAL;
> + } else
> + dirty_background_bytes = 0;
> + }
> +
> return ret;
> }
>
> @@ -528,10 +581,17 @@ int dirty_background_bytes_handler(struct ctl_table *table, int write,
> loff_t *ppos)
> {
> int ret;
> + unsigned long old_bytes = dirty_background_bytes;
>
> ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
> - if (ret == 0 && write)
> - dirty_background_ratio = 0;
> + if (ret == 0 && write) {
> + if (dirty_background_bytes != old_bytes &&
> + !vm_dirty_settings_valid()) {

The same here...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-09-19 11:48:04

by Yafang Shao

[permalink] [raw]
Subject: Re: [PATCH v2] mm: introduce validity check on vm dirtiness settings

2017-09-19 16:35 GMT+08:00 Jan Kara <[email protected]>:
> On Tue 19-09-17 06:53:00, Yafang Shao wrote:
>> we can find the logic in domain_dirty_limits() that
>> when dirty bg_thresh is bigger than dirty thresh,
>> bg_thresh will be set as thresh * 1 / 2.
>> if (bg_thresh >= thresh)
>> bg_thresh = thresh / 2;
>>
>> But actually we can set vm background dirtiness bigger than
>> vm dirtiness successfully. This behavior may mislead us.
>> We'd better do this validity check at the beginning.
>>
>> Signed-off-by: Yafang Shao <[email protected]>
>
> The patch looks mostly good now. Just some small comments below.
>
>> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
>> index 9baf66a..5de02f6 100644
>> --- a/Documentation/sysctl/vm.txt
>> +++ b/Documentation/sysctl/vm.txt
>> @@ -156,6 +156,8 @@ read.
>> Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
>> value lower than this limit will be ignored and the old configuration will be
>> retained.
>> +dirty_bytes can't less than dirty_background_bytes or
>> +available_memory / 100 * dirty_ratio.
>
> I would phrase this like:
>
> Note: the value of dirty_bytes also cannot be set lower than
> dirty_background_bytes or the amount of memory corresponding to
> dirty_background_ratio.
>

Thanks :)

>> ==============================================================
>>
>> @@ -176,6 +178,9 @@ generating disk writes will itself start writing out dirty data.
>>
>> The total available memory is not equal to total system memory.
>>
>> +Note: dirty_ratio can't less than dirty_background_ratio or
>> +dirty_background_bytes / available_memory * 100.
>> +
>
> And similarly here:
>
> Note: dirty_ratio cannot be set lower than dirty_background_ratio or
> ratio corresponding to dirty_background_bytes.
>
Thanks :)

>
>> @@ -511,15 +511,68 @@ bool node_dirty_ok(struct pglist_data *pgdat)
>> return nr_pages <= limit;
>> }
>>
>> +static bool vm_dirty_settings_valid(void)
>> +{
>> + bool ret = true;
>> + unsigned long bytes;
>> +
>> + if (vm_dirty_ratio > 0) {
>> + if (dirty_background_ratio >= vm_dirty_ratio) {
>> + ret = false;
>> + goto out;
>> + }
>> +
>> + bytes = global_dirtyable_memory() * PAGE_SIZE / 100 *
>> + vm_dirty_ratio;
>> + if (dirty_background_bytes >= bytes) {
>> + ret = false;
>> + goto out;
>> + }
>> + }
>> +
>> + if (vm_dirty_bytes > 0) {
>> + if (dirty_background_bytes >= vm_dirty_bytes) {
>> + ret = false;
>> + goto out;
>> + }
>> +
>> + bytes = global_dirtyable_memory() * PAGE_SIZE / 100 *
>> + dirty_background_ratio;
>> +
>> + if (bytes >= vm_dirty_bytes) {
>> + ret = false;
>> + goto out;
>> + }
>> + }
>> +
>> + if (vm_dirty_bytes == 0 && vm_dirty_ratio == 0 &&
>> + (dirty_background_bytes != 0 || dirty_background_ratio != 0))
>> + ret = false;
>
> Hum, why not just:
> if ((vm_dirty_bytes == 0 && vm_dirty_ratio) ||
> (dirty_background_bytes == 0 && dirty_background_ratio == 0))
> ret = false;
>
> IMHO setting either tunable to 0 is just wrong and actively dangerous...
>

Because these four variables all could be set to 0 before, and I'm not
sure if this
is needed under some certain conditions, although I think this is
dangerous but I have
to keep it as before.

If you think that is wrong, then I will modified it as you suggested.

>> +out:
>> + if (!ret)
>> + pr_err("vm dirtiness can't less than vm background dirtiness\n");
>
> I would refrain from spamming logs with the error message. In my opinion it
> is not needed.
>
>> int dirty_background_ratio_handler(struct ctl_table *table, int write,
>> void __user *buffer, size_t *lenp,
>> loff_t *ppos)
>> {
>> int ret;
>> + int old_ratio = dirty_background_ratio;
>>
>> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>> - if (ret == 0 && write)
>> - dirty_background_bytes = 0;
>> + if (ret == 0 && write) {
>> + if (dirty_background_ratio != old_ratio &&
>> + !vm_dirty_settings_valid()) {
>
> Why do you check whether new ratio is different here? If it is really
> needed, it would deserve a comment.
>

There're two reseaons,
1. if you set a value same with the old value, it's needn't to do this check.
2. there's another behavior that I'm not sure whether it is reaonable. i.e.
if the old value is,
vm.dirty_background_bytes = 0;
vm.dirty_background_ratio=10;
then I execute the bellow command,
sysctl -w vm.dirty_background_bytes=0
at the end these two values will be,
vm.dirty_background_bytes = 0;
vm.dirty_background_ratio=0;
I'm not sure if this is needed under some certain conditons, So I have
to keep it as before.




>> + dirty_background_ratio = old_ratio;
>> + ret = -EINVAL;
>> + } else
>> + dirty_background_bytes = 0;
>> + }
>> +
>> return ret;
>> }
>>
>> @@ -528,10 +581,17 @@ int dirty_background_bytes_handler(struct ctl_table *table, int write,
>> loff_t *ppos)
>> {
>> int ret;
>> + unsigned long old_bytes = dirty_background_bytes;
>>
>> ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
>> - if (ret == 0 && write)
>> - dirty_background_ratio = 0;
>> + if (ret == 0 && write) {
>> + if (dirty_background_bytes != old_bytes &&
>> + !vm_dirty_settings_valid()) {
>
> The same here...
>



> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

Thanks
Yafang

2017-09-20 15:33:31

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2] mm: introduce validity check on vm dirtiness settings

On Tue 19-09-17 19:48:00, Yafang Shao wrote:
> 2017-09-19 16:35 GMT+08:00 Jan Kara <[email protected]>:
> > On Tue 19-09-17 06:53:00, Yafang Shao wrote:
> >> + if (vm_dirty_bytes == 0 && vm_dirty_ratio == 0 &&
> >> + (dirty_background_bytes != 0 || dirty_background_ratio != 0))
> >> + ret = false;
> >
> > Hum, why not just:
> > if ((vm_dirty_bytes == 0 && vm_dirty_ratio == 0) ||
> > (dirty_background_bytes == 0 && dirty_background_ratio == 0))
> > ret = false;
> >
> > IMHO setting either tunable to 0 is just wrong and actively dangerous...
> >
>
> Because these four variables all could be set to 0 before, and I'm not
> sure if this
> is needed under some certain conditions, although I think this is
> dangerous but I have
> to keep it as before.
>
> If you think that is wrong, then I will modified it as you suggested.

OK, I see but see below.

> >> int dirty_background_ratio_handler(struct ctl_table *table, int write,
> >> void __user *buffer, size_t *lenp,
> >> loff_t *ppos)
> >> {
> >> int ret;
> >> + int old_ratio = dirty_background_ratio;
> >>
> >> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> >> - if (ret == 0 && write)
> >> - dirty_background_bytes = 0;
> >> + if (ret == 0 && write) {
> >> + if (dirty_background_ratio != old_ratio &&
> >> + !vm_dirty_settings_valid()) {
> >
> > Why do you check whether new ratio is different here? If it is really
> > needed, it would deserve a comment.
> >
>
> There're two reseaons,
> 1. if you set a value same with the old value, it's needn't to do this check.
> 2. there's another behavior that I'm not sure whether it is reaonable. i.e.
> if the old value is,
> vm.dirty_background_bytes = 0;
> vm.dirty_background_ratio=10;
> then I execute the bellow command,
> sysctl -w vm.dirty_background_bytes=0
> at the end these two values will be,
> vm.dirty_background_bytes = 0;
> vm.dirty_background_ratio=0;
> I'm not sure if this is needed under some certain conditons, So I have
> to keep it as before.

OK, this is somewhat the problem of the switching logic between _bytes and
_ratio bytes and also the fact that '0' has a special meaning in these
files. I think the cleanest would be to just refuse writing of '0' into any
of these files which would deal with the problem as well.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-09-21 01:42:25

by Yafang Shao

[permalink] [raw]
Subject: Re: [PATCH v2] mm: introduce validity check on vm dirtiness settings

2017-09-20 23:33 GMT+08:00 Jan Kara <[email protected]>:
> On Tue 19-09-17 19:48:00, Yafang Shao wrote:
>> 2017-09-19 16:35 GMT+08:00 Jan Kara <[email protected]>:
>> > On Tue 19-09-17 06:53:00, Yafang Shao wrote:
>> >> + if (vm_dirty_bytes == 0 && vm_dirty_ratio == 0 &&
>> >> + (dirty_background_bytes != 0 || dirty_background_ratio != 0))
>> >> + ret = false;
>> >
>> > Hum, why not just:
>> > if ((vm_dirty_bytes == 0 && vm_dirty_ratio == 0) ||
>> > (dirty_background_bytes == 0 && dirty_background_ratio == 0))
>> > ret = false;
>> >
>> > IMHO setting either tunable to 0 is just wrong and actively dangerous...
>> >
>>
>> Because these four variables all could be set to 0 before, and I'm not
>> sure if this
>> is needed under some certain conditions, although I think this is
>> dangerous but I have
>> to keep it as before.
>>
>> If you think that is wrong, then I will modified it as you suggested.
>
> OK, I see but see below.
>
>> >> int dirty_background_ratio_handler(struct ctl_table *table, int write,
>> >> void __user *buffer, size_t *lenp,
>> >> loff_t *ppos)
>> >> {
>> >> int ret;
>> >> + int old_ratio = dirty_background_ratio;
>> >>
>> >> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>> >> - if (ret == 0 && write)
>> >> - dirty_background_bytes = 0;
>> >> + if (ret == 0 && write) {
>> >> + if (dirty_background_ratio != old_ratio &&
>> >> + !vm_dirty_settings_valid()) {
>> >
>> > Why do you check whether new ratio is different here? If it is really
>> > needed, it would deserve a comment.
>> >
>>
>> There're two reseaons,
>> 1. if you set a value same with the old value, it's needn't to do this check.
>> 2. there's another behavior that I'm not sure whether it is reaonable. i.e.
>> if the old value is,
>> vm.dirty_background_bytes = 0;
>> vm.dirty_background_ratio=10;
>> then I execute the bellow command,
>> sysctl -w vm.dirty_background_bytes=0
>> at the end these two values will be,
>> vm.dirty_background_bytes = 0;
>> vm.dirty_background_ratio=0;
>> I'm not sure if this is needed under some certain conditons, So I have
>> to keep it as before.
>
> OK, this is somewhat the problem of the switching logic between _bytes and
> _ratio bytes and also the fact that '0' has a special meaning in these
> files. I think the cleanest would be to just refuse writing of '0' into any
> of these files which would deal with the problem as well.
Got it.
I will submit a new patch then.

>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR