2020-06-26 04:03:20

by Dave Hansen

[permalink] [raw]
Subject: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI


From: Dave Hansen <[email protected]>

I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
sysctl. Like a good kernel developer, I also went to go update the
documentation. I noticed that the bits in the documentation didn't
match the bits in the #defines.

The VM evidently stopped caring about RECLAIM_ZONE at some point (or
never cared) and the #define itself was later removed as a cleanup.
Those things by themselves are fine.

But, the _other_ bit locations also got changed. That's not OK because
the bit values are documented to mean one specific thing and users
surely rely on them meaning that one thing and not changing from
kernel to kernel. The end result is that if someone had a script
that did:

sysctl vm.zone_reclaim_mode=1

That script went from doing nothing to writing out pages during
node reclaim after the commit in question. That's not great.

Put the bits back the way they were and add a comment so something
like this is a bit harder to do again. Update the documentation to
make it clear that the first bit is ignored.

Signed-off-by: Dave Hansen <[email protected]>
Fixes: commit 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
Acked-by: Ben Widawsky <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Daniel Wagner <[email protected]>
Cc: "Tobin C. Harding" <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
---

b/Documentation/admin-guide/sysctl/vm.rst | 12 ++++++------
b/mm/vmscan.c | 9 +++++++--
2 files changed, 13 insertions(+), 8 deletions(-)

diff -puN mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi 2020-06-25 17:32:11.559165912 -0700
+++ b/mm/vmscan.c 2020-06-25 17:32:11.572165912 -0700
@@ -4090,8 +4090,13 @@ module_init(kswapd_init)
*/
int node_reclaim_mode __read_mostly;

-#define RECLAIM_WRITE (1<<0) /* Writeout pages during reclaim */
-#define RECLAIM_UNMAP (1<<1) /* Unmap pages during reclaim */
+/*
+ * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
+ * ABI. New bits are OK, but existing bits can never change.
+ */
+#define RECLAIM_RSVD (1<<0) /* (currently ignored/unused) */
+#define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
+#define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */

/*
* Priority for NODE_RECLAIM. This determines the fraction of pages
diff -puN Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi Documentation/admin-guide/sysctl/vm.rst
--- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi 2020-06-25 17:32:11.562165912 -0700
+++ b/Documentation/admin-guide/sysctl/vm.rst 2020-06-25 17:32:11.572165912 -0700
@@ -938,7 +938,7 @@ in the system.
This is value OR'ed together of

= ===================================
-1 Zone reclaim on
+1 (bit currently ignored)
2 Zone reclaim writes dirty pages out
4 Zone reclaim swaps pages
= ===================================
@@ -948,11 +948,11 @@ that benefit from having their data cach
left disabled as the caching effect is likely to be more important than
data locality.

-zone_reclaim may be enabled if it's known that the workload is partitioned
-such that each partition fits within a NUMA node and that accessing remote
-memory would cause a measurable performance reduction. The page allocator
-will then reclaim easily reusable pages (those page cache pages that are
-currently not used) before allocating off node pages.
+Consider enabling one or more zone_reclaim mode bits if it's known that the
+workload is partitioned such that each partition fits within a NUMA node
+and that accessing remote memory would cause a measurable performance
+reduction. The page allocator will take additional actions before
+allocating off node pages.

Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
_


2020-06-26 08:00:16

by Daniel Wagner

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

Hi Dave

On Thu, Jun 25, 2020 at 05:34:59PM -0700, Dave Hansen wrote:
>
> From: Dave Hansen <[email protected]>
>
> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> sysctl. Like a good kernel developer, I also went to go update the
> documentation. I noticed that the bits in the documentation didn't
> match the bits in the #defines.

Drop the this paragraph from the commit message. It doesn't add
any necessart information.

Please have a look at

https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

> The VM evidently stopped caring about RECLAIM_ZONE at some point (or
> never cared) and the #define itself was later removed as a cleanup.
> Those things by themselves are fine.
>
> But, the _other_ bit locations also got changed. That's not OK because
> the bit values are documented to mean one specific thing and users
> surely rely on them meaning that one thing and not changing from
> kernel to kernel. The end result is that if someone had a script
> that did:
>
> sysctl vm.zone_reclaim_mode=1
>
> That script went from doing nothing to writing out pages during
> node reclaim after the commit in question. That's not great.
>
> Put the bits back the way they were and add a comment so something
> like this is a bit harder to do again. Update the documentation to
> make it clear that the first bit is ignored.
>
> Signed-off-by: Dave Hansen <[email protected]>
> Fixes: commit 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
> Acked-by: Ben Widawsky <[email protected]>
> Cc: Alex Shi <[email protected]>
> Cc: Daniel Wagner <[email protected]>
> Cc: "Tobin C. Harding" <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> ---
>
> b/Documentation/admin-guide/sysctl/vm.rst | 12 ++++++------
> b/mm/vmscan.c | 9 +++++++--
> 2 files changed, 13 insertions(+), 8 deletions(-)
>
> diff -puN mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi mm/vmscan.c
> --- a/mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi 2020-06-25 17:32:11.559165912 -0700
> +++ b/mm/vmscan.c 2020-06-25 17:32:11.572165912 -0700
> @@ -4090,8 +4090,13 @@ module_init(kswapd_init)
> */
> int node_reclaim_mode __read_mostly;
>
> -#define RECLAIM_WRITE (1<<0) /* Writeout pages during reclaim */
> -#define RECLAIM_UNMAP (1<<1) /* Unmap pages during reclaim */
> +/*
> + * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
> + * ABI. New bits are OK, but existing bits can never change.
> + */
> +#define RECLAIM_RSVD (1<<0) /* (currently ignored/unused) */
> +#define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
> +#define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */
>
> /*
> * Priority for NODE_RECLAIM. This determines the fraction of pages
> diff -puN Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi Documentation/admin-guide/sysctl/vm.rst
> --- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi 2020-06-25 17:32:11.562165912 -0700
> +++ b/Documentation/admin-guide/sysctl/vm.rst 2020-06-25 17:32:11.572165912 -0700
> @@ -938,7 +938,7 @@ in the system.
> This is value OR'ed together of
>
> = ===================================
> -1 Zone reclaim on
> +1 (bit currently ignored)
> 2 Zone reclaim writes dirty pages out
> 4 Zone reclaim swaps pages
> = ===================================
> @@ -948,11 +948,11 @@ that benefit from having their data cach
> left disabled as the caching effect is likely to be more important than
> data locality.
>
> -zone_reclaim may be enabled if it's known that the workload is partitioned
> -such that each partition fits within a NUMA node and that accessing remote
> -memory would cause a measurable performance reduction. The page allocator
> -will then reclaim easily reusable pages (those page cache pages that are
> -currently not used) before allocating off node pages.
> +Consider enabling one or more zone_reclaim mode bits if it's known that the
> +workload is partitioned such that each partition fits within a NUMA node
> +and that accessing remote memory would cause a measurable performance
> +reduction. The page allocator will take additional actions before
> +allocating off node pages.

I think the documentation update should not be part of this patch.
This makes the back porting to stable more difficult.

Thanks,
Daniel

2020-06-26 13:55:10

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On 6/26/20 12:59 AM, Daniel Wagner wrote:
>> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
>> sysctl. Like a good kernel developer, I also went to go update the
>> documentation. I noticed that the bits in the documentation didn't
>> match the bits in the #defines.
> Drop the this paragraph from the commit message. It doesn't add
> any necessart information.
>
> Please have a look at
>
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

Sure! The first paragraph says:

Describe your problem. ... there must be an underlying
problem that motivated you to do this work.

This describes what motivated my work and whether it caused an actual
versus theoretical underlying problem.

Reviewers and maintainers often want to know the impact of a bug fix.
It's important to convey whether this was found because it caused
datacenters in 14 states to catch fire or whether it was found by
inspection.

This provides the information that it was found by inspection.

Was there something else specifically in the documentation which you
think I've neglected?

>> -zone_reclaim may be enabled if it's known that the workload is partitioned
>> -such that each partition fits within a NUMA node and that accessing remote
>> -memory would cause a measurable performance reduction. The page allocator
>> -will then reclaim easily reusable pages (those page cache pages that are
>> -currently not used) before allocating off node pages.
>> +Consider enabling one or more zone_reclaim mode bits if it's known that the
>> +workload is partitioned such that each partition fits within a NUMA node
>> +and that accessing remote memory would cause a measurable performance
>> +reduction. The page allocator will take additional actions before
>> +allocating off node pages.
>
> I think the documentation update should not be part of this patch.
> This makes the back porting to stable more difficult.

Really? If a backporter doesn't care about documentation, I'd just
expect them to see the reject, ignore it, and move on with their life.
If they do, they'd want the code fix and the Documentation/ update in
the same patch so that they don't get disconnected.

2020-06-26 19:25:08

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On Thu, Jun 25, 2020 at 05:34:59PM -0700, Dave Hansen wrote:
>
> From: Dave Hansen <[email protected]>
>
> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> sysctl. Like a good kernel developer, I also went to go update the
> documentation. I noticed that the bits in the documentation didn't
> match the bits in the #defines.
>
> The VM evidently stopped caring about RECLAIM_ZONE at some point (or
> never cared) and the #define itself was later removed as a cleanup.
> Those things by themselves are fine.
>
> But, the _other_ bit locations also got changed. That's not OK because
> the bit values are documented to mean one specific thing and users
> surely rely on them meaning that one thing and not changing from
> kernel to kernel. The end result is that if someone had a script
> that did:
>
> sysctl vm.zone_reclaim_mode=1
>
> That script went from doing nothing to writing out pages during
> node reclaim after the commit in question. That's not great.
>
> Put the bits back the way they were and add a comment so something
> like this is a bit harder to do again. Update the documentation to
> make it clear that the first bit is ignored.
>
> Signed-off-by: Dave Hansen <[email protected]>
> Fixes: commit 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")

This is a wrong format.

2020-06-26 21:25:25

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On 6/26/20 12:24 PM, Qian Cai wrote:
>> Signed-off-by: Dave Hansen <[email protected]>
>> Fixes: commit 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
> This is a wrong format.

The Fixes: line? Yeah, I think checkpatch was complaing about a
reference to a commit in the main changelog, and it suggested prefixing
the sha1 with "commit". I misinterpreted that warning to mean the
"Fixes:" line instead of the changelog.

I'll fix it up in v2.

2020-06-29 18:56:17

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On 06/25/20 at 05:34pm, Dave Hansen wrote:
>
> From: Dave Hansen <[email protected]>
>
> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> sysctl. Like a good kernel developer, I also went to go update the
> documentation. I noticed that the bits in the documentation didn't
> match the bits in the #defines.
>
> The VM evidently stopped caring about RECLAIM_ZONE at some point (or
> never cared) and the #define itself was later removed as a cleanup.

From git history, it seems to never care about the RECLAIM_ZONE bit.

I think this patch is justified. I have one question about adding back
the RECLAIM_ZONE bit. Since we introduced RECLAIM_ZONE in the first
place, but never use it, removing it truly may fail some existing
script, does it mean we will never have chance to fix/clean up this kind
of mess?

Do we have possibility to remove it in mainline tree, let distos or
stable kernel maintainer take care of the back porting? Like this, any
stable kernel after 5.8, or any distrols which chooses post v5.8 kenrel
as base won't have this confusion. I am not objecting this patch, just
be curious if we have a way to fix/clean up for this type of issue.

Thanks
Baoquan

> Those things by themselves are fine.
>
> But, the _other_ bit locations also got changed. That's not OK because
> the bit values are documented to mean one specific thing and users
> surely rely on them meaning that one thing and not changing from
> kernel to kernel. The end result is that if someone had a script
> that did:
>
> sysctl vm.zone_reclaim_mode=1
>
> That script went from doing nothing to writing out pages during
> node reclaim after the commit in question. That's not great.
>
> Put the bits back the way they were and add a comment so something
> like this is a bit harder to do again. Update the documentation to
> make it clear that the first bit is ignored.
>
> Signed-off-by: Dave Hansen <[email protected]>
> Fixes: commit 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
> Acked-by: Ben Widawsky <[email protected]>
> Cc: Alex Shi <[email protected]>
> Cc: Daniel Wagner <[email protected]>
> Cc: "Tobin C. Harding" <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> ---
>
> b/Documentation/admin-guide/sysctl/vm.rst | 12 ++++++------
> b/mm/vmscan.c | 9 +++++++--
> 2 files changed, 13 insertions(+), 8 deletions(-)
>
> diff -puN mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi mm/vmscan.c
> --- a/mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi 2020-06-25 17:32:11.559165912 -0700
> +++ b/mm/vmscan.c 2020-06-25 17:32:11.572165912 -0700
> @@ -4090,8 +4090,13 @@ module_init(kswapd_init)
> */
> int node_reclaim_mode __read_mostly;
>
> -#define RECLAIM_WRITE (1<<0) /* Writeout pages during reclaim */
> -#define RECLAIM_UNMAP (1<<1) /* Unmap pages during reclaim */
> +/*
> + * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
> + * ABI. New bits are OK, but existing bits can never change.
> + */
> +#define RECLAIM_RSVD (1<<0) /* (currently ignored/unused) */
> +#define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
> +#define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */
>
> /*
> * Priority for NODE_RECLAIM. This determines the fraction of pages
> diff -puN Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi Documentation/admin-guide/sysctl/vm.rst
> --- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi 2020-06-25 17:32:11.562165912 -0700
> +++ b/Documentation/admin-guide/sysctl/vm.rst 2020-06-25 17:32:11.572165912 -0700
> @@ -938,7 +938,7 @@ in the system.
> This is value OR'ed together of
>
> = ===================================
> -1 Zone reclaim on
> +1 (bit currently ignored)
> 2 Zone reclaim writes dirty pages out
> 4 Zone reclaim swaps pages
> = ===================================
> @@ -948,11 +948,11 @@ that benefit from having their data cach
> left disabled as the caching effect is likely to be more important than
> data locality.
>
> -zone_reclaim may be enabled if it's known that the workload is partitioned
> -such that each partition fits within a NUMA node and that accessing remote
> -memory would cause a measurable performance reduction. The page allocator
> -will then reclaim easily reusable pages (those page cache pages that are
> -currently not used) before allocating off node pages.
> +Consider enabling one or more zone_reclaim mode bits if it's known that the
> +workload is partitioned such that each partition fits within a NUMA node
> +and that accessing remote memory would cause a measurable performance
> +reduction. The page allocator will take additional actions before
> +allocating off node pages.
>
> Allowing zone reclaim to write out pages stops processes that are
> writing large amounts of data from dirtying pages on other nodes. Zone
> _
>

2020-06-29 18:59:25

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On 6/28/20 11:52 PM, Baoquan He wrote:
> On 06/25/20 at 05:34pm, Dave Hansen wrote:
>>
>> From: Dave Hansen <[email protected]>
>>
>> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
>> sysctl. Like a good kernel developer, I also went to go update the
>> documentation. I noticed that the bits in the documentation didn't
>> match the bits in the #defines.
>>
>> The VM evidently stopped caring about RECLAIM_ZONE at some point (or
>> never cared) and the #define itself was later removed as a cleanup.
>
>>From git history, it seems to never care about the RECLAIM_ZONE bit.
>
> I think this patch is justified. I have one question about adding back
> the RECLAIM_ZONE bit. Since we introduced RECLAIM_ZONE in the first
> place, but never use it, removing it truly may fail some existing
> script, does it mean we will never have chance to fix/clean up this kind
> of mess?

Our #1 rule is "don't break userspace". We only break userspace when we
have no other choice.

This case a bit fuzzier because we don't know if anyone is depending on
the ignored bit to do anything. But, it *was* documented. To me, that
means it might have been used, even though it would have been a
"placebo" bit.

> Do we have possibility to remove it in mainline tree, let distos or
> stable kernel maintainer take care of the back porting? Like this, any
> stable kernel after 5.8, or any distrols which chooses post v5.8 kenrel
> as base won't have this confusion. I am not objecting this patch, just
> be curious if we have a way to fix/clean up for this type of issue.

The only way I can plausibly think of "cleaning up" the RECLAIM_ZONE bit
would be to raise our confidence that it is truly unused. That takes
time, and probably a warning if we see it being set. If we don't run
into anybody setting it or depending on it being set in a few years, we
can remove it.

Backporting a _warning_ into the -stable trees might be an interesting
way to find users of older kernels mucking with it.

2020-06-29 19:07:07

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On 6/29/20 8:53 AM, Daniel Wagner wrote:
> On Mon, Jun 29, 2020 at 07:36:15AM -0700, Dave Hansen wrote:
>> On 6/29/20 12:13 AM, Daniel Wagner wrote:
>>> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
>>> sysctl.
>>
>> This conveys my motivation.
>
> You don't need to tell a story.

Interesting! I literally tell folks all the time that their changelogs
*should* tell a story. :)

I think this is clearly a case where you and I have differing personal
opinions on style. I understand what you are asking for, and this is
usually a case where our friendly maintainers can help out of they have
a strong preference either way.

> As explained in the documetation 'Describe your changes', describe
> your change in imperative mood, e.g. something like
>
> "Add RECLAIM_* mode for the the zone_reclaim_mode sysctl."

This is my personal opinion, but I think this interpretation takes the
documentation a bit too literally. The documentation speaking about
"imperative mood" applies specifically to the language about your fix,
it does not apply to the background.

It is there to encourage contributors to avoid some of the more passive
language like:

This patch does blah.... This patch fixes blah...

2020-06-29 19:35:05

by Daniel Wagner

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

Hi Dave,

On Fri, Jun 26, 2020 at 06:53:33AM -0700, Dave Hansen wrote:
> Was there something else specifically in the documentation which you
> think I've neglected?

The first paragraph explains how you ended up modifying the code. While
I understand that you want to document the process, it wont help
a reader in future. It doesn't add any intersting information at all.
Just state what you're doing as first thing and explain why you are
doing it after it.

> > I think the documentation update should not be part of this patch.
> > This makes the back porting to stable more difficult.
>
> Really? If a backporter doesn't care about documentation, I'd just
> expect them to see the reject, ignore it, and move on with their life.
> If they do, they'd want the code fix and the Documentation/ update in
> the same patch so that they don't get disconnected.

I understood you are fixing a regression ingroduced by a previous change. In
this case I would only fix the regression. Updating/improving the
documentation is good, I just don't think it's necessary to back port it to
stables trees along side the bug fix.

Thanks,
Daniel

2020-06-29 19:43:45

by Daniel Wagner

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On Mon, Jun 29, 2020 at 07:36:15AM -0700, Dave Hansen wrote:
> On 6/29/20 12:13 AM, Daniel Wagner wrote:
> > I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> > sysctl.
>
> This conveys my motivation.

You don't need to tell a story. As explained in the documetation
'Describe your changes', describe your change in imperative mood,
e.g. something like

"Add RECLAIM_* mode for the the zone_reclaim_mode sysctl."

> > Like a good kernel developer, I also went to go update the
> > documentation.
>
> This takes the opportunity to throw a tiny bit of shade in the direction
> of the other folks who modified the #define without updating the
> documentation.

What about something simple as, "Update the documetation to
match the implementation" or whatever makes it clear why
you are touching the documetation bits.

Anyway, I just wanted to help you to write better commit
messages. I understand, my help is not welcome.

2020-06-29 21:25:22

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On 6/29/20 12:13 AM, Daniel Wagner wrote:
> On Fri, Jun 26, 2020 at 06:53:33AM -0700, Dave Hansen wrote:
>> Was there something else specifically in the documentation which you
>> think I've neglected?
>
> The first paragraph explains how you ended up modifying the code. While
> I understand that you want to document the process, it wont help
> a reader in future. It doesn't add any intersting information at all.
> Just state what you're doing as first thing and explain why you are
> doing it after it.

I went back and looked at it to see what interesting information I think
the first paragraph conveys:

> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> sysctl.

This conveys my motivation. It relays that my discovery of the issue
was in the process of modifying the code, in contrast to it having being
found through observed pathological kernel behavior.

> Like a good kernel developer, I also went to go update the
> documentation.

This takes the opportunity to throw a tiny bit of shade in the direction
of the other folks who modified the #define without updating the
documentation. It also helps build the justification for the new
comment on top of the #defines. I guess you can argue that this should
be struck from the changelog. But, heck, it gave me a little chuckle
when read it just now.

> I noticed that the bits in the documentation didn't
> match the bits in the #defines.

This is the crux of the problem statement. Can't get rid of this.

So, I guess I could pare the above down to simply:

When modifying the zone_reclaim_mode sysctl documentation, I
found through inspection that the bits in the documentation
did not match the bits in the #defines.

That's certainly chuckle-free, and it would make my editor happy because
it's shorter and now he can sell another ad on the page. Would you
prefer that form?

>>> I think the documentation update should not be part of this patch.
>>> This makes the back porting to stable more difficult.
>>
>> Really? If a backporter doesn't care about documentation, I'd just
>> expect them to see the reject, ignore it, and move on with their life.
>> If they do, they'd want the code fix and the Documentation/ update in
>> the same patch so that they don't get disconnected.
>
> I understood you are fixing a regression ingroduced by a previous change. In
> this case I would only fix the regression. Updating/improving the
> documentation is good, I just don't think it's necessary to back port it to
> stables trees along side the bug fix.

That's a question for the person doing the backport. Attaching the two
things makes it the most likely that they will be given the best, most
complete information.

I've done my share of backports and I if it were me, I think I'd rather
have it this way.

2020-06-29 23:52:49

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On 6/29/20 4:30 PM, Baoquan He wrote:
>> The only way I can plausibly think of "cleaning up" the RECLAIM_ZONE bit
>> would be to raise our confidence that it is truly unused. That takes
>> time, and probably a warning if we see it being set. If we don't run
>> into anybody setting it or depending on it being set in a few years, we
>> can remove it.
> So adding the old bit back for compatibility looks good, thanks.
>
> Then we have to be very careful when adding and reviewing new
> interface introducing, should not leave one which might be used
> in the future.
>
> In fact, RECLAIM_ZONE is not completely useless. At least, when the old
> bit 0 is set, it may enter into node_reclaim() in get_page_from_freelist(),
> that makes it like a switch.
>
> get_page_from_freelist {
>
> ...
> if (node_reclaim_mode == 0 ||
> !zone_allows_reclaim(ac->preferred_zoneref->zone, zone))
> continue;
> ...
> }

Oh, that's a very good point. There are a couple of those around. Let
me circle back and update the documentation and the variable name. I'll
send out another version.

2020-06-29 23:52:49

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On 06/29/20 at 07:27am, Dave Hansen wrote:
> On 6/28/20 11:52 PM, Baoquan He wrote:
> > On 06/25/20 at 05:34pm, Dave Hansen wrote:
> >>
> >> From: Dave Hansen <[email protected]>
> >>
> >> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> >> sysctl. Like a good kernel developer, I also went to go update the
> >> documentation. I noticed that the bits in the documentation didn't
> >> match the bits in the #defines.
> >>
> >> The VM evidently stopped caring about RECLAIM_ZONE at some point (or
> >> never cared) and the #define itself was later removed as a cleanup.
> >
> >>From git history, it seems to never care about the RECLAIM_ZONE bit.
> >
> > I think this patch is justified. I have one question about adding back
> > the RECLAIM_ZONE bit. Since we introduced RECLAIM_ZONE in the first
> > place, but never use it, removing it truly may fail some existing
> > script, does it mean we will never have chance to fix/clean up this kind
> > of mess?
>
> Our #1 rule is "don't break userspace". We only break userspace when we
> have no other choice.
>
> This case a bit fuzzier because we don't know if anyone is depending on
> the ignored bit to do anything. But, it *was* documented. To me, that
> means it might have been used, even though it would have been a
> "placebo" bit.
>
> > Do we have possibility to remove it in mainline tree, let distos or
> > stable kernel maintainer take care of the back porting? Like this, any
> > stable kernel after 5.8, or any distrols which chooses post v5.8 kenrel
> > as base won't have this confusion. I am not objecting this patch, just
> > be curious if we have a way to fix/clean up for this type of issue.
>
> The only way I can plausibly think of "cleaning up" the RECLAIM_ZONE bit
> would be to raise our confidence that it is truly unused. That takes
> time, and probably a warning if we see it being set. If we don't run
> into anybody setting it or depending on it being set in a few years, we
> can remove it.

So adding the old bit back for compatibility looks good, thanks.

Then we have to be very careful when adding and reviewing new
interface introducing, should not leave one which might be used
in the future.

In fact, RECLAIM_ZONE is not completely useless. At least, when the old
bit 0 is set, it may enter into node_reclaim() in get_page_from_freelist(),
that makes it like a switch.

get_page_from_freelist {

...
if (node_reclaim_mode == 0 ||
!zone_allows_reclaim(ac->preferred_zoneref->zone, zone))
continue;
...
}


>
> Backporting a _warning_ into the -stable trees might be an interesting
> way to find users of older kernels mucking with it.
>

2020-07-01 02:51:12

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On Mon, 29 Jun 2020 16:37:37 -0700 Dave Hansen <[email protected]> wrote:

> On 6/29/20 4:30 PM, Baoquan He wrote:
> >> The only way I can plausibly think of "cleaning up" the RECLAIM_ZONE bit
> >> would be to raise our confidence that it is truly unused. That takes
> >> time, and probably a warning if we see it being set. If we don't run
> >> into anybody setting it or depending on it being set in a few years, we
> >> can remove it.
> > So adding the old bit back for compatibility looks good, thanks.
> >
> > Then we have to be very careful when adding and reviewing new
> > interface introducing, should not leave one which might be used
> > in the future.
> >
> > In fact, RECLAIM_ZONE is not completely useless. At least, when the old
> > bit 0 is set, it may enter into node_reclaim() in get_page_from_freelist(),
> > that makes it like a switch.
> >
> > get_page_from_freelist {
> >
> > ...
> > if (node_reclaim_mode == 0 ||
> > !zone_allows_reclaim(ac->preferred_zoneref->zone, zone))
> > continue;
> > ...
> > }
>
> Oh, that's a very good point. There are a couple of those around. Let
> me circle back and update the documentation and the variable name. I'll
> send out another version.

Was the omission of cc:stable deliberate?

2020-07-01 15:31:36

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI

On 6/30/20 7:47 PM, Andrew Morton wrote:
>> Oh, that's a very good point. There are a couple of those around. Let
>> me circle back and update the documentation and the variable name. I'll
>> send out another version.
> Was the omission of cc:stable deliberate?

Nope, it was an accidental [email protected] instead of stable@vger.
<Sigh> Not the first time I've done that... I'll fix it up in the resend.