/proc/pid/oom_adj exists solely to avoid breaking existing userspace
binaries that write to the tunable.
Add a comment in the only possible location within the kernel tree to
describe the situation and motivation for keeping it around.
Signed-off-by: David Rientjes <[email protected]>
---
fs/proc/base.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/proc/base.c b/fs/proc/base.c
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1032,6 +1032,16 @@ static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count,
return simple_read_from_buffer(buf, count, ppos, buffer, len);
}
+/*
+ * /proc/pid/oom_adj exists solely for backwards compatibility with previous
+ * kernels. The effective policy is defined by oom_score_adj, which has a
+ * different scale: oom_adj grew exponentially and oom_score_adj grows linearly.
+ * Values written to oom_adj are simply mapped linearly to oom_score_adj.
+ * Processes that become oom disabled via oom_adj will still be oom disabled
+ * with this implementation.
+ *
+ * oom_adj cannot be removed since existing userspace binaries use it.
+ */
static ssize_t oom_adj_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
On Wed 04-11-15 12:32:14, David Rientjes wrote:
> /proc/pid/oom_adj exists solely to avoid breaking existing userspace
> binaries that write to the tunable.
>
> Add a comment in the only possible location within the kernel tree to
> describe the situation and motivation for keeping it around.
I am not sure this is really needed but it certainly is not harmful.
If this is a way to suppress any attempts for changes like
http://lkml.kernel.org/r/1f80189385e540c2a5b2747a7a265d8c%40SHMBX01.spreadtrum.com
then it does not explain why those are not desirable.
> Signed-off-by: David Rientjes <[email protected]>
> ---
> fs/proc/base.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1032,6 +1032,16 @@ static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count,
> return simple_read_from_buffer(buf, count, ppos, buffer, len);
> }
>
> +/*
> + * /proc/pid/oom_adj exists solely for backwards compatibility with previous
> + * kernels. The effective policy is defined by oom_score_adj, which has a
> + * different scale: oom_adj grew exponentially and oom_score_adj grows linearly.
> + * Values written to oom_adj are simply mapped linearly to oom_score_adj.
> + * Processes that become oom disabled via oom_adj will still be oom disabled
> + * with this implementation.
> + *
> + * oom_adj cannot be removed since existing userspace binaries use it.
This is a bit strong wording. I think the knob can be removed in the future.
* oom_adj is kept for compatibility reasons. There are still a few
* projects which use oom_adj only. We have tried to convert all of them
* which could be found but it will take some time until all those changes
* bubble up to all users. We might try to remove the knob in a few years
* if the situation changes.
> + */
> static ssize_t oom_adj_write(struct file *file, const char __user *buf,
> size_t count, loff_t *ppos)
> {
--
Michal Hocko
SUSE Labs
On Thu, 5 Nov 2015, Michal Hocko wrote:
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -1032,6 +1032,16 @@ static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count,
> > return simple_read_from_buffer(buf, count, ppos, buffer, len);
> > }
> >
> > +/*
> > + * /proc/pid/oom_adj exists solely for backwards compatibility with previous
> > + * kernels. The effective policy is defined by oom_score_adj, which has a
> > + * different scale: oom_adj grew exponentially and oom_score_adj grows linearly.
> > + * Values written to oom_adj are simply mapped linearly to oom_score_adj.
> > + * Processes that become oom disabled via oom_adj will still be oom disabled
> > + * with this implementation.
> > + *
> > + * oom_adj cannot be removed since existing userspace binaries use it.
>
> This is a bit strong wording. I think the knob can be removed in the future.
>
Perhaps you are more optimistic than I am, but I would think it would be
difficult to remove a tunable that requires binaries to be re-built to
avoid. That was Linus's primary objection, IIRC. If an application fails
to oom disable itself because it still writes to oom_adj, the result
could be a system-wide failure. There are workarounds to that if you have
root, but I don't think we're in a position to remove it in the near
future. I think the comment is clear about why it cannot be removed right
now and about its current implementation.
Converting software that writes to oom_adj to use oom_score_adj instead is
still a worthwhile goal, though, since they'd be using the semantics of
the effective policy.
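
For anyone doing such a conversion, the linear mapping that the comment
refers to can be sketched in userspace as below. This is only an
illustration: the helper name oom_adj_to_score_adj() is made up for this
example, and the constants are assumed to match include/uapi/linux/oom.h,
so check them against your tree before relying on them.

```c
/* Assumed to match include/uapi/linux/oom.h; verify against your tree. */
#define OOM_DISABLE        (-17)   /* oom_adj value that disables oom kill */
#define OOM_ADJUST_MAX     15      /* largest valid oom_adj */
#define OOM_SCORE_ADJ_MAX  1000    /* largest valid oom_score_adj */

/*
 * Hypothetical helper (not a kernel function): scale an oom_adj value
 * linearly onto the oom_score_adj range.
 * Returns 0 on success, -1 if oom_adj is outside the valid range.
 */
static int oom_adj_to_score_adj(int oom_adj, int *score_adj)
{
	if (oom_adj < OOM_DISABLE || oom_adj > OOM_ADJUST_MAX)
		return -1;

	if (oom_adj == OOM_ADJUST_MAX)
		/* The top of the old range maps to the top of the new one. */
		*score_adj = OOM_SCORE_ADJ_MAX;
	else
		/* Integer division truncates toward zero. */
		*score_adj = (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;

	return 0;
}
```

With these constants, -17 maps to -1000, so a process that oom disables
itself through oom_adj stays oom disabled, 0 maps to 0, and 15 maps to
1000.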
>
> /proc/pid/oom_adj exists solely to avoid breaking existing userspace
> binaries that write to the tunable.
>
> Add a comment in the only possible location within the kernel tree to
> describe the situation and motivation for keeping it around.
>
> Signed-off-by: David Rientjes <[email protected]>
> ---
Acked-by: Hillf Danton <[email protected]>
> fs/proc/base.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1032,6 +1032,16 @@ static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count,
> return simple_read_from_buffer(buf, count, ppos, buffer, len);
> }
>
> +/*
> + * /proc/pid/oom_adj exists solely for backwards compatibility with previous
> + * kernels. The effective policy is defined by oom_score_adj, which has a
> + * different scale: oom_adj grew exponentially and oom_score_adj grows linearly.
> + * Values written to oom_adj are simply mapped linearly to oom_score_adj.
> + * Processes that become oom disabled via oom_adj will still be oom disabled
> + * with this implementation.
> + *
> + * oom_adj cannot be removed since existing userspace binaries use it.
> + */
> static ssize_t oom_adj_write(struct file *file, const char __user *buf,
> size_t count, loff_t *ppos)
> {
> --