Hello,
If I remember correctly kernel developers promised not to break user APIs in the Linux kernel
however in Linux 3.7 a /proc/<PID>/oom_adj file has been totally removed which makes KDE 3.5.x
(and I suppose TDE) unusable as they depend on this file and simply refuse to start in its absence.
Also, udev in CentOS/RHEL/Scientific and other old Linux'es depend on this file (which is not critical
but not pleasant anyway).
I'm not a hacker/programmer to edit and recompile udev/KDE/TDE sources so I'd be glad if
this file was restored.
If the interface behind this file was totally removed and there's no way you can bring it back,
can I politely ask to export a dummy oom_adj file which allows the same mode of access yet
the kernel does nothing in return.
--
Thank you,
Artem
On Mon, 12 Nov 2012, Artem S. Tashkinov wrote:
> Hello,
>
> If I remember correctly kernel developers promised not to break user APIs in the Linux kernel
> however in Linux 3.7 a /proc/<PID>/oom_adj file has been totally removed which makes KDE 3.5.x
> (and I suppose TDE) unusable as they depend on this file and simply refuse to start in its absence.
>
This is
http://mail.kde.org/pipermail/unassigned-bugs/2010-December/016850.html
from almost two years ago. The tunable had been on the feature removal
schedule for two years and would have emitted a warning if you had run on
any kernel during that time. KDE 4.6.1 and later from two years ago work
fine.
> Also, udev in CentOS/RHEL/Scientific and other old Linux'es depend on this file (which is not critical
> but not pleasant anyway).
>
Fixed in udev v162 at the same time kde was.
On Mon, Nov 12, 2012 at 1:41 PM, David Rientjes <[email protected]> wrote:
>
> This is
> http://mail.kde.org/pipermail/unassigned-bugs/2010-December/016850.html
> from almost two years ago. The tunable had been on the feature removal
> schedule for two years and would have emitted a warning if you had run on
> any kernel during that time. KDE 4.6.1 and later from two years ago work
> fine.
Doesn't matter. If something breaks real users, it needs to get
reverted (or worked around).
The feature-removal.txt file was a joke, and was used as an excuse for
exactly the above kind of bad behavior. It got removed for good
reasons.
The work-around might be as simple as creating an oom_adj file that
simply doesn't do anything.
Linus
On Mon, 12 Nov 2012, Linus Torvalds wrote:
> The work-around might be as simple as creating an oom_adj file that
> simply doesn't do anything.
>
You want to carry a dummy tunable forever? If not, then how long if two
years isn't sufficient?
On Mon, Nov 12, 2012 at 5:18 PM, David Rientjes <[email protected]> wrote:
>
> You want to carry a dummy tunable forever?
Feel free to make it non-dummy.
> If not, then how long if two years isn't sufficient?
What part of "We don't break user space" do you have trouble understanding?
There is no time limit. It's not about time. It's about not breaking user space.
End of discussion. I don't understand why people have such a hard time
understanding such a simple concept. If people reports it breaks
user-space, we fix it. And we keep fixing things until nobody uses the
old versions any more. Seriously, IT IS THAT SIMPLE.
Linus
This is mostly a revert of 01dc52ebdf47 ("oom: remove deprecated oom_adj")
from Davidlohr Bueso.
It reintroduces /proc/pid/oom_adj for backwards compatibility with earlier
kernels. It simply scales the value linearly when /proc/pid/oom_score_adj
is written.
The major difference is that its scheduled removal is no longer included
in Documentation/feature-removal-schedule.txt. We do warn users with a
single printk, though, to suggest the more powerful and supported
/proc/pid/oom_score_adj interface.
Reported-by: Artem S. Tashkinov <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
---
Documentation/filesystems/proc.txt | 16 ++++--
fs/proc/base.c | 109 ++++++++++++++++++++++++++++++++++++
include/uapi/linux/oom.h | 9 +++
3 files changed, 130 insertions(+), 4 deletions(-)
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -33,7 +33,7 @@ Table of Contents
2 Modifying System Parameters
3 Per-Process Parameters
- 3.1 /proc/<pid>/oom_score_adj - Adjust the oom-killer
+ 3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
score
3.2 /proc/<pid>/oom_score - Display current oom-killer score
3.3 /proc/<pid>/io - Display the IO accounting fields
@@ -1320,10 +1320,10 @@ of the kernel.
CHAPTER 3: PER-PROCESS PARAMETERS
------------------------------------------------------------------------------
-3.1 /proc/<pid>/oom_score_adj- Adjust the oom-killer score
+3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj- Adjust the oom-killer score
--------------------------------------------------------------------------------
-This file can be used to adjust the badness heuristic used to select which
+These file can be used to adjust the badness heuristic used to select which
process gets killed in out of memory conditions.
The badness heuristic assigns a value to each candidate task ranging from 0
@@ -1361,6 +1361,12 @@ same system, cpuset, mempolicy, or memory controller resources to use at least
equivalent to discounting 50% of the task's allowed memory from being considered
as scoring against the task.
+For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
+be used to tune the badness score. Its acceptable values range from -16
+(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
+(OOM_DISABLE) to disable oom killing entirely for that task. Its value is
+scaled linearly with /proc/<pid>/oom_score_adj.
+
The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
requires CAP_SYS_RESOURCE.
@@ -1375,7 +1381,9 @@ minimal amount of work.
-------------------------------------------------------------
This file can be used to check the current score used by the oom-killer is for
-any given <pid>.
+any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
+process should be killed in an out-of-memory situation.
+
3.3 /proc/<pid>/io - Display the IO accounting fields
-------------------------------------------------------
diff --git a/fs/proc/base.c b/fs/proc/base.c
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -873,6 +873,113 @@ static const struct file_operations proc_environ_operations = {
.release = mem_release,
};
+static ssize_t oom_adj_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
+ char buffer[PROC_NUMBUF];
+ int oom_adj = OOM_ADJUST_MIN;
+ size_t len;
+ unsigned long flags;
+
+ if (!task)
+ return -ESRCH;
+ if (lock_task_sighand(task, &flags)) {
+ if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MAX)
+ oom_adj = OOM_ADJUST_MAX;
+ else
+ oom_adj = (task->signal->oom_score_adj * -OOM_DISABLE) /
+ OOM_SCORE_ADJ_MAX;
+ unlock_task_sighand(task, &flags);
+ }
+ put_task_struct(task);
+ len = snprintf(buffer, sizeof(buffer), "%d\n", oom_adj);
+ return simple_read_from_buffer(buf, count, ppos, buffer, len);
+}
+
+static ssize_t oom_adj_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct task_struct *task;
+ char buffer[PROC_NUMBUF];
+ int oom_adj;
+ unsigned long flags;
+ int err;
+
+ memset(buffer, 0, sizeof(buffer));
+ if (count > sizeof(buffer) - 1)
+ count = sizeof(buffer) - 1;
+ if (copy_from_user(buffer, buf, count)) {
+ err = -EFAULT;
+ goto out;
+ }
+
+ err = kstrtoint(strstrip(buffer), 0, &oom_adj);
+ if (err)
+ goto out;
+ if ((oom_adj < OOM_ADJUST_MIN || oom_adj > OOM_ADJUST_MAX) &&
+ oom_adj != OOM_DISABLE) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ task = get_proc_task(file->f_path.dentry->d_inode);
+ if (!task) {
+ err = -ESRCH;
+ goto out;
+ }
+
+ task_lock(task);
+ if (!task->mm) {
+ err = -EINVAL;
+ goto err_task_lock;
+ }
+
+ if (!lock_task_sighand(task, &flags)) {
+ err = -ESRCH;
+ goto err_task_lock;
+ }
+
+ /*
+ * Scale /proc/pid/oom_score_adj appropriately ensuring that a maximum
+ * value is always attainable.
+ */
+ if (oom_adj == OOM_ADJUST_MAX)
+ oom_adj = OOM_SCORE_ADJ_MAX;
+ else
+ oom_adj = (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
+
+ if (oom_adj < task->signal->oom_score_adj &&
+ !capable(CAP_SYS_RESOURCE)) {
+ err = -EACCES;
+ goto err_sighand;
+ }
+
+ /*
+ * /proc/pid/oom_adj is provided for legacy purposes, ask users to use
+ * /proc/pid/oom_score_adj instead.
+ */
+ printk_once(KERN_WARNING "%s (%d): /proc/%d/oom_adj is deprecated, please use /proc/%d/oom_score_adj instead.\n",
+ current->comm, task_pid_nr(current), task_pid_nr(task),
+ task_pid_nr(task));
+
+ task->signal->oom_score_adj = oom_adj;
+ trace_oom_score_adj_update(task);
+err_sighand:
+ unlock_task_sighand(task, &flags);
+err_task_lock:
+ task_unlock(task);
+ put_task_struct(task);
+out:
+ return err < 0 ? err : count;
+}
+
+static const struct file_operations proc_oom_adj_operations = {
+ .read = oom_adj_read,
+ .write = oom_adj_write,
+ .llseek = generic_file_llseek,
+};
+
static ssize_t oom_score_adj_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
@@ -2598,6 +2705,7 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("cgroup", S_IRUGO, proc_cgroup_operations),
#endif
INF("oom_score", S_IRUGO, proc_oom_score),
+ REG("oom_adj", S_IRUGO|S_IWUSR, proc_oom_adj_operations),
REG("oom_score_adj", S_IRUGO|S_IWUSR, proc_oom_score_adj_operations),
#ifdef CONFIG_AUDITSYSCALL
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
@@ -2964,6 +3072,7 @@ static const struct pid_entry tid_base_stuff[] = {
REG("cgroup", S_IRUGO, proc_cgroup_operations),
#endif
INF("oom_score", S_IRUGO, proc_oom_score),
+ REG("oom_adj", S_IRUGO|S_IWUSR, proc_oom_adj_operations),
REG("oom_score_adj", S_IRUGO|S_IWUSR, proc_oom_score_adj_operations),
#ifdef CONFIG_AUDITSYSCALL
REG("loginuid", S_IWUSR|S_IRUGO, proc_loginuid_operations),
diff --git a/include/uapi/linux/oom.h b/include/uapi/linux/oom.h
--- a/include/uapi/linux/oom.h
+++ b/include/uapi/linux/oom.h
@@ -8,4 +8,13 @@
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000
+/*
+ * /proc/<pid>/oom_adj set to -17 protects from the oom killer for legacy
+ * purposes.
+ */
+#define OOM_DISABLE (-17)
+/* inclusive */
+#define OOM_ADJUST_MIN (-16)
+#define OOM_ADJUST_MAX 15
+
#endif /* _UAPI__INCLUDE_LINUX_OOM_H */