Date: Mon, 15 Nov 2010 16:03:59 -0800
From: Mandeep Singh Baines
To: David Rientjes
Cc: Mandeep Singh Baines, Andrew Morton, KAMEZAWA Hiroyuki,
    KOSAKI Motohiro, Rik van Riel, Ying Han,
    linux-kernel@vger.kernel.org, gspencer@chromium.org,
    piman@chromium.org, wad@chromium.org, olofj@chromium.org,
    Bodo Eggert <7eggert@web.de>
Subject: [PATCH v3] oom: allow a non-CAP_SYS_RESOURCE proces to oom_score_adj down
Message-ID: <20101116000359.GS7363@google.com>
References: <20101111183050.GI7363@google.com>
    <20101111222509.GJ7363@google.com>
    <20101111235620.GK7363@google.com>
    <20101113004657.GN7363@google.com>
    <20101115220150.GR7363@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

We'd like to be able to adjust a process's oom_score_adj up/down as it
enters/leaves the foreground. Currently, it is not possible to lower
oom_score_adj without CAP_SYS_RESOURCE. This patch allows a task to
decrease its oom_score_adj back to the value that a CAP_SYS_RESOURCE
thread set it to, or to the value it inherited at fork.
Assuming the thread that has forked it has oom_score_adj of 0, each tab
process could decrease it back to 0 upon activation unless a
CAP_SYS_RESOURCE thread elevated it to something higher.

Alternatives considered:

* a setuid binary
* a daemon with CAP_SYS_RESOURCE

Since you don't want all processes to be able to reduce their oom_adj,
a setuid or daemon implementation would be complex. The alternatives
also have much higher overhead.

This patch is updated from the original patch based on feedback from
David Rientjes.

Signed-off-by: Mandeep Singh Baines
Acked-by: David Rientjes
---
 Documentation/filesystems/proc.txt |    4 ++++
 fs/proc/base.c                     |    4 +++-
 include/linux/sched.h              |    2 ++
 kernel/fork.c                      |    1 +
 4 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index e73df27..7139c50 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1296,6 +1296,10 @@ scaled linearly with /proc/<pid>/oom_score_adj.
 Writing to /proc/<pid>/oom_score_adj or /proc/<pid>/oom_adj will change the
 other with its scaled value.
 
+The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
+value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
+requires CAP_SYS_RESOURCE.
+
 NOTICE: /proc/<pid>/oom_adj is deprecated and will be removed, please see
 Documentation/feature-removal-schedule.txt.
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index f3d02ca..7b1a9df 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1164,7 +1164,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
 		goto err_task_lock;
 	}
 
-	if (oom_score_adj < task->signal->oom_score_adj &&
+	if (oom_score_adj < task->signal->oom_score_adj_min &&
 			!capable(CAP_SYS_RESOURCE)) {
 		err = -EACCES;
 		goto err_sighand;
@@ -1177,6 +1177,8 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
 			atomic_dec(&task->mm->oom_disable_count);
 	}
 	task->signal->oom_score_adj = oom_score_adj;
+	if (has_capability_noaudit(current, CAP_SYS_RESOURCE))
+		task->signal->oom_score_adj_min = oom_score_adj;
 	/*
 	 * Scale /proc/pid/oom_adj appropriately ensuring that OOM_DISABLE is
 	 * always attainable.
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f53cdf2..2a71ee0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -626,6 +626,8 @@ struct signal_struct {
 
 	int oom_adj;		/* OOM kill score adjustment (bit shift) */
 	int oom_score_adj;	/* OOM kill score adjustment */
+	int oom_score_adj_min;	/* OOM kill score adjustment minimum value.
+				 * Only settable by CAP_SYS_RESOURCE. */
 
 	struct mutex cred_guard_mutex;	/* guard against foreign influences on
 					 * credential calculations
diff --git a/kernel/fork.c b/kernel/fork.c
index 3b159c5..0979527 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -907,6 +907,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 
 	sig->oom_adj = current->signal->oom_adj;
 	sig->oom_score_adj = current->signal->oom_score_adj;
+	sig->oom_score_adj_min = current->signal->oom_score_adj_min;
 
 	mutex_init(&sig->cred_guard_mutex);
 
-- 
1.7.3.1
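
For readers who want to check the intended semantics without building a
kernel, here is a rough user-space model of the rule the patch
introduces. It is only a sketch and not kernel code: struct oom_model,
oom_model_write(), oom_model_fork() and oom_model_demo() are
hypothetical names made up for illustration, and the "privileged" flag
stands in for the writer holding CAP_SYS_RESOURCE. Only the two field
names and the -EACCES result are taken from the diff above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two signal_struct fields touched by the patch. */
struct oom_model {
	int oom_score_adj;	/* current adjustment */
	int oom_score_adj_min;	/* floor, last value written by a privileged writer */
};

/*
 * Models the check in oom_score_adj_write(). 'privileged' is an assumption
 * standing in for the writer having CAP_SYS_RESOURCE.
 */
int oom_model_write(struct oom_model *m, int value, bool privileged)
{
	/* Unprivileged writers may not go below the recorded floor. */
	if (value < m->oom_score_adj_min && !privileged)
		return -EACCES;

	m->oom_score_adj = value;
	/* A privileged write also moves the floor, up or down. */
	if (privileged)
		m->oom_score_adj_min = value;
	return 0;
}

/* Models copy_signal(): the child inherits both the value and the floor. */
struct oom_model oom_model_fork(const struct oom_model *parent)
{
	return *parent;
}

/* Walks the tab scenario from the changelog; returns 0 if every step holds. */
int oom_model_demo(void)
{
	struct oom_model parent = { 0, 0 };
	struct oom_model tab = oom_model_fork(&parent);
	int rc;

	rc = oom_model_write(&tab, 300, false);	/* tab moves to the background */
	assert(rc == 0 && tab.oom_score_adj == 300);

	rc = oom_model_write(&tab, 0, false);	/* foreground again: back to the floor */
	assert(rc == 0 && tab.oom_score_adj == 0);

	rc = oom_model_write(&tab, -100, false);	/* below the inherited floor */
	assert(rc == -EACCES && tab.oom_score_adj == 0);

	rc = oom_model_write(&tab, 200, true);	/* privileged writer raises the floor */
	assert(rc == 0 && tab.oom_score_adj_min == 200);

	rc = oom_model_write(&tab, 0, false);	/* 0 is now below the floor */
	assert(rc == -EACCES && tab.oom_score_adj == 200);

	(void)rc;
	return 0;
}
```

Calling oom_model_demo() from a test harness exercises the sequence
described in the changelog: an unprivileged raise, a drop back to the
inherited value, a refused drop below it, and a privileged write that
moves the floor.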