2006-05-30 22:35:59

by Dave Peterson

[permalink] [raw]
Subject: [PATCH (try #4)] mm: avoid unnecessary OOM kills

Below is a 2.6.17-rc4-mm3 patch that fixes a problem where the OOM killer was
unnecessarily killing system daemons in addition to memory-hogging user
processes. The patch fixes things so that the following assertion is
satisfied:

If a failed attempt to allocate memory triggers the OOM killer, then the
failed attempt must have occurred _after_ any process previously shot by
the OOM killer has cleaned out its mm_struct.

Thus we avoid situations where concurrent invocations of the OOM killer cause
more processes to be shot than necessary to resolve the OOM condition.

Signed-Off-By: David S. Peterson <[email protected]>
---
Changes in this version of patch:

- Loop over all tasks to determine whether OOM kill is in progress.
Eliminate OOM notify flag in mm_struct, etc.


diff -urNp -X dontdiff linux-2.6.17-rc5-mm1/mm/oom_kill.c linux-2.6.17-rc5-mm1-oom/mm/oom_kill.c
--- linux-2.6.17-rc5-mm1/mm/oom_kill.c 2006-05-30 14:05:53.000000000 -0700
+++ linux-2.6.17-rc5-mm1-oom/mm/oom_kill.c 2006-05-30 14:07:00.000000000 -0700
@@ -25,6 +25,23 @@
int sysctl_panic_on_oom;
/* #define DEBUG */

+/* Return 1 if OOM kill in progress. Else return 0. */
+int oom_kill_active(void)
+{
+ task_t *p, *q;
+
+ read_lock(&tasklist_lock);
+ do_each_thread(p, q) {
+ if (test_tsk_thread_flag(q, TIF_MEMDIE)) {
+ read_unlock(&tasklist_lock);
+ return 1;
+ }
+ } while_each_thread(p, q);
+ read_unlock(&tasklist_lock);
+
+ return 0;
+}
+
/**
* badness - calculate a numeric value for how bad this task has been
* @p: task struct of which task we should calculate
@@ -318,6 +335,7 @@ void out_of_memory(struct zonelist *zone
{
task_t *p;
unsigned long points = 0;
+ const char *msg = NULL;

if (printk_ratelimit()) {
printk("oom-killer: gfp_mask=0x%x, order=%d\n",
@@ -335,19 +353,19 @@ void out_of_memory(struct zonelist *zone
*/
switch (constrained_alloc(zonelist, gfp_mask)) {
case CONSTRAINT_MEMORY_POLICY:
- oom_kill_process(current, points,
- "No available memory (MPOL_BIND)");
+ p = current;
+ msg = "No available memory (MPOL_BIND)";
break;

case CONSTRAINT_CPUSET:
- oom_kill_process(current, points,
- "No available memory in cpuset");
+ p = current;
+ msg = "No available memory in cpuset";
break;

case CONSTRAINT_NONE:
if (sysctl_panic_on_oom)
panic("out of memory. panic_on_oom is selected\n");
-retry:
+
/*
* Rambo mode: Shoot down a process and hope it solves whatever
* issues we may have.
@@ -364,20 +382,16 @@ retry:
panic("Out of memory and no killable processes...\n");
}

- if (oom_kill_process(p, points, "Out of memory"))
- goto retry;
-
+ msg = "Out of memory";
break;
+
+ default:
+ BUG();
}

+ oom_kill_process(p, points, msg);
+
out:
read_unlock(&tasklist_lock);
cpuset_unlock();
-
- /*
- * Give "p" a good chance of killing itself before we
- * retry to allocate memory unless "p" is current
- */
- if (!test_thread_flag(TIF_MEMDIE))
- schedule_timeout_uninterruptible(1);
}
diff -urNp -X dontdiff linux-2.6.17-rc5-mm1/mm/page_alloc.c linux-2.6.17-rc5-mm1-oom/mm/page_alloc.c
--- linux-2.6.17-rc5-mm1/mm/page_alloc.c 2006-05-30 14:05:53.000000000 -0700
+++ linux-2.6.17-rc5-mm1-oom/mm/page_alloc.c 2006-05-30 14:07:00.000000000 -0700
@@ -992,6 +992,56 @@ static inline void set_page_owner(struct
}
#endif /* CONFIG_PAGE_OWNER */

+int oom_kill_active(void);
+
+/* If an OOM kill is not already in progress, try once more to allocate
+ * memory. If allocation fails this time, invoke the OOM killer.
+ */
+static struct page * oom_alloc(gfp_t gfp_mask, unsigned int order,
+ struct zonelist *zonelist)
+{
+ static DECLARE_MUTEX(sem);
+ struct page *page;
+
+ down(&sem);
+
+ /* Prevent parallel OOM kill operations. This fixes a problem where
+ * the OOM killer was observed shooting system daemons in addition to
+ * memory-hogging user processes.
+ */
+ if (oom_kill_active()) {
+ up(&sem);
+ goto out_sleep;
+ }
+
+ /* If we get here, we _know_ that any previous OOM killer victim has
+ * cleaned out its mm_struct. Therefore we should pick a victim to
+ * shoot if this allocation fails.
+ */
+ page = get_page_from_freelist(gfp_mask | __GFP_HARDWALL, order,
+ zonelist, ALLOC_WMARK_HIGH | ALLOC_CPUSET);
+
+ if (page) {
+ up(&sem);
+ return page;
+ }
+
+ /* Try to shoot a process. */
+ out_of_memory(zonelist, gfp_mask, order);
+ up(&sem);
+
+out_sleep:
+ /* Did we get shot by the OOM killer? If not, sleep for a while to
+ * avoid burning lots of CPU cycles looping in the memory allocator.
+ * If the OOM killer shot a process, this gives the victim a good
+ * chance to die before we retry allocation.
+ */
+ if (!test_thread_flag(TIF_MEMDIE))
+ schedule_timeout_uninterruptible(1);
+
+ return NULL;
+}
+
/*
* This is the 'heart' of the zoned buddy allocator.
*/
@@ -1102,18 +1152,9 @@ rebalance:
if (page)
goto got_pg;
} else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
- /*
- * Go through the zonelist yet one more time, keep
- * very high watermark here, this is only to catch
- * a parallel oom killing, we must fail if we're still
- * under heavy pressure.
- */
- page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
- zonelist, ALLOC_WMARK_HIGH|ALLOC_CPUSET);
+ page = oom_alloc(gfp_mask, order, zonelist);
if (page)
goto got_pg;
-
- out_of_memory(zonelist, gfp_mask, order);
goto restart;
}


2006-05-30 23:56:25

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH (try #4)] mm: avoid unnecessary OOM kills

Dave Peterson wrote:
> Below is a 2.6.17-rc4-mm3 patch that fixes a problem where the OOM killer was
> unnecessarily killing system daemons in addition to memory-hogging user
> processes. The patch fixes things so that the following assertion is
> satisfied:
>
> If a failed attempt to allocate memory triggers the OOM killer, then the
> failed attempt must have occurred _after_ any process previously shot by
> the OOM killer has cleaned out its mm_struct.
>
> Thus we avoid situations where concurrent invocations of the OOM killer cause
> more processes to be shot than necessary to resolve the OOM condition.
>
> Signed-Off-By: David S. Peterson <[email protected]>

OK this is looking nice. And I was probably premature in thinking a
single simple call out to the oom code could replace your oom_alloc...
however it _still_ does a little bit too much OOM stuff for my liking.

Can you instead use two calls? oom_kill_prepare() and oom_kill_finish(),
between which you could try the final alloc? (also, declare functions in
.h files rather than .c files).

Lastly: currently, the final alloc only tries to allocate from the high
watermark because it is all racy anyway. If you've eliminated the races,
you might want to start using the low watermark for this.

Thanks,
Nick

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com