Message-ID: <451D7C02.2090907@redhat.com>
Date: Fri, 29 Sep 2006 16:03:14 -0400
From: Larry Woodman <lwoodman@redhat.com>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.2) Gecko/20040301
MIME-Version: 1.0
To: linux-kernel@vger.kernel.org
Subject: Re: oom kill oddness.
Content-Type: multipart/mixed;
 boundary="------------010606060904020809010901"
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4058
Lines: 160

This is a multi-part message in MIME format.
--------------010606060904020809010901
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

>
>
>So I have two boxes that are very similar.
>Both have 2GB of RAM & 1GB of swap space.
>One has a 2.8GHz CPU, the other a 2.93GHz CPU, both dualcore.
>
>The slower box survives a 'make -j bzImage' of a 2.6.18 kernel tree
>without incident. (Although it takes ~4 minutes longer than a -j2)
>
>The faster box goes absolutely nuts, oomkilling everything in sight,
>until eventually after about 10 minutes, the box locks up dead,
>and won't even respond to pings.
>
>Oh, the only other difference - the slower box has 1 disk, whereas the
>faster box has two in RAID0.   I'm not surprised that stuff is getting
>oom-killed given the pathological scenario, but the fact that the
>box never recovered at all is a little odd.  Does md lack some means
>of dealing with low memory scenarios ?
>
>	Dave
>
Dave, this has been a problem since the out_of_memory() function was 
changed
between 2.6.10 and 2.6.11.  Before this change out_of_memory() required 
multiple
calls within 5 seconds before actually OOM killed a process.  After the 
change(in 2.6.11)
a single call to out_of_memory() results in OOM killing a process.  The 
following patch
allows the 2.6.18 system to run under much more memory pressure before 
it OOM kills.


--------------010606060904020809010901
Content-Type: text/plain;
 name="oomkill.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="oomkill.patch"

--- linux-2.6.18.noarch/mm/oom_kill.c.orig
+++ linux-2.6.18.noarch/mm/oom_kill.c
@@ -306,6 +306,69 @@ static int oom_kill_process(struct task_
 	return oom_kill_task(p, message);
 }
 
+int should_oom_kill(void)
+{
+	static spinlock_t oom_lock = SPIN_LOCK_UNLOCKED;
+	static unsigned long first, last, count, lastkill;
+	unsigned long now, since;
+	int ret = 0;
+
+	spin_lock(&oom_lock);
+	now = jiffies;
+	since = now - last;
+	last = now;
+
+	/*
+	 * If it's been a long time since last failure,
+	 * we're not oom.
+	 */
+	if (since > 5*HZ)
+		goto reset;
+
+	/*
+	 * If we haven't tried for at least one second,
+	 * we're not really oom.
+	 */
+	since = now - first;
+	if (since < HZ)
+		goto out_unlock;
+
+	/*
+	 * If we have gotten only a few failures,
+	 * we're not really oom.
+	 */
+	if (++count < 10)
+		goto out_unlock;
+
+	/*
+	 * If we just killed a process, wait a while
+	 * to give that task a chance to exit. This
+	 * avoids killing multiple processes needlessly.
+	 */
+	since = now - lastkill;
+	if (since < HZ*5)
+		goto out_unlock;
+
+	/*
+	 * Ok, really out of memory. Kill something.
+	 */
+	lastkill = now;
+	ret = 1;
+
+reset:
+/*
+ * We dropped the lock above, so check to be sure the variable
+ * first only ever increases to prevent false OOM's.
+ */
+	if (time_after(now, first))
+		first = now;
+	count = 0;
+
+out_unlock:
+	spin_unlock(&oom_lock);
+	return ret;
+}
+
 /**
  * out_of_memory - kill the "best" process when we run out of memory
  *
@@ -326,6 +389,9 @@ void out_of_memory(struct zonelist *zone
 		show_mem();
 	}
 
+	if (!should_oom_kill())
+		return;
+
 	cpuset_lock();
 	read_lock(&tasklist_lock);
 
--- linux-2.6.18.noarch/mm/vmscan.c.orig
+++ linux-2.6.18.noarch/mm/vmscan.c
@@ -999,10 +999,8 @@ unsigned long try_to_free_pages(struct z
 			reclaim_state->reclaimed_slab = 0;
 		}
 		total_scanned += sc.nr_scanned;
-		if (nr_reclaimed >= sc.swap_cluster_max) {
-			ret = 1;
+		if (nr_reclaimed >= sc.swap_cluster_max)
 			goto out;
-		}
 
 		/*
 		 * Try to write back as many pages as we just scanned.  This
@@ -1030,6 +1028,8 @@ out:
 
 		zone->prev_priority = zone->temp_priority;
 	}
+	if (nr_reclaimed)
+		ret = 1;
 	return ret;
 }
 

--------------010606060904020809010901--

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/