Date: Tue, 16 Feb 2010 13:41:33 -0800 (PST)
From: David Rientjes
To: Minchan Kim
cc: Rik van Riel, Andrew Morton, KAMEZAWA Hiroyuki, Nick Piggin,
    Andrea Arcangeli, Balbir Singh, Lubos Lunak,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch 4/7 -mm] oom: badness heuristic rewrite

On Tue, 16 Feb 2010, Minchan Kim wrote:

> > Again, I'd encourage you to look at this as only a slight penalization
> > rather than a policy that strictly needs to be enforced.  If it were
> > strictly enforced, it would be a prerequisite for selection if such a task
> > were to exist; in my implementation, it is part of the heuristic.
>
> Okay. I can think it of slight penalization in this patch.
> But in current OOM logic, we try to kill child instead of forkbomb
> itself.
> My concern was that.

We still do with my rewrite; that is handled in oom_kill_process().  The
forkbomb penalization takes place in badness().

> 1. Forkbomb A task makes 2000 children in a second.
> 2. 2000 children has almost same memory usage. I know another factors
> affect oom_score. but in here, I assume all of children have almost same
> badness score.
> 3. Your heuristic penalizes A task so it would be detected as forkbomb.
> 4. So OOM killer select A task as bad task.
> 5. oom_kill_process kills high badness one of children, _NOT_ task A
> itself. Unfortunately high badness child doesn't has big memory usage
> compared to sibling. It means sooner or later we would need OOM again.
>

A couple of points: killing a task with a small rss and swap usage
compared to the parent does not imply that we need to call the oom killer
again later; killing the child will allow for future memory freeing that
may be all that is necessary.  If the parent continues to fork, that will
continue to be an issue, but the constant killing of its children should
allow the user to intervene without bringing the system to a grinding
halt.

I'd strongly prefer, however, to kill a child of a forkbombing task rather
than an innocent application that has been running for days or weeks, only
to find that the forkbombing parent will consume its memory as well and
then need to have its children killed.

Secondly, the forkbomb detection does not simply require 2000 children to
be forked in a second; it requires oom_forkbomb_thres children that have
called execve(), i.e. they have separate address spaces, to have a runtime
of less than one second.

> My point was 5.
>
> 1. oom_kill_process have to take a long time to scan tasklist for
> selecting just one high badness task. Okay. It's right since OOM system
> hang is much bad and it would be better to kill just first task(ie,
> random one) in tasklist.
>
> 2. But in above scenario, sibling have almost same memory.
> So we would
> need OOM again sooner or later and OOM logic could do above scenario
> repeatably.
>

In Rik's web server example, this is the preferred outcome: kill a thread
handling a single client connection rather than kill a "legitimate"
forkbombing server, which would make the entire service unresponsive.

> I said _BUGGY_ forkbomb task. That's because Rik's example isn't buggy
> task. Administrator already knows apache can make many task in a second.
> So he can handle it by your oom_forkbomb_thres knob. It's goal of your
> knob.
>

We can't force all web servers to tune oom_forkbomb_thres.

> So my suggestion is following as.
>
> I assume normal forkbomb tasks are handled well by admin who use your
> oom_forkbom_thres. The remained problem is just BUGGY forkbomb process.
> So if your logic selects same victim task as forkbomb by your heuristic
> and it's 5th time continuously in 10 second, let's kill forkbomb instead
> of child.
>
> tsk = select_victim_task(&cause);
> if (tsk == last_victim_tsk && cause == BUGGY_FORKBOMB)
>         if (++count == 5 && time_since_first_detect_forkbomb <= 10*HZ)
>                 kill(tsk);
> else {
>         last_victim_tsk = NULL; count = 0; time_since... = 0;
>         kill(tsk's child);
> }
>
> It's just example of my concern. It might never good solution.
> What I mean is just whether we have to care this.
>

This unfairly penalizes tasks that have a large number of execve()
children; we can't possibly know how to define BUGGY_FORKBOMB.  In other
words, a system-wide forkbombing policy in the oom killer will always have
a chance of killing a legitimate task, such as a web server, which is an
undesired result.  Setting the parent to OOM_DISABLE isn't really an
option in this case since that value is inherited by children and would
need to be explicitly cleared by each thread prior to execve(); this is
one of the reasons why I proposed /proc/pid/oom_adj_child a few months
ago, but it wasn't well received.
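
For illustration only, the detection criterion described earlier in this
message (count the children that have their own address space and a
runtime of less than one second, then compare that count against
oom_forkbomb_thres) might be sketched in userspace C roughly as follows.
This is not the code from the patch: struct child_info,
count_young_exec_children() and looks_like_forkbomb() are hypothetical
names, the millisecond field is an assumed representation of runtime, and
the >= comparison is only my reading of "requires oom_forkbomb_thres
children".

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-child summary, made up for this sketch.  The kernel
 * works on its own task structures, not on anything like this.
 */
struct child_info {
	bool shares_parent_mm;    /* true if the child never called execve() */
	unsigned long runtime_ms; /* assumed unit: runtime in milliseconds */
};

/*
 * Count the children that have a separate address space from the parent
 * and have a runtime of less than one second.
 */
static size_t count_young_exec_children(const struct child_info *c, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (c[i].shares_parent_mm)
			continue;	/* threads and plain fork()s do not count */
		if (c[i].runtime_ms < 1000)
			count++;
	}
	return count;
}

/*
 * Assumption: the parent is treated as a forkbomb once the count reaches
 * the threshold.  What penalty is then applied is not modelled here.
 */
static bool looks_like_forkbomb(const struct child_info *c, size_t n,
				size_t thres)
{
	return count_young_exec_children(c, n) >= thres;
}
```

Under these assumptions, a parent with 2000 freshly forked children that
all still share its address space would not trip the check, while a parent
with at least oom_forkbomb_thres short-lived execve() children would.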