Date: Thu, 23 Feb 2012 15:17:35 -0800 (PST)
From: David Rientjes <rientjes@google.com>
To: Rafael Aquini
cc: linux-mm@kvack.org, Randy Dunlap, Christoph Lameter, Pekka Enberg,
    Matt Mackall, Rik van Riel, Josef Bacik, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] oom: add sysctl to enable slab memory dump
In-Reply-To: <20120223152226.GA2014@x61.redhat.com>
References: <20120222115320.GA3107@x61.redhat.com> <20120223152226.GA2014@x61.redhat.com>

On Thu, 23 Feb 2012, Rafael Aquini wrote:

> Let's say the slab gets so bloated that for every user task spawned the
> OOM-killer just kills it instantly, or the system falls under severe
> thrashing, leaving you no chance of getting an interactive session to
> parse /proc/slabinfo, thus making the reset button your only escape...
> How would you identify the set of caches responsible for the slab
> swelling?
>

I think you misunderstand completely how the oom killer works,
unfortunately.  By default, unless you have changed the oom_score_adj
tunables, it kills the most memory-hogging eligible thread possible.  That
certainly wouldn't be a freshly forked user task prior to execve() unless
you've enabled /proc/sys/vm/oom_kill_allocating_task, which you shouldn't
unless you're running on a machine with 1k cores, for example.  It would be
an existing thread that was using a lot of memory, to allow for things
EXACTLY LIKE forking additional user tasks.  We don't want to get into a
self-imposed DoS because something is oom, and the oom killer does quite a
good job of ensuring it doesn't.  The goal is to kill a single thread to
free the most amount of memory possible.

If this is what is affecting you, then you'll need to figure out why you
have changed the oom killer priority in a way that causes it: check the
/proc/pid/oom_score_adj values you have set such that, when they are
inherited, they instantly get the child killed because it will quickly use
more memory than the parent.

> IMHO, having such qualified info about slab usage at hand is very useful
> in several occurrences of OOM. It not only helps out developers, but also
> sysadmins troubleshooting slab usage when the OOM-killer is invoked, so
> qualifying and showing such data surely makes sense for a lot of people.
> For those who do not mind/care about such reporting, in the end it just
> takes a sysctl knob adjustment to make it go quiet.
>

cat /proc/slabinfo
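For illustration, a minimal sketch of the manual step that implies: ranking
caches from /proc/slabinfo by approximate footprint.  It assumes the
slabinfo 2.x column layout (name, active_objs, num_objs, objsize, ...),
approximates each cache's usage as num_objs * objsize, and needs root on
kernels that restrict the file:

/*
 * Sketch only: rank slab caches by approximate footprint taken from
 * /proc/slabinfo.  Assumes the slabinfo 2.x column layout and uses
 * num_objs * objsize as a rough per-cache byte count.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct cache {
	char name[64];
	unsigned long bytes;
};

static int by_bytes_desc(const void *a, const void *b)
{
	const struct cache *x = a, *y = b;
	return (y->bytes > x->bytes) - (y->bytes < x->bytes);
}

int main(void)
{
	struct cache c[1024];
	char line[512];
	int n = 0;

	FILE *f = fopen("/proc/slabinfo", "r");
	if (!f) {
		perror("/proc/slabinfo");
		return 1;
	}
	while (n < 1024 && fgets(line, sizeof(line), f)) {
		unsigned long active, num, objsize;

		if (line[0] == '#' || !strncmp(line, "slabinfo", 8))
			continue;	/* skip the two header lines */
		if (sscanf(line, "%63s %lu %lu %lu",
			   c[n].name, &active, &num, &objsize) != 4)
			continue;
		c[n].bytes = num * objsize;	/* rough footprint */
		n++;
	}
	fclose(f);

	qsort(c, n, sizeof(c[0]), by_bytes_desc);
	for (int i = 0; i < n && i < 10; i++)	/* ten biggest caches */
		printf("%-24s %10lu KiB\n", c[i].name, c[i].bytes >> 10);
	return 0;
}

The same ranking can be eyeballed straight from the raw file; the program
just saves the arithmetic.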
> > I think this also gives another usecase for a possible /dev/mem_notify
> > in the future: userspace could easily poll on an eventfd and wait for
> > an oom to occur and then cat /proc/slabinfo to attain all this.  In
> > other words, if we had this functionality (which I think we undoubtedly
> > will in the future), this patch would be obsoleted.
>
> Great! So, why not let time tell us whether this feature will be
> obsoleted or not? I'd rather have this patch obsoleted by another one
> proven better than just stand still waiting for something that might, or
> might not, happen in the future.
>

Because (1) you're adding a sysctl that someone will come to depend on, and
we don't want to have to obsolete it and remove it from the kernel and
force them to find an alternative solution like /dev/mem_notify, and
(2) people parse messages like this that are emitted to the kernel log, and
we don't want to break them in the future.

So NACK on this approach.
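For completeness, a rough sketch of the userspace watcher described above.
/dev/mem_notify does not exist, so this relies instead on the cgroup-v1
memory controller's OOM notification (an eventfd registered against
memory.oom_control through cgroup.event_control); the cgroup path is an
assumed example only:

/*
 * Sketch only: wait for an OOM notification on a cgroup-v1 memory
 * controller via eventfd, then snapshot /proc/slabinfo -- the userspace
 * equivalent of the /dev/mem_notify idea discussed above.  The cgroup
 * path below is an assumption for illustration.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define CGROUP "/sys/fs/cgroup/memory/mygroup"	/* assumed path */

int main(void)
{
	char reg[64];
	uint64_t events;

	int efd = eventfd(0, 0);
	int oom = open(CGROUP "/memory.oom_control", O_RDONLY);
	int ctl = open(CGROUP "/cgroup.event_control", O_WRONLY);
	if (efd < 0 || oom < 0 || ctl < 0) {
		perror("setup");
		return 1;
	}

	/* Register the eventfd for OOM notifications: "<efd> <oom fd>" */
	snprintf(reg, sizeof(reg), "%d %d", efd, oom);
	if (write(ctl, reg, strlen(reg)) < 0) {
		perror("cgroup.event_control");
		return 1;
	}

	/* Blocks until the kernel reports an oom in this cgroup. */
	if (read(efd, &events, sizeof(events)) == sizeof(events))
		system("cat /proc/slabinfo > /tmp/slabinfo-at-oom");

	return 0;
}

read() returns once per OOM event in that cgroup, so the /proc/slabinfo
snapshot is taken while the post-OOM state is still fresh.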