From: Theodore Ts'o
Subject: Re: Memory allocation can cause ext4 filesystem to be remounted r/o
Date: Wed, 26 Jun 2013 12:34:50 -0400
Message-ID: <20130626163450.GA2487@thunk.org>
References: <20130626140205.GE3875@thunk.org> <20130626145417.GB32092@thunk.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Vikram MP , linux-ext4@vger.kernel.org
To: Nagachandra P
Return-path:
Received: from li9-11.members.linode.com ([67.18.176.11]:33147 "EHLO imap.thunk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752222Ab3FZQey (ORCPT ); Wed, 26 Jun 2013 12:34:54 -0400
Content-Disposition: inline
In-Reply-To:
Sender: linux-ext4-owner@vger.kernel.org
List-ID:

On Wed, Jun 26, 2013 at 08:50:50PM +0530, Nagachandra P wrote:
>
> We also have seen case where the current allocation itself could cause
> the lowmem shrinker to be called (which in-turn chooses the same
> process for killing because of oom_adj_value of the current process,
> oom_adj_value is a weight age value associated with each process based
> on which the android low memory killer would select a process for
> killing to get memory). If we chose to retry in such case we could end
> up in endless loop of retrying the allocation. It would be better to
> handle this without retrying.

The challenge is that in some cases there's no good way to return an
error back upwards, and in other cases, backing out of the middle of a
file system operation is incredibly hard. This is why we have the
retry loop in the jbd2 code; the presumption is that some other
process is schedulable, which allows other processes to exit when the
OOM killer takes them out.

It's not an ideal solution, but in practice it's been good enough. In
general the OOM killer will be able to take out some other process and
free up memory that way.

Are you seeing this a lot?
If so, I think it's fair to ask why; from what I can tell it's not a
situation that is happening often on most systems using ext4
(including Android devices, of which I have several).

> We could your above suggestion which could address this specific path.
> But, there are quiet a number of allocation in ext4 which could call
> ext4_std_error on failure and we may need to look each one of them to
> see on how do we handle each one of them. Do think this something that
> could be done?

There aren't that many places where ext4 does memory allocations,
actually. And once you exclude those which are used when the file
system is initially mounted, there is quite a manageable number. It's
probably better to audit all of those and to make sure we have good
error recovery if any of these calls to kmalloc() or
kmem_cache_alloc() fail.

In many of the cases where we end up calling ext4_std_error(), the
most common cause is an I/O error while trying to read some critical
metadata block, and in that case, declaring that the file system is
corrupted is in fact the appropriate thing to do.

> We have in the past tried some ugly hacks to workaround the problem
> (by adjusting oom_adj_values, guarding them from being killed) but
> they don't seem provide fool proof mechanism at high memory pressure
> environment. Any advice on what we could try to fix the issue in
> general would be appreciated?

What version of the kernel are you using? And do you understand why
you are under so much memory pressure? Is it due to applications not
getting killed quickly enough? Are applications dirtying too much
memory too quickly? Is write throttling not working? Or are they
allocating too much memory when they start up their JVM? Or is it just
that your Android device has far less memory than most of the other
devices out there?

Speaking generally, if you're regularly seeing kmem_cache_alloc()
fail, that means free memory has fallen to zero.
That to me sounds like the OOM killer should be trying to kill
processes more aggressively, and more generally you should be trying
to make sure the kernel is maintaining a somewhat larger amount of
free memory.

The fact that you mentioned trying to prevent certain processes from
being killed may mean that you are approaching this problem from the
wrong direction. It may be more fruitful to encourage the system to
kill those user applications that are most deserving _earlier_.

Regards,

					- Ted
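
For reference, the retry-on-allocation-failure idea discussed above can
be sketched in userspace C roughly as follows. This is only an
illustration under stated assumptions, not the jbd2 code: the names
alloc_with_retry(), flaky_alloc() and fail_count are hypothetical,
malloc() stands in for the kernel allocators, and sched_yield() stands
in for whatever waiting the kernel does so that other tasks can run.
The retry count is bounded here to show one way of avoiding the
endless loop raised earlier in the thread; the caller then still needs
an error path for the NULL case.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook, so the sketch can simulate failures. */
typedef void *(*alloc_fn)(size_t size);

/*
 * Try an allocation up to max_tries times, yielding the CPU after each
 * failure so that other tasks get a chance to run (and possibly exit
 * and free memory). Returns NULL if every attempt fails; the caller
 * must handle that case. If tries_used is non-NULL it receives the
 * number of attempts actually made.
 */
static void *alloc_with_retry(alloc_fn alloc, size_t size,
                              unsigned max_tries, unsigned *tries_used)
{
    assert(alloc != NULL);

    for (unsigned i = 0; i < max_tries; i++) {
        void *p = alloc(size);

        if (tries_used)
            *tries_used = i + 1;
        if (p)
            return p;

        /* Assumption: yielding is a stand-in for waiting on reclaim. */
        sched_yield();
    }
    return NULL;
}

/* Test double: fails the next fail_count calls, then uses malloc(). */
static unsigned fail_count;

static void *flaky_alloc(size_t size)
{
    if (fail_count > 0) {
        fail_count--;
        return NULL;
    }
    return malloc(size);
}
```

With fail_count set to 2 and max_tries set to 5, the third attempt
succeeds; with more simulated failures than max_tries, the function
returns NULL and the caller has to decide how to back out.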