Date: Mon, 18 May 2015 11:46:16 -0400
From: "Theodore Ts'o"
To: Nikolay Borisov
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [Ext4][Bug] Deadlock in ext4 with memcg enabled.
Message-ID: <20150518154616.GC4180@thunk.org>
References: <5559965B.5080006@kyup.com>
In-Reply-To: <5559965B.5080006@kyup.com>

On Mon, May 18, 2015 at 10:35:55AM +0300, Nikolay Borisov wrote:
> The conclusion that I've drawn looking from the code and some offline
> discussions is that when fsync is requested ext4 starts marking pages
> for writeback (ext4_writepages). I think some heavy inlining is
> happening and ext4_map_blocks is being called from:
>
> ext4_writepages->mpage_map_and_submit_extent -> mpage_map_one_extent ->
> ext4_map_blocks
>
> which in turn when trying to write the pages exceeds the memory cgroup
> limit which triggers the memory freeing logic. This, in turn, executes
> the wait_on_page_writeback(page) in shrink_page_list. E.g. the memcg
> sees a page as being marked for writeback (presumably this is the same
> page which caused the OOM) so it sleeps to wait for the page to be
> written back, but since it is the writeback path that executed the page
> shrinking it causes a deadlock.
>
> This deadlock then causes other processes on the system to enter D
> state, waiting on trying to acquire a certain inode->i_mutex.

What *should* be happening is that the memory allocations taking place
in find_or_create_page() when called by grow_dev_page() should be done
with GFP_NOFS (i.e., the __GFP_FS flag should be masked out).

I think you're right, but I view this as an mm bug; the memory
allocation should have been properly executed with GFP_NOFS so the
memory allocator should know that it can't recurse into the page
cleaner. In this case, it looks like it's not doing this, but it is
trying to wait for a page to be cleaned, which is just as bad.

Have you checked to see if this problem has been fixed in newer kernels?

						- Ted
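
To make the GFP_NOFS point concrete, here is a minimal user-space sketch of
the flag logic, not kernel code. Everything prefixed SK_ or sk_ is
hypothetical, and the bit values are made up for illustration; the real
__GFP_* values vary between kernel versions. It only models the idea that
masking out the FS bit should tell reclaim it may neither re-enter the
filesystem nor sleep waiting for filesystem writeback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's gfp bits; values are illustrative. */
typedef unsigned int sk_gfp_t;
#define SK_GFP_IO     0x1u                        /* allocation may start I/O */
#define SK_GFP_FS     0x2u                        /* allocation may call into the fs */
#define SK_GFP_KERNEL (SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_NOFS   (SK_GFP_KERNEL & ~SK_GFP_FS)

/* "The __GFP_FS flag should be masked out": clear only the FS bit. */
static inline sk_gfp_t sk_mask_out_fs(sk_gfp_t gfp)
{
    return gfp & ~SK_GFP_FS;
}

/* Reclaim may re-enter filesystem code only if the caller allowed it. */
static inline bool sk_reclaim_may_enter_fs(sk_gfp_t gfp)
{
    return (gfp & SK_GFP_FS) != 0;
}

/*
 * Assumed policy for this sketch: waiting on a page under writeback is
 * treated the same as entering the fs, since the waiter may itself be
 * the writeback path ("just as bad" in the message above).
 */
static inline bool sk_reclaim_may_wait_on_writeback(sk_gfp_t gfp)
{
    return sk_reclaim_may_enter_fs(gfp);
}

/* Sanity check: masking FS out of the KERNEL mask yields the NOFS mask. */
static inline void sk_check_invariants(void)
{
    assert(sk_mask_out_fs(SK_GFP_KERNEL) == SK_GFP_NOFS);
    assert((SK_GFP_NOFS & SK_GFP_IO) != 0);
}
```

With these definitions, sk_reclaim_may_wait_on_writeback(SK_GFP_NOFS) is
false while the same call with SK_GFP_KERNEL is true, which is the
distinction the reclaim path in the report appears not to be honoring.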