Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1764212AbXLTPFZ (ORCPT ); Thu, 20 Dec 2007 10:05:25 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754930AbXLTPFL (ORCPT ); Thu, 20 Dec 2007 10:05:11 -0500 Received: from atrey.karlin.mff.cuni.cz ([195.113.31.123]:50513 "EHLO atrey.karlin.mff.cuni.cz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756134AbXLTPFJ (ORCPT ); Thu, 20 Dec 2007 10:05:09 -0500 Date: Thu, 20 Dec 2007 16:04:53 +0100 From: Jan Kara To: Bj?rn Steinbrink Cc: Linus Torvalds , Krzysztof Oledzki , Andrew Morton , Linux Kernel Mailing List , Nick Piggin , Peter Zijlstra , Thomas Osterried , protasnb@gmail.com, bugme-daemon@bugzilla.kernel.org Subject: Re: [Bug 9182] Critical memory leak (dirty pages) Message-ID: <20071220150453.GB764@atrey.karlin.mff.cuni.cz> References: <20071215221935.306A5108068@picon.linux-foundation.org> <20071215203539.d6f71e96.akpm@linux-foundation.org> <20071216015112.d0ab08a1.akpm@linux-foundation.org> <20071220141217.GA4745@atjola.homenet> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071220141217.GA4745@atjola.homenet> User-Agent: Mutt/1.5.13 (2006-08-11) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3233 Lines: 75 > On 2007.12.19 09:44:50 -0800, Linus Torvalds wrote: > > > > > > On Sun, 16 Dec 2007, Krzysztof Oledzki wrote: > > > > > > I'll confirm this tomorrow but it seems that even switching to data=ordered > > > (AFAIK default o ext3) is indeed enough to cure this problem. > > > > Ok, do we actually have any ext3 expert following this? I have no idea > > about what the journalling code does, but I have painful memories of ext3 > > doing really odd buffer-head-based IO and totally bypassing all the normal > > page dirty logic. > > > > Judging by the symptoms (sorry for not following this well, it came up > > while I was mostly away travelling), something probably *does* clear the > > dirty bit on the pages, but the dirty *accounting* is not done properly, > > so the kernel keeps thinking it has dirty pages. > > > > Now, a simple "grep" shows that ext3 does not actually do any > > ClearPageDirty() or similar on its own, although maybe I missed some other > > subtle way this can happen. And the *normal* VFS routines that do > > ClearPageDirty should all be doing the proper accounting. > > > > So I see a couple of possible cases: > > > > - actually clearing the PG_dirty bit somehow, without doing the > > accounting. > > > > This looks very unlikely. PG_dirty is always cleared by some variant of > > "*ClearPageDirty()", and that bit definition isn't used for anything > > else in the whole kernel judging by "grep" (the page allocator tests > > the bit, that's it). > > OK, so I looked for PG_dirty anyway. > > In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers > bail out if the page is dirty. > > Then in 3e67c0987d7567ad666641164a153dca9a43b11d, Andrew fixed > truncate_complete_page, because it called cancel_dirty_page (and thus > cleared PG_dirty) after try_to_free_buffers was called via > do_invalidatepage. > > Now, if I'm not mistaken, we can end up as follows. > > truncate_complete_page() > cancel_dirty_page() // PG_dirty cleared, decr. dirty pages > do_invalidatepage() > ext3_invalidatepage() > journal_invalidatepage() > journal_unmap_buffer() > __dispose_buffer() > __journal_unfile_buffer() > __journal_temp_unlink_buffer() > mark_buffer_dirty(); // PG_dirty set, incr. dirty pages > > If journal_unmap_buffer then returns 0, try_to_free_buffers is not > called and neither is cancel_dirty_page, so the dirty pages accounting > is not decreased again. Yes, this can happen. The call to mark_buffer_dirty() is a fallout from journal_unfile_buffer() trying to sychronise JBD private dirty bit (jbddirty) with the standard dirty bit. We could actually clear the jbddirty bit before calling journal_unfile_buffer() so that this doesn't happen but since Linus changed remove_from_pagecache() to not care about redirtying the page I guess it's not needed any more... Honza -- Jan Kara SuSE CR Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/