From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Subject: Re: [Bug 12579] ext4 filesystem hang
Date: Sat, 14 Feb 2009 14:10:04 +0530
Message-ID: <20090214084004.GC22585@skywalker>
References: <bug-12579-13602@http.bugzilla.kernel.org/> <20090213220606.AE8FC11D109@picon.linux-foundation.org> <20090214015018.GB26628@mini-me.lan>
Cc: bugme-daemon@bugzilla.kernel.org, linux-ext4@vger.kernel.org
To: Theodore Tso <tytso@mit.edu>
In-Reply-To: <20090214015018.GB26628@mini-me.lan>

On Fri, Feb 13, 2009 at 08:50:18PM -0500, Theodore Tso wrote:
> > Patch from Aneesh, un-whitespace-mangled.
> > 
> > Ted, can you push this out?  Works great.  :) We might want to ask
> > the other reporter of something similar (next-20090206: deadlock on
> > ext4) to test it too.  I'll ping him.
> 
> Do we completely understand the root cause, in terms of which commit
> broke the mm/page-writeback.c code we were depending on?  And if so,
> what of the code in mm/page-writeback.c?  Does anyone else use it?
> Can anyone sanely use it?

AFAIU we need the changes even for older kernels. The reasoning is
that, with delayed allocation, we cannot allow write_cache_pages to
retry from a lower page index: the pages already gathered into an
extent are still locked, so wrapping around would lock a lower-index
page while holding a higher-index one, which can deadlock against
another writeback thread. Older kernels do this retry as well. What
made the hang so easy to reproduce on later kernels is that we were
retrying even when nr_to_write was zero. That was fixed in mainline by
commit 3a4c6800f31ea8395628af5e7e490270ee5d0585, so with that change we
are logically back to the 2.6.28 state, but the possibility of deadlock
still remains.
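To make the retry argument concrete, here is a toy model in C. It is a
sketch under stated assumptions, not the kernel's write_cache_pages():
the function name scan_takes_lower_lock and its parameters are
hypothetical, every page is assumed dirty, and the "fixed" mode only
encodes the rule described above (wrap back to index 0 only while
nr_to_write is still positive). It also assumes that pages locked
during the first pass stay locked when the scan wraps, as they do while
delayed allocation is gathering an extent.

```c
#include <stdbool.h>

/* Hypothetical toy model of a range_cyclic writeback scan.  Names and
 * structure are illustrative only; this is NOT the kernel's
 * write_cache_pages().  All nr_pages pages are assumed dirty.
 *
 * Returns true if the scan would try to lock a page whose index is
 * lower than a page lock it is still holding (out-of-order locking). */
static bool scan_takes_lower_lock(long nr_pages, long start,
                                  long nr_to_write, bool retry_needs_budget)
{
    long highest_held = -1;   /* assumption: gathered pages stay locked */

    /* First pass: from the saved writeback index to the end of file,
     * locking (and keeping) one page per unit of nr_to_write budget. */
    for (long idx = start; idx < nr_pages && nr_to_write > 0; idx++) {
        highest_held = idx;
        nr_to_write--;
    }

    /* Cyclic wrap back to index 0, only if the scan did not start there.
     * Old rule (per the mail): wrap regardless of remaining budget.
     * Fixed rule: wrap only if nr_to_write is still positive. */
    bool wrap = start > 0 && (!retry_needs_budget || nr_to_write > 0);
    if (!wrap)
        return false;

    /* The retry begins at index 0; locking it while a higher-index
     * page lock is still held breaks ascending lock order. */
    return highest_held > 0;
}
```

For example, with 100 dirty pages and a scan starting at index 50,
scan_takes_lower_lock(100, 50, 50, false) is true under the old rule and
scan_takes_lower_lock(100, 50, 50, true) is false under the fixed rule,
but scan_takes_lower_lock(100, 50, 80, true) is still true: whenever
budget is left after the first pass the wrap still happens, which is
why the deadlock remains possible after the fix.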

> 
> And am I right in assuming that this only applies to 2.6.29-rcX
> kernels, and is not needed for 2.6.28 or earlier kernels?

I believe the hang can happen on 2.6.28 and earlier kernels as well.

> 
> I hadn't yet pushed it out because I needed time to understand all of
> these issues, hence these questions....

-aneesh