Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966394Ab2FBHUl (ORCPT ); Sat, 2 Jun 2012 03:20:41 -0400 Received: from mail-pz0-f46.google.com ([209.85.210.46]:48261 "EHLO mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S966371Ab2FBHUk (ORCPT ); Sat, 2 Jun 2012 03:20:40 -0400 Date: Sat, 2 Jun 2012 00:20:13 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Linus Torvalds cc: Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2693 Lines: 67 On Fri, 1 Jun 2012, Linus Torvalds wrote: > On Fri, Jun 1, 2012 at 9:40 PM, Hugh Dickins wrote: > > > > Move the lock after the loop, I think you meant. > > Well, I wasn't sure if anything inside the loop might need it. I don't > *think* so, but at the same time, what protects "page_order(page)" > (or, indeed PageBuddy()) from being stable while that loop content > uses them? Yes, I believe you're right, page_order(page) could supply nonsense if it's not stabilized under zone->lock along with PageBuddy(page). Though if this rescue_unmovable_pageblock() is just best-effort, with a little more care we can probably avoid the lock in there. > > I don't understand that code at all. It does that crazy iteration over > page, and changes "page" in random ways, I don't think they're random ways: when buddy it uses the order to skip that block, otherwise it goes page by page, considering a free (I guess on pcp) page or an lru page as good for movable. > and then finishes up with a > totally new "page" value that is some random thing that is *after* the > end_page thing. WHAT? > > The code makes no sense. It tests all those pages within the > page-block, but then after it has done all those tests, it does the > final > > set_pageblock_migratetype(..) > move_freepages_block(..) > > using a page that is *beyond* the pageblock (and with the whole > page_order() thing, who knows just how far beyond it?) I totally missed that, thank goodness you did not. Yes, it's rubbish. It goes to this effort to find a suitable pageblock, then chooses the next one instead (or possibly another). Perhaps it would get even better results using a random number generator in there. > > It looks entirely too much like random-monkey code to me. Presumably it should be passing start_page instead of page to set_pageblock_migratetype() and move_freepages_block(). But this does seem to be code of the kind, that the longer you look at it, the more bugs you find. And I worry about what trouble it might then cause, if it actually started to work in the way it was intending. I don't think fixing it up is wise for -rc1. Commit 5ceb9ce6fe9462a298bb2cd5c9f1ca6cb80a0199 ("mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks") appears to revert cleanly, and I'm running with it reverted now. I'm not saying it shouldn't come back later, but does anyone see an argument against reverting it now? Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/