Date: Mon, 14 Mar 2005 15:11:42 -0800
From: "David S. Miller" <davem@davemloft.net>
To: "David S. Miller" <davem@davemloft.net>
Cc: tony.luck@intel.com, linux-kernel@vger.kernel.org,
       linux-ia64@vger.kernel.org, hugh@veritas.com
Subject: Re: bad pgd/pmd in latest BK on ia64
Message-Id: <20050314151142.716903cb.davem@davemloft.net>
In-Reply-To: <20050314143442.2ab086c9.davem@davemloft.net>
References: <B8E391BBE9FE384DAA4C5C003888BE6F031272AF@scsmsx401.amr.corp.intel.com>
	<20050314143442.2ab086c9.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1770
Lines: 47

On Mon, 14 Mar 2005 14:34:42 -0800
"David S. Miller" <davem@davemloft.net> wrote:

> On Mon, 14 Mar 2005 14:06:09 -0800
> "Luck, Tony" <tony.luck@intel.com> wrote:
> 
> > Trying to boot a build of the latest BK on ia64 I see
> > a series of messages like this:
> > 
> > mm/memory.c:99: bad pgd e0000001feba4000.
> > mm/memory.c:99: bad pgd e0000001febac000.
> > mm/memory.c:99: bad pgd e0000001febc0d10.
> 
> Things are similarly busted on sparc64 for me as well.
> Things instantly reboot right after the kernel tries
> to open an initial console.

As a followup, when I get an instant reboot like this
it usually means that some loop walking over memory
doesn't terminate properly.  Once the first access to
bogus I/O addresses (past the end of physical RAM)
happens, the machine soft reboots.

I therefore suspect the pgwalk patches.

One thing to note on sparc64 (I'm not sure on ia64) is
that the address passed into handle_mm_fault() can have
non-PAGE_MASK bits set in it (these are state bits from
the MMU miss handlers).

Does ia64 cause something similar to happen?

This never caused problems before, but it may be causing
troubles with the new pgwalk macros.  For example, the
new do { } while() loops test for exactness in the loop
termination test.  If there are low bits set in "addr",
we'll walk right past "end" in the loops and go on like
that forever.

I cannot, however, yet see a path where the handle_mm_fault()
address gets passed into the new pgwalk macro loops.  That
is what I'm searching for now :-)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/