Trying to boot a build of the latest BK on ia64 I see
a series of messages like this:
mm/memory.c:99: bad pgd e0000001feba4000.
mm/memory.c:99: bad pgd e0000001febac000.
mm/memory.c:99: bad pgd e0000001febc0d10.
mm/memory.c:105: bad pmd f000eef3f0000200.
mm/memory.c:105: bad pmd f000eef3f000e2c3.
mm/memory.c:105: bad pmd f000ff54f000eef3.
mm/memory.c:105: bad pmd f000292cf0002984.
before the kernel oopses on a NULL dereference
at resched_task+0x41/0x1a0.
2.6.11-bk9 boots ok, so this was added recently.
-Tony
On Mon, 14 Mar 2005 14:06:09 -0800
"Luck, Tony" <[email protected]> wrote:
> Trying to boot a build of the latest BK on ia64 I see
> a series of messages like this:
>
> mm/memory.c:99: bad pgd e0000001feba4000.
> mm/memory.c:99: bad pgd e0000001febac000.
> mm/memory.c:99: bad pgd e0000001febc0d10.
Things are similarly busted on sparc64 for me as well.
Things instantly reboot right after the kernel tries
to open an initial console.
On Mon, 14 Mar 2005 14:34:42 -0800
"David S. Miller" <[email protected]> wrote:
> On Mon, 14 Mar 2005 14:06:09 -0800
> "Luck, Tony" <[email protected]> wrote:
>
> > Trying to boot a build of the latest BK on ia64 I see
> > a series of messages like this:
> >
> > mm/memory.c:99: bad pgd e0000001feba4000.
> > mm/memory.c:99: bad pgd e0000001febac000.
> > mm/memory.c:99: bad pgd e0000001febc0d10.
>
> Things are similarly busted on sparc64 for me as well.
> Things instantly reboot right after the kernel tries
> to open an initial console.
As a followup, when I get an instant reboot like this
it usually means that some loop walking over memory
doesn't terminate properly. Once the first access to
bogus I/O addresses (past the end of physical RAM)
happens, the machine soft reboots.
I therefore suspect the pgwalk patches.
One thing to note on sparc64 (I'm not sure on ia64) is
that the address passed into handle_mm_fault() can have
non-PAGE_MASK bits set in it (these are state bits from
the MMU miss handlers).
Does ia64 cause something similar to happen?
This never caused problems before, but it may be causing
trouble with the new pgwalk macros. For example, the
new do { } while() loops test for exact equality with "end"
in the loop termination test. If there are low bits set in
"addr", it never compares equal to "end", so we'll walk right
past "end" in the loops and go on like that forever.
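To make the failure mode concrete, here is a minimal user-space sketch
of the kind of loop I mean. It is not the actual pgwalk code: the
8K page size, walk_exact() and walk_masked() are all made up for
illustration, and the max_steps cap exists only so the demo terminates
instead of running off the end of memory.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8K pages, chosen only for illustration. */
#define PAGE_SHIFT 13
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/*
 * Hypothetical walker: step one page at a time and stop only when
 * addr is exactly equal to end.  Returns the number of pages visited,
 * or -1 if the walk is still running after max_steps iterations.
 */
static long walk_exact(uint64_t addr, uint64_t end, long max_steps)
{
	long steps = 0;

	assert((end & ~PAGE_MASK) == 0);	/* end must be page aligned */
	do {
		if (steps == max_steps)
			return -1;		/* walked past end */
		steps++;
		addr += PAGE_SIZE;
	} while (addr != end);

	return steps;
}

/* Same walk, but with the low state bits masked off before looping. */
static long walk_masked(uint64_t addr, uint64_t end, long max_steps)
{
	return walk_exact(addr & PAGE_MASK, end, max_steps);
}
```

With a page-aligned start the exact test terminates as expected. With
a start such as 0x10003 it never matches "end" and only the cap stops
it, while masking with PAGE_MASK first restores the expected count.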
I cannot, however, yet see a path where the handle_mm_fault()
address gets passed into the new pgwalk macro loops. That
is what I'm searching for now :-)
On Mon, 14 Mar 2005 15:11:42 -0800
"David S. Miller" <[email protected]> wrote:
> I therefore suspect the pgwalk patches.
I just noticed something else while reviewing this stuff.
The PTRS_PER_PMD macros aren't used anymore, so my hacks
to get 32-bit process VM operations optimized on sparc64
aren't even being used any more, ho hum... :-) There are
better ways to do this.
(For the interested, see {REAL_}PTRS_PER_PMD in
include/asm-sparc64/pgtable.h)
Come to think of it, this may be related somehow to whatever
is causing the problems.
On Mon, 2005-03-14 at 15:31 -0800, David S. Miller wrote:
> On Mon, 14 Mar 2005 15:11:42 -0800
> "David S. Miller" <[email protected]> wrote:
>
> > I therefore suspect the pgwalk patches.
>
> I just noticed something else while reviewing this stuff.
> The PTRS_PER_PMD macros aren't used anymore, so my hacks
> to get 32-bit process VM operations optimized on sparc64
> aren't even being used any more, ho hum... :-) There are
> better ways to do this.
>
> (For the interested, see {REAL_}PTRS_PER_PMD in
> include/asm-sparc64/pgtable.h)
>
> Come to think of it, this may be related somehow to whatever
> is causing the problems.
That reminds me ... I still intend to toy with your old patches and add
some more abstract walkers & bitmap stuffs. Just no time at the moment.
The main thing I want to change from your approach is instead of calling
a pte_work callback for every pte, call it for ranges of PTEs (that is
PTE pages most of the time). The goal here is to avoid the overhead of
the indirect function call (& additional stackframe junk etc...) on
every single PTE.
Ben.
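A rough user-space sketch of the difference between the two callback
styles, to show where the saving comes from. Everything here is
hypothetical (pte_sketch_t, both walker functions, the counting
callbacks); it is not taken from the patches under discussion and only
counts how many indirect calls each style makes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_sketch_t;	/* stand-in for a real pte type */

/* Style 1: one indirect call per PTE. */
typedef void (*pte_work_fn)(pte_sketch_t *pte, void *data);
/* Style 2: one indirect call per run of PTEs (typically a PTE page). */
typedef void (*pte_range_fn)(pte_sketch_t *first, size_t count, void *data);

/* Returns the number of indirect calls made. */
static size_t walk_per_pte(pte_sketch_t *table, size_t n,
			   pte_work_fn fn, void *data)
{
	size_t calls = 0;

	for (size_t i = 0; i < n; i++) {
		fn(&table[i], data);
		calls++;
	}
	return calls;
}

/* Returns the number of indirect calls made. */
static size_t walk_per_range(pte_sketch_t *table, size_t n,
			     size_t ptes_per_page,
			     pte_range_fn fn, void *data)
{
	size_t calls = 0;

	assert(ptes_per_page > 0);	/* otherwise the loop never advances */
	for (size_t i = 0; i < n; i += ptes_per_page) {
		size_t left = n - i;

		fn(&table[i], left < ptes_per_page ? left : ptes_per_page, data);
		calls++;
	}
	return calls;
}

/* Example callbacks: count the non-zero entries. */
static void count_one(pte_sketch_t *pte, void *data)
{
	if (*pte)
		++*(size_t *)data;
}

static void count_range(pte_sketch_t *first, size_t count, void *data)
{
	for (size_t i = 0; i < count; i++)
		if (first[i])
			++*(size_t *)data;
}
```

Both walkers visit the same entries and produce the same result; the
range version just moves the inner loop into the callback, so the
indirect call happens once per PTE page rather than once per PTE.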
On Mon, Mar 14, 2005 at 02:34:42PM -0800, David S. Miller wrote:
> On Mon, 14 Mar 2005 14:06:09 -0800
> "Luck, Tony" <[email protected]> wrote:
>
> > Trying to boot a build of the latest BK on ia64 I see
> > a series of messages like this:
> >
> > mm/memory.c:99: bad pgd e0000001feba4000.
> > mm/memory.c:99: bad pgd e0000001febac000.
> > mm/memory.c:99: bad pgd e0000001febc0d10.
>
> Things are similarly busted on sparc64 for me as well.
> Things instantly reboot right after the kernel tries
> to open an initial console.
It's also busted on ia64 in 2.6.11-mm3, if that narrows things down.
mh
--
Martin Hicks || Silicon Graphics Inc. || [email protected]
On Tue, 15 Mar 2005 08:24:58 -0500
Martin Hicks <[email protected]> wrote:
> It's also busted on ia64 in 2.6.11-mm3, if that narrows things down.
Not necessary, we found the problem and the fix is in Linus's
tree. clear_page_range() had to be restored to looping over
pgd_index() values at the top level.
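For anyone wondering what looping by index looks like in general, here
is a small user-space sketch. It is not the code that went into the
tree: the shift, the table size and both helper names are assumptions
made up for the example. It only shows that a loop driven by slot
index is bounded by the indices of the two end points, whatever the
low bits of the start address happen to be.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only, not any real architecture. */
#define SKETCH_PGDIR_SHIFT  30
#define SKETCH_PTRS_PER_PGD 1024

/* Hypothetical stand-in for pgd_index(): top-level slot covering addr. */
static unsigned long pgd_index_sketch(uint64_t addr)
{
	return (unsigned long)((addr >> SKETCH_PGDIR_SHIFT) &
			       (SKETCH_PTRS_PER_PGD - 1));
}

/*
 * Count the top-level slots touched by [start, end).  The loop runs
 * over slot indices, so stray low bits in start cannot make it miss
 * its termination condition.  Assumes the range does not wrap around
 * the table.
 */
static long count_pgd_slots(uint64_t start, uint64_t end)
{
	unsigned long i, last;
	long n = 0;

	assert(start < end);
	last = pgd_index_sketch(end - 1);
	for (i = pgd_index_sketch(start); i <= last; i++)
		n++;		/* real code would clear slot i here */

	return n;
}
```

A start address of 0x3 and an aligned start of 0 give the same slot
count for the same end, which is the property an exact-equality
address comparison does not have.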