2002-07-22 19:44:08

by Paul Larson

Subject: [OOPS] 2.5.27 - __free_pages_ok()

Encountered this first with Linux-2.5.25+rmap and it looks like the
problem also slipped into 2.5.27. The same machine boots fine with a
vanilla 2.5.25 or 2.5.26, but gets this on boot with rmap. The machine
is an 8-way PIII-700.

# free
             total       used       free     shared    buffers     cached
Mem:       3871360     464796    3406564          0     110280     110356
-/+ buffers/cache:     244160    3627200
Swap:     15719284          0   15719284

-Paul Larson


Attachments:
oops.out (16.77 kB)

2002-07-22 20:02:25

by Rik van Riel

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

On 22 Jul 2002, Paul Larson wrote:

> Encountered this first with Linux-2.5.25+rmap and it looks like the
> problem also slipped into 2.5.27. The same machine boots fine with a
> vanilla 2.5.25 or 2.5.26, but gets this on boot with rmap. The machine
> is an 8-way PIII-700.

Bill Irwin has told me about a rare bug with exec() mapping
garbage into the address space of a process, which might
trigger this bug check the next time that process exec()s.

I've gotten two reports of this bug now, but have no idea
what particular combination of hardware / compiler / config
triggers the bug. The rmap code seems to have survived akpm's
stress tests so it's probably not a simple bug to track down ;/

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/


2002-07-22 20:02:07

by Rik van Riel

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

On Mon, 22 Jul 2002, Rik van Riel wrote:

> I've gotten two reports of this bug now, but have no idea
> what particular combination of hardware / compiler / config
> triggers the bug. The rmap code seems to have survived akpm's
> stress tests so it's probably not a simple bug to track down ;/

Now that I think about it, could you try enabling RMAP_DEBUG
in mm/rmap.c and try triggering this bug again ?

It's quite possible the debugging code in page_remove_rmap()
will show us a hint...

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-07-22 20:17:24

by Dave Hansen

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

Paul Larson wrote:
> Encountered this first with Linux-2.5.25+rmap and it looks like the
> problem also slipped into 2.5.27. The same machine boots fine with a
> vanilla 2.5.25 or 2.5.26, but gets this on boot with rmap. The machine
> is an 8-way PIII-700.

I was hitting the same thing on a Netfinity 8500R/x370. The problem
was an old compiler (egcs 2.91-something). It was triggered by a few
different things, including kernprof and dcache_rcu.

--
Dave Hansen

2002-07-22 22:44:01

by Paul Larson

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

On Mon, 2002-07-22 at 15:05, Rik van Riel wrote:
> Now that I think about it, could you try enabling RMAP_DEBUG
> in mm/rmap.c and try triggering this bug again ?
Done, output attached below.

On Mon, 2002-07-22 at 15:19, Dave Hansen wrote:
> I was hitting the same thing on a Netfinity 8500R/x370. The problem
> was an old compiler (egcs 2.91-something). It was triggered by a few
> different things, including kernprof and dcache_rcu.
Well, it was a redhat box. Just to be certain, I made sure to use kgcc
and it still hung on boot, but kgcc is egcs-2.91.66 19990314/Linux
(egcs-1.1.2 release). If it would be helpful, I'll try compiling my
kernel on a debian box tomorrow and booting with that.

Thanks,
Paul Larson

2002-07-22 22:47:32

by Paul Larson

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

New and improved version with the attachment this time.



Attachments:
rmap2.out (3.05 kB)

2002-07-22 22:49:53

by William Lee Irwin III

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

On Mon, 2002-07-22 at 15:19, Dave Hansen wrote:
>> I was hitting the same thing on a Netfinity 8500R/x370. The problem
>> was an old compiler (egcs 2.91-something). It was triggered by a few
>> different things, including kernprof and dcache_rcu.

On Mon, Jul 22, 2002 at 05:34:32PM -0500, Paul Larson wrote:
> Well, it was a redhat box. Just to be certain, I made sure to use kgcc
> and it still hung on boot, but kgcc is egcs-2.91.66 19990314/Linux
> (egcs-1.1.2 release). If it would be helpful, I'll try compiling my
> kernel on a debian box tomorrow and booting with that.

ISTR this compiler having code generation problems. I think trying to
reproduce this with a working i386 compiler is in order, e.g. debian's
2.95.4 or some similarly stable version.


Cheers,
Bill

2002-07-22 23:02:31

by Alan Cox

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

> and it still hung on boot, but kgcc is egcs-2.91.66 19990314/Linux
> (egcs-1.1.2 release). If it would be helpful, I'll try compiling my
> kernel on a debian box tomorrow and booting with that.

egcs-1.1.2 does have real problems with 2.5

7.1 errata/7.2/7.3 gcc 2.96 appear quite happy

2002-07-22 23:34:18

by Thunder from the hill

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

Hi,

On 23 Jul 2002, Alan Cox wrote:
> egcs-1.1.2 does have real problems with 2.5
>
> 7.1 errata/7.2/7.3 gcc 2.96 appear quite happy

So what compiler could I use on my sparc64? IIRC, the current gcc versions
failed to make up clean bytecode, and the older versions fail to deal with
newer code...

I've seen the gcc 3.1 test report from Dave Miller, and I knew it could be
nasty times if I try to get used to it...

Regards,
Thunder
--
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y-
------END GEEK CODE BLOCK------

2002-07-23 11:44:49

by Paul Larson

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

On Mon, 2002-07-22 at 19:18, Alan Cox wrote:
> > and it still hung on boot, but kgcc is egcs-2.91.66 19990314/Linux
> > (egcs-1.1.2 release). If it would be helpful, I'll try compiling my
> > kernel on a debian box tomorrow and booting with that.
>
> egcs-1.1.2 does have real problems with 2.5
>
> 7.1 errata/7.2/7.3 gcc 2.96 appear quite happy
7.3 gcc 2.96 was the one I was originally using when I found this
problem. I decided to go back and try kgcc just in case. I'll try
compiling it on another machine and moving it over today.

-Paul Larson

2002-07-23 17:40:21

by Paul Larson

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

On Mon, 2002-07-22 at 17:52, William Lee Irwin III wrote:
> ISTR this compiler having code generation problems. I think trying to
> reproduce this with a working i386 compiler is in order, e.g. debian's
> 2.95.4 or some similarly stable version.
That's exactly the one I was planning on trying it with. Tried it this
morning with the same error. Three compilers later, I think this is
looking less like a compiler error. Any ideas?

-Paul Larson

2002-07-23 17:46:41

by William Lee Irwin III

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

On Mon, 2002-07-22 at 17:52, William Lee Irwin III wrote:
>> ISTR this compiler having code generation problems. I think trying to
>> reproduce this with a working i386 compiler is in order, e.g. debian's
>> 2.95.4 or some similarly stable version.

On Tue, Jul 23, 2002 at 12:40:43PM -0500, Paul Larson wrote:
> That's exactly the one I was planning on trying it with. Tried it this
> morning with the same error. Three compilers later, I think this is
> looking less like a compiler error. Any ideas?

Stands a good chance of being fixed by the recent rmap.c bugfix posted
by Rik. I'm seeing deadlocks every other boot over here, the cause of
which I've not yet been able to discover.

Cheers,
Bill

2002-07-23 17:46:27

by Dave Hansen

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

Paul Larson wrote:
> On Mon, 2002-07-22 at 17:52, William Lee Irwin III wrote:
>
>>ISTR this compiler having code generation problems. I think trying to
>>reproduce this with a working i386 compiler is in order, e.g. debian's
>>2.95.4 or some similarly stable version.
>
> That's exactly the one I was planning on trying it with. Tried it this
> morning with the same error. Three compilers later, I think this is
> looking less like a compiler error. Any ideas?

Exactly _which_ 3 compilers? I couldn't do it with egcs, but Debian's
2.95.4 and 3.0 worked.

--
Dave Hansen

2002-07-23 19:01:04

by Paul Larson

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

On Tue, 2002-07-23 at 12:49, William Lee Irwin III wrote:
> Stands a good chance of being fixed by the recent rmap.c bugfix posted
> by Rik. I'm seeing deadlocks every other boot over here, the cause of
> which I've not yet been able to discover.
Still broken with the rmap patch posted today. Output attached.

-Paul Larson


Attachments:
c5.out (17.58 kB)

2002-07-23 19:56:44

by Paul Larson

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

I was asking Dave McCracken and he mentioned that rmap and highmem pte
don't play nice together. I tried turning that off and it boots without
error now. Someone might want to take a look at getting those two to
work cleanly together especially now that rmap is in. But for now, this
will work around the problem.

Thanks,
Paul Larson

2002-07-23 19:58:52

by Rik van Riel

Subject: Re: [OOPS] 2.5.27 - __free_pages_ok()

On 23 Jul 2002, Paul Larson wrote:

> I was asking Dave McCracken and he mentioned that rmap and highmem pte
> don't play nice together. I tried turning that off and it boots without
> error now.

OK, good to hear that.

> Someone might want to take a look at getting those two to
> work cleanly together especially now that rmap is in.

William Irwin has been working on this for a few days now ;)

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/