2002-10-15 11:41:55

by David Coulson

Subject: swap_dup/swap_free errors with 2.4.20-pre10

I'm running 2.4.20-pre10 on a Dual PIII system with 2GB of RAM and three
2GB swap logical volumes.

It runs fine for a while, then I get lots of these:

Oct 15 12:41:31 maeve kernel: swap_dup: Bad swap file entry 00000020
Oct 15 12:41:31 maeve kernel: swap_dup: Bad swap file entry 00000020
Oct 15 12:41:31 maeve kernel: swap_free: Bad swap file entry 00000020
Oct 15 12:41:31 maeve kernel: swap_free: Bad swap file entry 00000020
Oct 15 12:41:31 maeve kernel: swap_dup: Bad swap file entry 00000020
Oct 15 12:41:31 maeve kernel: swap_dup: Bad swap file entry 00000020
Oct 15 12:41:31 maeve kernel: swap_free: Bad swap file entry 00000020

The address is always 00000020. I've tried the machine without any swap
space, and I get exactly the same error, so I'm assuming it's either bad
RAM or a kernel issue. I ran memtest86 on it yesterday, and it didn't
throw up any errors, but I'm going to swap the RAM out and see if that
fixes it.
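
The logged value is a swap entry rather than a memory address, which may explain why the message appears even with swap disabled. The sketch below decodes 00000020 using the non-PAE i386 layout as I recall it from the 2.4-era include/asm-i386/pgtable.h (area index in bits 1-6, page offset from bit 8 up). That layout, the helper names, and the "index must be below the number of active swap areas" check are assumptions for illustration, not verified against the exact 2.4.20-pre10 tree.

```python
# Sketch only: mirrors the 2.4-era non-PAE i386 swap-entry macros as
# remembered (SWP_TYPE / SWP_OFFSET). Treat the bit layout as an assumption.

def swp_type(val: int) -> int:
    """Index of the swap area: bits 1-6 of the entry value."""
    return (val >> 1) & 0x3F


def swp_offset(val: int) -> int:
    """Page offset inside that swap area: bits 8 and above."""
    return val >> 8


def describe(val: int, nr_swapfiles: int) -> str:
    """Summarise an entry and whether its area index is in range."""
    area, offset = swp_type(val), swp_offset(val)
    verdict = "valid area" if area < nr_swapfiles else "no such swap area"
    return f"entry {val:08x}: type={area} offset={offset} ({verdict})"


entry = 0x00000020
print(describe(entry, nr_swapfiles=3))  # three swap volumes configured
print(describe(entry, nr_swapfiles=0))  # swap switched off entirely
print("bits set:", bin(entry).count("1"))
```

Under these assumptions the entry names a swap area that does not exist in either configuration, and the value has exactly one bit set. That pattern is consistent with a corrupted page table entry, though it does not by itself distinguish bad RAM from a kernel bug.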

Thanks,
David

--
David Coulson http://davidcoulson.net/
[email protected] http://journal.davidcoulson.net/


2002-10-16 21:54:45

by Marcelo Tosatti

Subject: Re: swap_dup/swap_free errors with 2.4.20-pre10



On Tue, 15 Oct 2002, David Coulson wrote:

> I'm running 2.4.20-pre10 on a Dual PIII system with 2GB of RAM and three
> 2GB swap logical volumes.
>
> It runs fine for a while, then I get lots of these:
>
> Oct 15 12:41:31 maeve kernel: swap_dup: Bad swap file entry 00000020
> Oct 15 12:41:31 maeve kernel: swap_dup: Bad swap file entry 00000020
> Oct 15 12:41:31 maeve kernel: swap_free: Bad swap file entry 00000020
> Oct 15 12:41:31 maeve kernel: swap_free: Bad swap file entry 00000020
> Oct 15 12:41:31 maeve kernel: swap_dup: Bad swap file entry 00000020
> Oct 15 12:41:31 maeve kernel: swap_dup: Bad swap file entry 00000020
> Oct 15 12:41:31 maeve kernel: swap_free: Bad swap file entry 00000020
>
> The address is always 00000020. I've tried the machine without any swap
> space, and I get exactly the same error, so I'm assuming it's either bad
> RAM or a kernel issue. I ran memtest86 on it yesterday, and it didn't
> throw up any errors, but I'm going to swap the RAM out and see if that
> fixes it.

Any news on this one, David?

2002-10-16 22:08:59

by David Coulson

Subject: Re: swap_dup/swap_free errors with 2.4.20-pre10

Hey Marcelo,

> Any news on this one, David?

Sorry, I forgot to follow up - I spent most of yesterday morning trying
to stabilise the thing and didn't end up posting my results. Basically,
my board can only handle 1.5GB of PC133 properly, even though it will
try to use 2GB if you put that much in it. Interestingly enough, it
passed the memtest86 tests I ran on it the other night, so I'm not sure
what's going on there. Tyan, the board manufacturer, confirmed that the
system is only stable with 1.5GB of PC133, which is somewhat
disappointing, but I guess I'll have to live with it. Interestingly, it
ran fine for about 8 hours before going funny the first time, then it
spat out swap_dup/swap_free errors within 30 to 60 minutes five times in
a row.

I had weird lockups under 2.4.20-pre9, where the system would behave
oddly - Most commands would work, but 'ps' simply locked up and I
couldn't Ctrl-C out of it. I've moved back to 2.4.19-ck7-rmap, which
seems to be stable at the moment, although I may take another look at
the 2.4.20-pre10 kernel sometime. As always, I didn't have a keyboard or
monitor hooked up to it, so I couldn't do too much with sysrq, but I'll
be ready if it does it again.

Any pointers about the above lock-ups would be useful - since 'ps' locks
up and others don't (e.g. 'ls'), I think it's quite a specific issue,
although I've so far been unable to track it down.

Thanks,
David

--
David Coulson http://davidcoulson.net/
[email protected] http://journal.davidcoulson.net/

2002-10-17 00:52:01

by Jeff Dike

Subject: Re: [uml-devel] Re: swap_dup/swap_free errors with 2.4.20-pre10

[email protected] said:
> I had weird lockups under 2.4.20-pre9, where the system would behave
> oddly - Most commands would work, but 'ps' simply locked up and I
> couldn't Ctrl-C out of it.

I've seen this bug multiple times. Basically, something is holding an
mmap_sem and not letting go. Anything that walks the process list hangs.
Ultimately, this hangs anything that's remotely useful, and you have to
crash the box.

I've seen it on my laptop several times, and it hung a UML server that we
have. UML is frequently, but not always, involved.

We got a sysrq t from the UML server. I posted to lkml about it, with no
response. You can see that at
http://marc.theaimsgroup.com/?l=linux-kernel&m=103351640614665&w=2

One factoid that I forgot to mention there is that when it happens on my
laptop, the disk activity light is stuck on.

Jeff