2001-07-27 10:13:51

by Samuel Dupas

Subject: swap_free: swap-space map bad (entry 00000100)

Hi everybody,

I have these lines in /var/log/messages.

Is it a kernel problem or a hardware problem?

In the mailing list archives I found nothing interesting (I mean,
only the same question but no response).
Please help me.

It's on a Cobalt Raq4, 512 MB RAM, kernel 2.2.16C27_III

The machine has been writing these lines for a week, and the system
doesn't work like usual (it's very slow).

Thanks for any advice.

(I'm not subscribed to the list, so can you add my address in CC please?)

/var/log/messages
--------------------------------------------------------------------
Jul 25 02:05:12 euro kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000114
Jul 25 02:05:12 euro kernel: current->tss.cr3 = 0f0be000, %%cr3 = 0f0be000

Jul 25 02:05:12 euro kernel: *pde = 00000000
Jul 25 02:05:12 euro kernel: Oops: 0000
Jul 25 02:05:12 euro kernel: CPU: 0
Jul 25 02:05:12 euro kernel: EIP: 0010:[try_to_free_buffers+18/136]
Jul 25 02:05:12 euro kernel: EFLAGS: 00010206
Jul 25 02:05:12 euro kernel: eax: 00000100 ebx: c055e360 ecx: 0001207c
edx: 00040000
Jul 25 02:05:12 euro kernel: esi: 00000100 edi: 00000100 ebp: c055e360
esp: da98be90
Jul 25 02:05:12 euro kernel: ds: 0018 es: 0018 ss: 0018
Jul 25 02:05:12 euro kernel: Process rsync (pid: 28186, process nr: 12,
stackpage=da98b000)
Jul 25 02:05:12 euro kernel: Stack: 00000006 00000013 c011c146 c055e360
da98a000 00000005 c0120faa 00000006
Jul 25 02:05:12 euro kernel: 00000013 da98a000 00000013 00000000
00004000 00000001 00000008 c0121110
Jul 25 02:05:12 euro kernel: 00000013 c01218d2 00000013 00003000
db2c6bb0 00000000 00004000 00490ad4
Jul 25 02:05:12 euro kernel: Call Trace: [shrink_mmap+218/304]
[do_try_to_free_pages+78/232] [try_to_free_pages+20/24]
[__get_free_pages+122/812] [try_to_read_ahead+254/276]
[try_to_read_ahead+47/276] [do_generic_file_read+750/1508]
Jul 25 02:05:12 euro kernel: [generic_file_read+99/124]
[file_read_actor+0/80] [sys_read+174/196] [system_call+52/56]
Jul 25 02:05:12 euro kernel: Code: 8b 76 14 83 78 20 00 75 06 f6 40 18 46
74 0f 6a 00 e8 70 01
Jul 25 04:02:44 euro kernel: swap_duplicate at c01222f4: entry 00000100,
unused page
Jul 25 04:02:44 euro kernel: VM: killing process httpd
Jul 25 04:02:44 euro kernel: swap_free: swap-space map bad (entry
00000100)
Jul 25 04:02:44 euro kernel: swap_free: swap-space map bad (entry
00000100)
Jul 25 04:02:44 euro kernel: swap_duplicate at c01222f4: entry 00000100,
unused page
Jul 25 04:02:44 euro kernel: VM: killing process httpd
Jul 25 04:02:44 euro kernel: swap_free: swap-space map bad (entry
00000100)
Jul 25 04:02:44 euro kernel: swap_free: swap-space map bad (entry
00000100)
Jul 25 04:02:44 euro kernel: swap_duplicate at c01222f4: entry 00000100,
unused page
Jul 25 04:02:44 euro kernel: VM: killing process httpd
Jul 25 04:02:44 euro kernel: swap_free: swap-space map bad (entry
00000100)
Jul 25 04:02:44 euro kernel: swap_free: swap-space map bad (entry
00000100)
Jul 25 04:02:44 euro kernel: swap_duplicate at c01222f4: entry 00000100,
unused page
------------------------------------------------------------------------------------

Samuel Dupas
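
The swap messages in the log above all report the same entry value. A minimal sketch of how that could be tallied from a saved copy of the log, assuming Python is available on some machine; the function name `count_swap_entries` and the regular expression are illustrative and not part of any kernel or syslog tool:

```python
import re
from collections import Counter

# Matches "swap_free" or "swap_duplicate" followed by the nearest
# "entry XXXXXXXX" (8 hex digits), as printed by this 2.2-era kernel.
ENTRY_RE = re.compile(
    r"\b(swap_free|swap_duplicate)\b.*?\bentry\s+([0-9a-fA-F]{8})"
)

def count_swap_entries(log_text):
    """Count (function, entry) pairs found in the log text.

    The mail client wrapped long lines, so whitespace is collapsed first;
    that way "(entry\\n00000100)" still matches.
    """
    flat = " ".join(log_text.split())
    return Counter(
        (m.group(1), m.group(2).lower()) for m in ENTRY_RE.finditer(flat)
    )
```

If every message names the same entry, as here with 00000100, that points at one corrupted value being hit repeatedly rather than at widespread damage.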


2001-07-27 15:24:48

by Samuel Dupas

Subject: Unable to handle kernel paging request at virtual address 3b617b05 ( was Re: swap_free: swap-space map bad (entry 00000100) )


Hi again,

I changed the kernel (now 2.2.19) and I still have the same problem. It
begins with an "Unable to handle kernel paging request at virtual address".

And strangely, the machine rebooted by itself.
After the reboot, the second disk was rebuilding (the machine uses
RAID mirroring).

Do you have any idea what the problem is?

Help would be very welcome (the server is remote and I can't change hardware
easily, which is why I want to know what could explain my problem before
replacing the server).

Thanks
/var/log/messages
----------------------------------------------------------
Jul 27 15:50:20 euro kernel: Unable to handle kernel paging request at
virtual address 3b617b05
Jul 27 15:50:20 euro kernel: current->tss.cr3 = 10d4a000, %%cr3 = 10d4a000

Jul 27 15:50:20 euro kernel: *pde = 00000000
Jul 27 15:50:20 euro kernel: Oops: 0000
Jul 27 15:50:20 euro kernel: CPU: 0
Jul 27 15:50:20 euro kernel: EIP: 0010:[free_wait+62/120]
Jul 27 15:50:20 euro kernel: EFLAGS: 00010007
Jul 27 15:50:20 euro kernel: eax: 3b617b01 ebx: c64de280 ecx: ccef5d48
edx: 3b617b01
Jul 27 15:50:20 euro kernel: esi: c64de000 edi: c64de280 ebp: 00000207
esp: d1fd5eec
Jul 27 15:50:20 euro kernel: ds: 0018 es: 0018 ss: 0018
Jul 27 15:50:20 euro kernel: Process ab (pid: 3800, process nr: 348,
stackpage=d1fd5000)
Jul 27 15:50:20 euro kernel: Stack: 00100000 00000035 00000080 00000000
00000000 c012ec23 c64de000 00000000
Jul 27 15:50:20 euro kernel: 00000035 c012ebf4 00000020 00000080
d4f9fb00 00000000 d20d13c0 d1fd4000
Jul 27 15:50:20 euro kernel: fffffff7 d1fd4000 d1fd4000 00000bac
00000000 c64de000 08050960 0000004e
Jul 27 15:50:20 euro kernel: Call Trace: [do_select+487/512]
[do_select+440/512] [sys_select+881/1176] [net_bh+398/488]
[common_interrupt+24/32] [system_call+52/56]
Jul 27 15:50:20 euro kernel: Code: 8b 42 04 39 f8 75 f7 89 4a 04 55 9d 83
c4 f4 8b 43 fc 50 e8
Jul 27 15:57:07 euro kernel: klogd 1.3-3, log source = /proc/kmsg started.
------------------------------------------------------------

Samuel

2001-07-27 15:29:28

by Larry McVoy

Subject: Re: Unable to handle kernel paging request at virtual address 3b617b05 ( was Re: swap_free: swap-space map bad (entry 00000100) )

On Fri, Jul 27, 2001 at 04:24:23PM +0100, Samuel Dupas wrote:
> I changed the kernel (now 2.2.19) and I still have the same problem. It
> begins with an "Unable to handle kernel paging request at virtual address".

Did you change memory or CPU lately? I had a pile of these yesterday
after we dropped in a 1.3GHz K7. Turned out our memory was borderline;
I swapped in some new mem and no more problems.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2001-07-27 15:37:38

by Samuel Dupas

Subject: Re: Unable to handle kernel paging request at virtual address 3b617b05 ( was Re: swap_free: swap-space map bad (entry 00000100) )

On Fri, 27 Jul 2001 08:29:18 -0700
Larry McVoy <[email protected]> wrote:
> On Fri, Jul 27, 2001 at 04:24:23PM +0100, Samuel Dupas wrote:
> > I changed the kernel (now 2.2.19) and I still have the same problem. It
> > begins with an "Unable to handle kernel paging request at virtual address".
>
> Did you change memory or CPU lately? I had a pile of these yesterday
> after we dropped in a 1.3GHz K7. Turned out our memory was borderline;
> I swapped in some new mem and no more problems.

No, I didn't change the hardware. The server has been installed in our
colocation center for 3 weeks but has not really been used so far.
I discovered these lines in the logs, and in fact the server restarts itself
when it fails.
As I am a bit far from the server, I can't test a hardware change
yet.

I want to know whether it's a hardware problem or not, and whether it's
the hard disk, the memory, etc.

Any idea?

Thanks

Samuel

2001-07-27 15:41:58

by Sunny Zhou

Subject: RE: Unable to handle kernel paging request at virtual address 3b617b05 (was Re: swap_free: swap-space map bad (entry 00000100) )

Yesterday I got the same problem. I believe your hard disk has a SCSI
interface, and you are using a 2.4.x kernel, right?
I reduced my swap space to 600MB and it worked (at least
yesterday).
Don't know why.

Sunny



2001-07-27 15:55:58

by Samuel Dupas

Subject: Re: Unable to handle kernel paging request at virtual address 3b617b05 (was Re: swap_free: swap-space map bad (entry 00000100) )

On Fri, 27 Jul 2001 10:40:54 -0500
Sunny Zhou <[email protected]> wrote:
> Yesterday I got the same problem. I believe your hard disk has a SCSI
> interface, and you are using a 2.4.x kernel, right?


No, it's an IDE disk with software RAID, and the kernel is 2.2.19 (with some
patches to fit a Cobalt machine).

> I reduced my swap space to 600MB and it worked (at least
> yesterday).
> Don't know why.

I think the problem might happen when I begin to use the swap. But that
would show that there are some problems on the disk.
For you, if it's an HDD problem, the "bad block" might be in the part of
the disk you are no longer using since you reduced your partition.

But now I don't know: Larry said that it might be the memory or the CPU,
now the HDD...

Is there a specific event that happens when the line "Unable to handle
kernel paging request at virtual address 3b617b05" appears in the log?

Thanks for your help

Samuel

2001-07-27 17:38:11

by Samuel Dupas

Subject: Re: swap_free: swap-space map bad (entry 00000100)

On Fri, 27 Jul 2001 12:29:18 -0500
"Jeremy Linton" <[email protected]> wrote:
> Did you do a 'swapoff' at some point before this?
>

No, I didn't change anything. I just put stress on it with ab to test the
machine but it fell down :-((

Other ideas?

Thanks

Samuel

2001-07-27 19:18:20

by Rik van Riel

Subject: Re: swap_free: swap-space map bad (entry 00000100)

On Fri, 27 Jul 2001, Samuel Dupas wrote:
> On Fri, 27 Jul 2001 12:29:18 -0500
> "Jeremy Linton" <[email protected]> wrote:
> > Did you do a 'swapoff' at some point before this?
> >
>
> No, I didn't change anything. I just put stress on it with ab to
> test the machine but it fell down :-((
>
> Other ideas?

The memory corruption you saw usually (almost always)
indicates a hardware problem. It may not have shown up
during normal usage because without ab your RAM has
more idle time and can keep up refreshing itself
easily.

Flakey mainboard chipsets could "forget" about such
things under heavy DMA load, or ... (who knows)

Setting the BIOS settings one notch more conservative
often fixes these marginal errors.

regards,

Rik
--
Executive summary of a recent Microsoft press release:
"we are concerned about the GNU General Public License (GPL)"


http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com/

2001-07-27 20:24:33

by Jeremy Linton

Subject: Re: swap_free: swap-space map bad (entry 00000100)

> On Fri, 27 Jul 2001, Samuel Dupas wrote:
> > On Fri, 27 Jul 2001 12:29:18 -0500
> > "Jeremy Linton" <[email protected]> wrote:
> > > Did you do a 'swapoff' at some point before this?
> > >
> >
> > No, I didn't change anything. I just put stress on it with ab to
> > test the machine but it fell down :-((
> >
> > Other ideas?
>
> The memory corruption you saw usually (almost always)
> indicates a hardware problem. It may not have shown up
> during normal usage because without ab your RAM has
> more idle time and can keep up refreshing itself
> easily.

I asked about the swapoff path because I have a couple of stable MP boxes
that exhibit swap map corruption (similar to what he appears to be seeing)
during/after a swapoff operation in 2.4. I presented a patch a couple weeks
ago that fixes some of the problems that I am seeing. A quick look at 2.2.19
makes it look like the same problems might exist there as well. If he isn't doing
swapoff then my fixes probably won't help him.



jlinton


2001-07-27 20:33:25

by Linus Torvalds

Subject: Re: swap_free: swap-space map bad (entry 00000100)

In article <[email protected]>,
Samuel Dupas <[email protected]> wrote:
>
>Is it a kernel problem or a hardware problem?

Could be either. However, the thing you quote looks like a traditional
one-bit error.

>Jul 25 02:05:12 euro kernel: Unable to handle kernel NULL pointer
>dereference at virtual address 00000114
>Jul 25 02:05:12 euro kernel: current->tss.cr3 = 0f0be000, %%cr3 = 0f0be000
>
>Jul 25 02:05:12 euro kernel: *pde = 00000000
>Jul 25 02:05:12 euro kernel: Oops: 0000
>Jul 25 02:05:12 euro kernel: CPU: 0
>Jul 25 02:05:12 euro kernel: EIP: 0010:[try_to_free_buffers+18/136]
>Jul 25 02:05:12 euro kernel: EFLAGS: 00010206
>Jul 25 02:05:12 euro kernel: eax: 00000100 ebx: c055e360 ecx: 0001207c edx: 00040000
>Jul 25 02:05:12 euro kernel: esi: 00000100 edi: 00000100 ebp: c055e360 esp: da98be90

%esi is supposed to contain a kernel pointer to the per-page buffer list
at this point.

However, it contains the value 0x00000100, which is not a valid kernel
pointer, so dereferencing it (with an offset of 20, which is why you see
the virtual address 0x00000114) will cause an oops.
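
The arithmetic here can be checked mechanically. A minimal sketch, assuming nothing beyond the register dump quoted above; the helper name `registers_near_fault` and the 0x100 search window are illustrative choices, not part of any kernel tool:

```python
def registers_near_fault(fault_addr, registers, max_offset=0x100):
    """Map register name -> (fault_addr - value) for small non-negative offsets.

    A register that sits just below the faulting address is a likely base
    pointer for the access that caused the oops.
    """
    hits = {}
    for name, value in registers.items():
        offset = fault_addr - value
        if 0 <= offset < max_offset:
            hits[name] = offset
    return hits

# Register values copied from the Jul 25 oops quoted above.
oops_registers = {
    "eax": 0x00000100, "ebx": 0xc055e360, "ecx": 0x0001207c,
    "edx": 0x00040000, "esi": 0x00000100, "edi": 0x00000100,
    "ebp": 0xc055e360, "esp": 0xda98be90,
}

print(registers_near_fault(0x00000114, oops_registers))
# prints {'eax': 20, 'esi': 20, 'edi': 20}
```

Every register holding 0x00000100 lands exactly 20 bytes below the faulting address, which matches the offset described above.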

Now, I suspect that the value it _should_ contain is just zero. We
probably have the case that "page->buffers" should have been NULL (no
buffers allocated at all), but a one-bit error has turned it into
0x00000100, and then the page freeing logic will try to free the
"buffers" associated with the page.

And obviously, since "page->buffers" was bogus, when it tries to do

	if (buffer_busy(tmp))

it will oops.
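
To see why 0x00000100 reads as a one-bit error, compare it with the value the pointer is suspected to have held. A minimal sketch; that the correct value was zero is the suspicion stated above, not something the log proves, and `flipped_bits` is an illustrative helper name:

```python
def flipped_bits(expected, observed, width=32):
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = expected ^ observed
    return [bit for bit in range(width) if (diff >> bit) & 1]

# Suspected correct value (NULL) versus the value seen in %esi.
print(flipped_bits(0x00000000, 0x00000100))  # prints [8]
```

A single differing position (bit 8) is what a stuck or flipped memory cell would produce; a software bug that scribbles over a pointer would more typically change many bits at once.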

Now, that one-bit error could easily have come from a software source
too, of course. It might not be your RAM. But it's not as if you're
running an experimental kernel or anything like that..

And if you've also seen a bad page table entry 00000100, it _really_
starts to sound like one bit of your memory is stuck on. Run a memory
tester.
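
The idea behind a memory tester can be sketched in a few lines: write known patterns, read them back, and report any cell that returns something else. This is only an illustration under stated assumptions. A userspace script cannot choose which physical RAM it touches, so a real check needs a dedicated tester booted outside the operating system (memtest86 is one such tool). `StuckBitMemory` is a hypothetical stand-in that simulates a bit stuck on, and `pattern_test` is an illustrative name:

```python
class StuckBitMemory:
    """Hypothetical faulty RAM: one bit at one offset always reads back as 1."""

    def __init__(self, size, bad_offset, bad_mask):
        self._cells = bytearray(size)
        self._bad_offset = bad_offset
        self._bad_mask = bad_mask

    def __len__(self):
        return len(self._cells)

    def __setitem__(self, index, value):
        self._cells[index] = value

    def __getitem__(self, index):
        value = self._cells[index]
        if index == self._bad_offset:
            value |= self._bad_mask  # simulate the stuck-on bit
        return value


def pattern_test(memory):
    """Write all-zeros and each single-bit byte pattern, then read back.

    Returns a list of (offset, expected, observed) for every mismatch.
    """
    errors = []
    for pattern in [0x00] + [1 << bit for bit in range(8)]:
        for i in range(len(memory)):
            memory[i] = pattern
        for i in range(len(memory)):
            observed = memory[i]
            if observed != pattern:
                errors.append((i, pattern, observed))
    return errors


# A healthy buffer reports nothing; the simulated fault is reported at offset 5.
print(pattern_test(bytearray(64)))
print(pattern_test(StuckBitMemory(64, bad_offset=5, bad_mask=0x01))[:1])
```

A single pass like this only finds hard errors. Soft errors of the kind described in the note that follows need many passes under load and heat, which is another reason to use a dedicated tester and let it run for hours.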

NOTE: hard errors are quite uncommon. It's more likely that you have a
bit (or a row) that has soft-errors: it doesn't necessarily show up
every time, but shows up under heavy memory activity when the RAM chip
or the machine starts heating up.. The fact that this happens when
swapping may be indicative not so much of swapping problems per se, but
just the fact that that's when your machine is under the most load.

Linus

2001-07-28 01:54:48

by Josh Wyatt

Subject: Thanks

Hi All,

I don't know how often you guys hear this, or even if it's appropriate
for this list, but I just wanted to pass along a "thank you" in
appreciation for the fine work you've all done to make Linux what it is.

I've just joined the list a few days ago, and even though I feel as
though I've always had a good understanding of kernel internals (as it
pertains to an admin person, rather than a developer), I am very
impressed with the knowledge transferred by you guys back and forth. I
have monitored other open source projects casually in the past, but have
never seen this level of enthusiasm and love for the art.

As an individual that supports the movement professionally,
philosophically, and personally, I applaud and appreciate the effort and
the outcome.

Thanks and keep up the good work.

Yours,
Josh Wyatt
Senior Unix Engineer,
HCS Systems, Incorporated
http://www.hcssystems.com


2001-07-28 02:20:34

by Evgeniy Polyakov

Subject: Re: Thanks

On Fri, 27 Jul 2001 21:53:54 -0400
Josh Wyatt <[email protected]> wrote:

JW> Hi All,

JW> I don't know how often you guys hear this, or even if it's appropriate
JW> for this list, but I just wanted to pass along a "thank you" in
JW> appreciation for the fine work you've all done to make Linux what it is.

I think they hear it very often, but any letter of that kind gives only good and positive
feelings to the really great developers who create the Linux kernel every day.

JW> I've just joined the list a few days ago, and even though I feel as
JW> though I've always had a good understanding of kernel internals (as it
JW> pertains to an admin person, rather than a developer), I am very
JW> impressed with the knowledge transferred by you guys back and forth. I
JW> have monitored other open source projects casually in the past, but have
JW> never seen this level of enthusiasm and love for the art.

I joined this list about a month ago and can't understand a large part of the messages, but
really clever developers talk here, and new knowledge flows into my brain with almost every new letter.
Just look at the discussion about ext3: one can get many interesting things from that thread.

JW> As an individual that supports the movement professionally,
JW> philosophically, and personally, I applaud and appreciate the effort and
JW> the outcome.

Let us wish that all Linux developers continue this excellent work.

JW> Thanks and keep up the good work.

JW> Yours,
JW> Josh Wyatt
JW> Senior Unix Engineer,
JW> HCS Systems, Incorporated
JW> http://www.hcssystems.com

---
WBR. //s0mbre