2001-04-10 16:12:15

by Marcin Kowalski

Subject: Kernel 2.4.3 Crash - (Kernel BUG at highmem.c:155)

Hi All

This is quite a long email, which I have split in two (problem and
background) for those who are interested.

---Problem---

A kernel panic occurred with the messages:
Kernel BUG at highmem.c:155
Invalid Operand ???? With sshd somewhere in the mix.

Unfortunately I did a task dump with SysRq before I could get the rest of the
info, and syslogd had already stopped logging to disk. I then had to reboot.

Looking at line 155:
        /*
         * A count must never go down to zero
         * without a TLB flush!
         */
        switch (--pkmap_count[nr]) {
        case 0:
                BUG();
        case 1:
                wake_up(&pkmap_map_wait);
        }
        spin_unlock(&kmap_lock);
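
As far as I understand the convention behind that comment, a count of 0 means
the slot is free, 1 means it is mapped but has no users (and needs a TLB flush
before reuse), and n > 1 means n - 1 active users. Below is a minimal userspace
sketch of that convention, not the kernel code itself. Every model_* name is
hypothetical and made up for illustration. It only shows that an unmap with no
matching map drives the count to 0, which is the case the BUG() check catches.

```c
/* Userspace model of the pkmap_count convention. NOT kernel code.
 * Every model_* identifier is hypothetical and exists only here. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_SLOTS 4

/* 0 = free and flushed, 1 = mapped but unused (needs a TLB flush
 * before reuse), n > 1 = mapped with n - 1 active users. */
static int model_count[MODEL_NR_SLOTS];

/* Take a reference. The first user of a free slot moves it 0 -> 2. */
static void model_kmap(int nr)
{
    assert(nr >= 0 && nr < MODEL_NR_SLOTS);
    if (model_count[nr] == 0)
        model_count[nr] = 1;    /* slot now holds a mapping */
    model_count[nr]++;          /* plus one active user */
}

/* Drop a reference. Returns false where the kernel would hit BUG(),
 * i.e. when the decrement takes the count all the way to 0. */
static bool model_kunmap(int nr)
{
    assert(nr >= 0 && nr < MODEL_NR_SLOTS);
    if (--model_count[nr] == 0)
        return false;
    return true;
}

/* Only the flush path is allowed to take a slot from 1 back to 0. */
static void model_flush_unused(void)
{
    for (int i = 0; i < MODEL_NR_SLOTS; i++)
        if (model_count[i] == 1)
            model_count[i] = 0;
}
```

In this model, model_kmap(0) followed by two calls to model_kunmap(0) makes the
second unmap return false. Whether the real crash came from an unbalanced
unmap, a race, or memory corruption I cannot tell from this alone.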

What went wrong to make the count go to zero?

---Background----

I am running Linux 2.4.3 on an HP Netserver 2000r. It has 1.2 GB of RAM,
dual 933 MHz Xeons (PIIIs actually, but paid for Xeons??) and a NetRAID 4M SCSI
card with 6x 18.4 GB HP drives in a RAID 5 configuration with no hot standby.
The root FS is on a 9.2 GB HP SCSI drive. Both root and home are reiserfs
(9.4 GB and 85 GB respectively).
The kernel is patched with the axboe-scsi-patch and the latest aacraid patch.
Running under SuSE Linux 7.0 (new modutils).

The server is running samba, httpd, sendmail, mrtg, named and a number of
other processes, but the load average mostly stays below 1.0. It does behave
erratically, though, with load climbing to 3-5-6 while top shows no
apparent candidate and most of the time is spent in system calls.
Occasional lockups lasting 5-20 seconds were experienced when working on the
box under 2.4.2 but seem to be much better in 2.4.3.

Today the server tends to "eat up" shared+used memory over time, eventually
using roughly 700 MB of RAM with no process reflecting this in top.
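
One thing worth ruling out is buffers and page cache, which count as used
memory but are not attributed to any process in top. Below is a rough sketch
of how that could be checked from /proc/meminfo-style text. The helper names
meminfo_kb and meminfo_used_excluding_cache_kb are hypothetical, and the
sketch assumes the "Key:  value kB" line format.

```c
/* Sketch only: parse "Key:   12345 kB" lines as found in /proc/meminfo.
 * Both function names below are hypothetical helpers. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value in kB for a key at the start of a line, or -1. */
static long meminfo_kb(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');  /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* MemTotal - MemFree - Buffers - Cached, or -1 if a field is missing. */
static long meminfo_used_excluding_cache_kb(const char *text)
{
    long total = meminfo_kb(text, "MemTotal");
    long free_kb = meminfo_kb(text, "MemFree");
    long buffers = meminfo_kb(text, "Buffers");
    long cached = meminfo_kb(text, "Cached");

    if (total < 0 || free_kb < 0 || buffers < 0 || cached < 0)
        return -1;
    return total - free_kb - buffers - cached;
}
```

If the result stays small while "used" keeps growing, the growth is mostly
cache and buffers rather than a leak. If it grows too, the memory is going
somewhere top does not show.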

Running swapoff today, when 64 MB of swap was in use, resulted in a complete
machine lockup for about 30-40 seconds.

I strongly suspect the aacraid drivers but need further proof to convince the
powers that be to swap for a Mylex or something better supported....

Any advice/answers would be very welcome.

TIA
MARCin


--
-----------------------------
Marcin Kowalski
Linux/Perl Developer
Datrix Solutions
Cel. 082-400-7603
***Open Source Kicks Ass***
-----------------------------


2001-04-10 19:50:00

by Jeff Lessem

Subject: Re: Kernel 2.4.3 Crash - (Kernel BUG at highmem.c:155)

I have also seen the Kernel BUG at highmem.c:155 problem on a machine
I am testing. It is a Dell 8 processor P-III 700 MHz with 8 GB of
memory and Linux 2.4.3 + a knfsd and quota patch for reiserfs. When
doing 5 simultaneous kernel compiles from another machine that mounts the
8 processor one over NFS, the 8 processor machine hung with an error
message somewhat like

nfsd: terminating on signal 2
kernel BUG at highmem.c: 155!
invalid operand: 0000
CPU: 6

I apologize for the nearly useless error information, but I am 5000
miles and 7 time zones away from this machine, so I have to depend on
others for getting me on console information until I can get it moved
over to a serial console.

>Occasional lockups lasting 5-20 seconds were experienced when working on the
>box under 2.4.2 but seem to be much better in 2.4.3.

The machine is also having these odd lockup problems under intense
disk IO, but I will detail that in another message (look for "kswapd,
kupdated, and bdflush at 99%").

Any advice to alleviate this problem would be appreciated, and I will
provide any more information I can upon request.

--
Thanks,
Jeff Lessem.