2004-03-22 22:34:30

by Sanjoy Mahajan

Subject: Thinkpad 560X w/ 160MB memory (2.4.24 kernel): many segfaults

I 'upgraded' my IBM Thinkpad 560X laptop to 160MB of RAM: 32MB on the
motherboard plus a 128MB EDO SODIMM. Even though IBM says that the
motherboard is certified only up to 96MB, it recognized the memory
fine on boot. Linux booted and noticed the 160MB without needing any
kernel command line options.

On the thinkpad mailing list (where I got the idea to upgrade) people
mentioned that Win2K on a 560X can deal with the extra memory, but for
Win9x one needs to change the video driver memory ranges (adding
0xF0000000 to the standard values). So I worried that X windows might
have a similar problem.

But everything worked fine at first, even X (I'm using XFree86 4.1.0).

However, I soon found that many programs would segfault for no obvious
reason. For example, the galeon browser and emacs (in X windows) died
on their own, whereas they almost never did before. That seemed
consistent with the worry about X.

To check that it wasn't the memory module itself, I ran the BIOS
memory test (which passed) and also memtest86+ (no errors on 4 passes,
which was more than 2 hours of testing).

For other reasons I was recompiling the kernel (2.4.24). Various
steps, all unrelated to X, failed as well: e.g. 'make menuconfig'. To
be sure that it wasn't due to X windows interacting badly with
something else, I switched to single user and reran the compile. Here
are the last few lines of a resulting logfile for 'make menuconfig':

make -C scripts/lxdialog all
make[1]: Entering directory `/usr/src/linux-2.4.24/scripts/lxdialog'
gcc -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -DLOCALE
-DCURSES_LOC="<ncurses.h>" -c -o checklist.o checklist.c
gcc -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -DLOCALE
-DCURSES_LOC="<ncurses.h>" -c -o menubox.o menubox.c
gcc: Internal compiler error: program cc1 got fatal signal 11
make[1]: *** [menubox.o] Error 1
make[1]: Leaving directory `/usr/src/linux-2.4.24/scripts/lxdialog'
make: *** [menuconfig] Error 2

I have the core file if it would be useful. It doesn't help much
because cc1 is compiled with optimization. Here is what gdb says upon
starting up with that core file:

Core was generated by `/usr/lib/gcc-lib/i386-linux/2.95.4/cc1
/tmp/ccoqJbGF.i -quiet -dumpbase menubox'.
Program terminated with signal 11, Segmentation fault.

Once, make-kpkg (a Perl script) segfaulted; another time, 'sh'
segfaulted.

As a final control, I took out the 128MB module and replaced it with
the original 64MB module. The whole kernel compile worked fine and
the machine is working perfectly, as it did before the upgrade.

Could this issue be due to the kernel? Perhaps the VM system is not
adjusting everything it needs to for the new memory? Any tests I can
run to narrow it down?

Perhaps useful specs on the machine:

Model: 2640-70U
CPU: Pentium-MMX 233MHz
L1 : 16KB
L2 : 256KB (although memtest86+ said unknown)
Chipset: 430TX (I think)
Kernel: 2.4.24

-Sanjoy


2004-03-23 05:52:48

by Willy Tarreau

Subject: Re: Thinkpad 560X w/ 160MB memory (2.4.24 kernel): many segfaults

Hi,

On Mon, Mar 22, 2004 at 10:34:24PM +0000, Sanjoy Mahajan wrote:
> To check that it wasn't the memory module itself, I ran the BIOS
> memory test (which passed) and also memtest86+ (no errors on 4 passes,
> which was more than 2 hours of testing).

I've had problems similar to the ones you describe, on completely different
machines, due to a RAM compatibility problem. The RAM passed memtest86, but
burnBX (from cpuburn) detected the problem within 8 seconds. It seems to me
that this RAM had problems with its I/O in general, possibly with back-to-back
timings, etc. memtest86 is very good at detecting defective memory cells,
but apparently not as good at detecting I/O problems.

Cheers,
Willy

2004-03-23 22:42:45

by Sanjoy Mahajan

Subject: Re: Thinkpad 560X w/ 160MB memory (2.4.24 kernel): many segfaults

> burnBX (from cpuburn) could detect the problem within 8 seconds.
> [or burnMMX]

Thanks for these suggestions. I ran each for several minutes and got
no errors. So I'm still puzzled, but maybe it is a subtle memory
incompatibility that neither program detects (yet somehow Linux works
the machine so hard and uncovers it?).

-Sanjoy

2004-03-24 00:31:54

by Willy Tarreau

Subject: Re: Thinkpad 560X w/ 160MB memory (2.4.24 kernel): many segfaults

On Tue, Mar 23, 2004 at 10:42:40PM +0000, Sanjoy Mahajan wrote:
> > burnBX (from cpuburn) could detect the problem within 8 seconds.
> > [or burnMMX]
>
> Thanks for these suggestions. I ran each for several minutes and got
> no errors. So I'm still puzzled, but maybe it is a subtle memory
> incompatibility that neither program detects (yet somehow Linux works
> the machine so hard and uncovers it?).

Sorry, but IIRC neither burnBX nor burnMMX tests a large portion of RAM by
default, only a small amount (4 MB?). So it's quite possible that, without
options, they ran in your on-board RAM only. I believe you have to pass a
letter as the first and only parameter. I used 'P', which meant 64 MB. I
don't remember whether larger sizes are available, but you can at least
start several instances in parallel to lock more memory.
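
To illustrate the idea of exercising a chosen amount of RAM rather than a
small default window, here is a rough user-space sketch in Python. It is not
burnBX and does not reproduce its access pattern or speed, so it is unlikely
to stress bus timings the same way. The names (fill, verify, run), the
patterns, and the 64 MB default are all made up for illustration. It also
does not lock its pages, so the kernel decides which physical memory backs
the buffer and may swap it out.

```python
import sys

# Illustrative byte patterns: all zeros, all ones, alternating bits.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)
CHUNK = 1 << 20  # work in 1 MiB pieces to avoid a second full-size buffer


def fill(buf, pattern):
    """Write the byte `pattern` into every position of `buf`."""
    block = bytes([pattern]) * CHUNK
    for start in range(0, len(buf), CHUNK):
        end = min(start + CHUNK, len(buf))
        buf[start:end] = block[:end - start]


def verify(buf, pattern):
    """Return the number of bytes in `buf` that differ from `pattern`."""
    block = bytes([pattern]) * CHUNK
    errors = 0
    for start in range(0, len(buf), CHUNK):
        end = min(start + CHUNK, len(buf))
        got = buf[start:end]
        want = block[:end - start]
        if got != want:
            # Only count byte by byte when the chunk is known to be bad.
            errors += sum(a != b for a, b in zip(got, want))
    return errors


def run(size_mb, passes=1):
    """Fill and verify a buffer of `size_mb` megabytes with each pattern."""
    buf = bytearray(size_mb * 1024 * 1024)
    total = 0
    for _ in range(passes):
        for pattern in PATTERNS:
            fill(buf, pattern)
            total += verify(buf, pattern)
    return total


if __name__ == "__main__":
    # Size in MB is the only argument; 64 is an arbitrary default.
    size_mb = int(sys.argv[1]) if len(sys.argv) > 1 else 64
    errors = run(size_mb)
    print("%d MB tested, %d mismatched bytes" % (size_mb, errors))
    sys.exit(1 if errors else 0)
```

Several copies can be started in parallel, each with its own size argument,
so that together they cover more memory than the on-board 32MB. A clean run
proves little, though, since a marginal module may only fail under heavier
or differently timed traffic.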

Willy