Hello lkml,
I have two machines AlphaServers ES40
machine 1 with 4 Gbytes of RAM
machine 2 with 8 Gbytes of RAM
These two machines work perfectly with Tru64, The RAM is ok
on both of these machines.
1) I recompiled kernel 2.2.19pre17aa1
==> The two machines boot well, but are limited to 2 Gbytes of RAM.
2) So I applyed the patch 2.2.19pre17aa1.b, configured with BIGMEM enabled,
recompiled and reboot.
The two machines boot, see respectively 4 Gbytes and 8 Gbytes of RAM. a
little C program allocating writting and reading to the memory shows it is
really working.
THE PROBLEM:
-------------------------
On the machine with 4 Gbytes the system freezes when doing /bin/lspci
or more /proc/pci (2 equivalent system calls fopen(/proc/pci) ) and not
on the machine with 8 GBytes!
Remark: When I say freezes, I don't see the kernel panic message on the
console and when I was running in SMP multiusers I kept receiving spinlock
messages every 2 minutes:
opensched.c:30 spinlock stuck in httpd at fffffc000032d524(3) owner
lspci at fffffc000035155c(1) read_write.c:43
sched.c:30 spinlock stuck in identd at fffffc000032d524(2) owner
lspci at fffffc000035155c(1) read_write.c:43
sched.c:30 spinlock stuck in identd at fffffc000032d524(0) owner
lspci at fffffc000035155c(1) read_write.c:43
For the following I prefered to go back to Uni Processor kernel in single
user
mode because the system is more simple and the problems are the same:
3) If I try to reboot the 4 Gbytes machine giving mem=2048 M at boot time,
everything works fine again (except the 2 GBytes of RAM unavailable)
4) If I try to reboot the 8 Gbytes machine giving mem=4096 M at boot time,
I have the same behaviour, freezing when doing more /proc/pci , as the 4
Gbytes machine
5) If I try to reboot the 8 Gbytes machine giving mem=2048 M or 6144 M,
everything works perfectly.
SMP/UP:
--------------
I HAVE EXACTLY THE SAME RESULTS WITH SMP/UP kernel so I guess
investigating on Uni processor is simpler.
I have tried to go through the code source of the patch but I am new to
kernel
programming, it looks like there is a switching mecanism around the 4 Gbytes
value for RAM size... that could be it.... I am still investigating.
Any Ideas ?
----------------------------------------------------------------------------
--
Sebastien CABANIOLS
COMPAQ France
HPTC Engineer
CustomSystems & Solutions Annecy
High Performance Technical Computing
Office No. +33 (0)4 50 09 44 10
Fax No. +33 (0)4 50 64 01 39
Email. [email protected]
----------------------------------------------------------------------------
--
It seems that having a Myrinet 2k board plugged into any slot of the second
PCI bus
of the ES40 make the system freeze with 4 GBytes of RAM when doing cat
/proc/pci.
I just plugged the board back into one slot of PCI 0 and it works again.
Why does it work with Tru64 and not with Linux ? I don't know.
Thanks to Andrea Arcangeli for helping me understand there was something
with the second PCI bus.
----------------------------------------------------------------------------
--
Sebastien CABANIOLS
COMPAQ France
HPTC Engineer
CustomSystems & Solutions Annecy
High Performance Technical Computing
Office No. +33 (0)4 50 09 44 10
Fax No. +33 (0)4 50 64 01 39
Email. [email protected]
----------------------------------------------------------------------------
--