Hi,
My hardware and softare:
Asus P4P800, 2GB memory, 2.8GHZ P4 with HT enabled.
On-board 3com Giga bit network card
1 parallel ata 160G maxtor disk
Nvidia Gefore4 MX440-8x graphics card (Asus V9180)
Redhat 7.3, original kernel 2.4.18-3
Intel Fortran Compiler 7.1
Intel Math Kernel Library 6.0
My problem is that one of my fortran code always crashes after 10-24 hours.
This code has a heavy IO. It writes out a 5MB binary file every 1 minute.
The error message in /var/log/message is: (coulson is the name of the computer)
Aug 7 21:11:29 coulson kernel: kernel BUG at page_alloc.c:226!
Aug 7 21:11:29 coulson kernel: invalid operand: 0000
Aug 7 21:11:29 coulson kernel: nfsd lockd sunrpc binfmt_misc sr_mod soundcore
parport_pc lp parport autofs 3c
....
the line number with page_alloc.c varies for different crases (not always 226).
This code also had a heavy IO. Specifically, it writes out a 5MB file every
1 minute.
I have done a couple of tests to try to find the cause, but without success so
far.
1) I have rmmod the 3com2000.0 network driver. The driver source is downloaded
from asus website, and compiled by me. The crash still occurs.
2) I heard of Nvidia binary driver can cause page_alloc.c kernel bug.
I do have a Nvidia Gefore4 MX440-8x graphics card (Asus V9180), but I did
not use Nvidia binary driver. Instead I used the vesa driver coming with redhat
7.3. Nevertheless, I changed to an old pci ATI rage graphics card. But the
crash still occurs.
3) when the crash occurs, my local X is up and running. I have not tried
the case with local X shutdown.
I also got the same crash when I run the code on a second computer with the
same hw and sw. This makes the hw defect less likely to be the cause.
I am thinking to recompile a new kernel. But now I focus on if it is caused
by some uncompatible module drivers.
I would appreciate your inputs on this. Please cc your answer to me. I am not on
the list.
Xiaogang
------------------------------------------------
Dr Xiaogang Wang
Departement de chimie
Universite de Montreal
C.P. 6128, succursale Centre-ville
Montreal (Quebec) H3C 3J7
Tel. (514) 3436111 ext 3947 (office)
FAX (514) 3437586 (office)
e-mail: [email protected]
homepage: http://www.esi.umontreal.ca/~wangx
------------------------------------------------
On Fri, 8 Aug 2003, Xiaogang Wang wrote:
> Hi,
>
> My hardware and softare:
>
> Asus P4P800, 2GB memory, 2.8GHZ P4 with HT enabled.
> On-board 3com Giga bit network card
> 1 parallel ata 160G maxtor disk
> Nvidia Gefore4 MX440-8x graphics card (Asus V9180)
>
> Redhat 7.3, original kernel 2.4.18-3
You might want to update your RedHat kernel and if the problem persists
report it on their bugzilla (bugzilla.redhat.com).
> Intel Fortran Compiler 7.1
> Intel Math Kernel Library 6.0
>
> My problem is that one of my fortran code always crashes after 10-24 hours.
> This code has a heavy IO. It writes out a 5MB binary file every 1 minute.
>
> The error message in /var/log/message is: (coulson is the name of the computer)
>
> Aug 7 21:11:29 coulson kernel: kernel BUG at page_alloc.c:226!
> Aug 7 21:11:29 coulson kernel: invalid operand: 0000
> Aug 7 21:11:29 coulson kernel: nfsd lockd sunrpc binfmt_misc sr_mod soundcore
--
function.linuxpower.ca
On Gwe, 2003-08-08 at 15:39, Xiaogang Wang wrote:
> Hi,
>
> My hardware and softare:
>
> Asus P4P800, 2GB memory, 2.8GHZ P4 with HT enabled.
> On-board 3com Giga bit network card
> 1 parallel ata 160G maxtor disk
> Nvidia Gefore4 MX440-8x graphics card (Asus V9180)
>
> Redhat 7.3, original kernel 2.4.18-3
How about updating to the errata kernel ?
On Fri, 8 Aug 2003, Zwane Mwaikambo wrote:
> On Fri, 8 Aug 2003, Xiaogang Wang wrote:
>
> > Hi,
> >
> > My hardware and softare:
> >
> > Asus P4P800, 2GB memory, 2.8GHZ P4 with HT enabled.
> > On-board 3com Giga bit network card
> > 1 parallel ata 160G maxtor disk
> > Nvidia Gefore4 MX440-8x graphics card (Asus V9180)
> >
> > Redhat 7.3, original kernel 2.4.18-3
>
> You might want to update your RedHat kernel and if the problem persists
> report it on their bugzilla (bugzilla.redhat.com).
>
I have updated the kernel from linux-2.4.18-3 to linux-2.4.20-19.7
with no network driver 3c2000.o compiled and with an old pci ATI rage graphics
card.
The kernel BUG occurs now at vmscan.c, in lieu of page_alloc.c
Aug 9 20:38:19 coulson kernel: ------------[ cut here ]------------
Aug 9 20:38:19 coulson kernel: kernel BUG at vmscan.c:747!
Aug 9 20:38:19 coulson kernel: invalid operand: 0000
Aug 9 20:38:19 coulson kernel:
Aug 9 20:38:19 coulson kernel: CPU: 0
I still have no clue.
Xiaogang
------------------------------------------------
Dr Xiaogang Wang
Departement de chimie
Universite de Montreal
C.P. 6128, succursale Centre-ville
Montreal (Quebec) H3C 3J7
Tel. (514) 3436111 ext 3947 (office)
FAX (514) 3437586 (office)
e-mail: [email protected]
homepage: http://www.esi.umontreal.ca/~wangx
------------------------------------------------