2002-04-24 07:56:08

by Huo Zhigang

Subject:

Hi, all.
My cluster has been misbehaving lately. Very often, when I run "/sbin/reboot" on a node, the following message is displayed on the console.

>INIT: Switching to runlevel: 6
>INIT: Send processes the TERM signal
>Unable to handle kernel NULL pointer dereference

What's wrong with my machines? They are all running linux-2.2.18 (SMP enabled) with a kernel module, written by my group, that is a driver for the Myricom NIC M3S-PCI64C-2.
Thank you in advance 8-)

Zhigang Huo
[email protected]


2002-04-24 08:09:04

by Alan

Subject: Re: your mail

>
> >INIT: Switching to runlevel: 6
> >INIT: Send processes the TERM signal
> >Unable to handle kernel NULL pointer dereference
>
> What's wrong with my machines? They are all running linux-2.2.18 (SMP enabled) with a kernel module, written by my group, that is a driver for the Myricom NIC M3S-PCI64C-2.
> Thank you in advance 8-)

If you boot the machine without your driver and then reboot, does the
same thing happen? If not, then it may well be that your driver has an
error, but only when it closes down.

2002-04-24 08:28:36

by Zwane Mwaikambo

Subject: Re: your mail

On Wed, 24 Apr 2002, Huo Zhigang wrote:

> Hi, all.
> My cluster has been misbehaving lately. Very often, when I run "/sbin/reboot" on a node, the following message is displayed on the console.
>
> >INIT: Switching to runlevel: 6
> >INIT: Send processes the TERM signal
> >Unable to handle kernel NULL pointer dereference
>
> What's wrong with my machines? They are all running linux-2.2.18 (SMP enabled) with a kernel module, written by my group, that is a driver for the Myricom NIC M3S-PCI64C-2.
> Thank you in advance 8-)
>
> Zhigang Huo
> [email protected]

Have you tried decoding the oops? Have a look at
linux/Documentation/oops-tracing.txt

Zwane

--
http://function.linuxpower.ca


2002-04-24 08:49:28

by Huo Zhigang

Subject: Re: Re: your mail

Thank you.
There is no oops information available. The machine just freezes completely. On one machine, when this happens, the speaker in its case beeps endlessly.

>
>> Hi, all.
>> My cluster has been misbehaving lately. Very often, when I run "/sbin/reboot" on a node, the following message is displayed on the console.
>>
>> >INIT: Switching to runlevel: 6
>> >INIT: Send processes the TERM signal
>> >Unable to handle kernel NULL pointer dereference
>>
>> What's wrong with my machines? They are all running linux-2.2.18 (SMP enabled) with a kernel module, written by my group, that is a driver for the Myricom NIC M3S-PCI64C-2.
>> Thank you in advance 8-)
>>
>> Zhigang Huo
>> [email protected]
>
>Have you tried decoding the oops? Have a look at
>linux/Documentation/oops-tracing.txt
>
> Zwane
>
>--


2002-04-24 08:52:35

by Huo Zhigang

Subject: Re: Re: your mail


I boot all the nodes in my cluster without my driver; it is loaded manually with "insmod".
Now I will try to "reboot" my machine after the driver is removed. Great. Thank you.

>>
>> >INIT: Switching to runlevel: 6
>> >INIT: Send processes the TERM signal
>> >Unable to handle kernel NULL pointer dereference
>>
>> What's wrong with my machines? They are all running linux-2.2.18 (SMP enabled) with a kernel module, written by my group, that is a driver for the Myricom NIC M3S-PCI64C-2.
>> Thank you in advance 8-)

>Alan:
>If you boot the machine without your driver and then reboot, does the
>same thing happen? If not, then it may well be that your driver has an
>error, but only when it closes down.



2002-04-24 09:45:46

by Huo Zhigang

Subject: Re: Re: your mail


I removed the driver first, before the reboot, and it works. But what is the relation between the reboot process and my driver? When I remove the driver module myself, nothing goes wrong. What is the difference?

Thank you.
Joining LKML is so exciting. :)
I love this community.

>>
>> >INIT: Switching to runlevel: 6
>> >INIT: Send processes the TERM signal
>> >Unable to handle kernel NULL pointer dereference
>>
>> What's wrong with my machines? They are all running linux-2.2.18 (SMP enabled) with a kernel module, written by my group, that is a driver for the Myricom NIC M3S-PCI64C-2.
>> Thank you in advance 8-)
>
>If you boot the machine without your driver and then reboot, does the
>same thing happen? If not, then it may well be that your driver has an
>error, but only when it closes down.


2002-04-24 10:20:56

by Andreas Dilger

Subject: Re: Re: your mail

On Apr 24, 2002 17:44 +0800, Huo Zhigang wrote:
> I removed the driver first, before the reboot, and it works. But what is
> the relation between the reboot process and my driver? When I remove the
> driver module myself, nothing goes wrong. What is the difference?

Does your module have a timer or thread which may be active at shutdown?
If it has a kernel thread, it may be that the TERM signal kills the
thread and your driver does not expect this.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

2002-04-24 11:21:36

by Huo Zhigang

Subject: Re: Re: Re: your mail

There is no thread or timer associated with it.
I am a kernel newbie. I only managed to make the module's release function cope with "ctrl+c"; nothing more has been done. I do not know how to improve my code so that it copes with the TERM signal (signal number 15?). IMHO, when the device file is closed, every signal triggers the same function.

>On Apr 24, 2002 17:44 +0800, Huo Zhigang wrote:
>> I removed the driver first, before the reboot, and it works. But what is
>> the relation between the reboot process and my driver? When I remove the
>> driver module myself, nothing goes wrong. What is the difference?
>
>Does your module have a timer or thread which may be active at shutdown?
>If it has a kernel thread, it may be that the TERM signal kills the
>thread and your driver does not expect this.
>
>Cheers, Andreas
>--
>Andreas Dilger
>http://www-mddsp.enel.ucalgary.ca/People/adilger/
>http://sourceforge.net/projects/ext2resize/
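
The thread does not establish what actually dereferences the NULL pointer, so the following is only a sketch of one defensive pattern for the release path discussed above: a release routine that tolerates being called when the state is already torn down, and that clears its pointers after freeing them. It is plain user-space C, not kernel code, and every name in it (demo_dev, demo_open, demo_release, dma_buf) is hypothetical and not taken from the driver in question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-device state; not taken from the driver in this thread. */
struct demo_dev {
    int   open_count;   /* number of current openers */
    char *dma_buf;      /* stands in for a resource freed on the last close */
};

/* Allocate the resource on the first open. Returns 0 on success, -1 on failure. */
static int demo_open(struct demo_dev *dev)
{
    if (dev == NULL)
        return -1;

    if (dev->open_count == 0) {
        dev->dma_buf = malloc(64);
        if (dev->dma_buf == NULL)
            return -1;
    }
    dev->open_count++;
    return 0;
}

/*
 * Release one opener. Returns 0 on success, -1 if there was nothing to
 * release. The checks mean an unexpected or repeated call never touches
 * a pointer that has already been freed.
 */
static int demo_release(struct demo_dev *dev)
{
    if (dev == NULL || dev->open_count == 0)
        return -1;

    if (--dev->open_count == 0) {
        free(dev->dma_buf);
        dev->dma_buf = NULL;    /* a later teardown now sees NULL, not a stale pointer */
    }
    return 0;
}
```

In a real driver the same idea would apply to whatever the release and module cleanup functions free: check before use, and reset each pointer once its resource is gone, so that the order in which shutdown closes files and tears things down matters less.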