Hi, all.
My cluster has been misbehaving lately. Often, when I run "/sbin/reboot" on a node, the following messages are displayed on the console:
>INIT: Switching to runlevel: 6
>INIT: Send processes the TERM signal
>Unable to handle kernel NULL pointer dereference
What's wrong with my machines? They all run linux-2.2.18 (SMP-enabled) with a kernel module, written by my group, that is a driver for the Myricom NIC M3S-PCI64C-2.
Thank you in advance 8-)
Zhigang Huo
[email protected]
If you boot the machine without your driver and then reboot, does the
same thing happen? If not, then it may well be that your driver has an
error that shows up only when it shuts down.
On Wed, 24 Apr 2002, Huo Zhigang wrote:
> >INIT: Switching to runlevel: 6
> >INIT: Send processes the TERM signal
> >Unable to handle kernel NULL pointer dereference
Have you tried decoding the oops? Have a look at
linux/Documentation/oops-tracing.txt
Zwane
--
http://function.linuxpower.ca
Thank you.
There is no oops information available. The machine just freezes completely. On a machine where this happens, the beeper in its case beeps endlessly.
I boot all the nodes in my cluster without my driver; it is loaded manually with "insmod" afterwards.
Now I will try to "reboot" a machine after removing the driver. Great. Thank you.
>Alan:
>If you boot the machine without your driver, then reboot does the
>same happen ? If not then it may well be your driver has an error but only
>when it closes down
Removing the driver before the reboot works! But what is the relation between the reboot process and my driver? When I remove the driver module myself, nothing goes wrong. What is the difference?
Thank you.
Joining the lkml is so exciting. :)
I love this community.
On Apr 24, 2002 17:44 +0800, Huo Zhigang wrote:
> Remove the driver first before reboot! It works. But what is the relation
> between the reboot process and my driver? When I remove the driver module
> myself, nothing goes wrong. What is the difference?
Does your module have a timer or thread which may be active at shutdown?
If it has a kernel thread, the TERM may kill that thread, and your
driver may not expect this.
Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
There is no thread or timer associated with it.
I am a kernel newbie. I only managed to make the module's release function cope with "ctrl+c"; nothing more has been done. I do not know how to improve my code so that it copes with the TERM signal (signal no. 15?). IMHO, when the device file is closed, every signal triggers the same function.