LinuxLists.cc - Inquiry Regarding Handling of Kernel Crashes

2024-05-12 17:10:48

Subject: Inquiry Regarding Handling of Kernel Crashes

Dear Linux Kernel Community,

I hope this email finds you well. I am currently engaged in testing
device drivers in Linux kernel mode, and I have encountered various
types of kernel crashes during my testing process.

Among these, some examples of kernel crashes include OOPS, lockups and others.

I have a few questions regarding the handling of kernel crashes during testing:

When encountering a kernel crash during testing, is it advisable to
continue testing without rebooting the system? Or is it preferable to
reboot the system after each kernel crash and then resume testing?

Can the first kernel crash, whether it is an OOPS or any other type
of crash, potentially lead to subsequent crashes of the same or different
types? If so, should debugging efforts focus only on the first kernel
crash, or should all subsequent crashes also be considered and
addressed?

In the event that the system needs to be rebooted after a kernel
crash, how can user space test utilities be informed that a kernel
crash has occurred? Additionally, how can the system be configured to
automatically reboot in the event of a kernel crash?

I would greatly appreciate any insights or best practices you can
share regarding the handling of kernel crashes during testing. Your
expertise and guidance on this matter would be invaluable to my
testing efforts.

Thank you very much for your time and assistance. I look forward to
your response.

--
Thanks,
Sekhar

2024-05-12 20:50:44

by Dr. David Alan Gilbert


Subject: Re: Inquiry Regarding Handling of Kernel Crashes

* Muni Sekhar wrote:
> Dear Linux Kernel Community,

Hi,

> I hope this email finds you well. I am currently engaged in testing
> device drivers in Linux kernel mode, and I have encountered various
> types of kernel crashes during my testing process.
>
> Among these, some examples of kernel crashes include OOPS, lockups and others.
>
> I have a few questions regarding the handling of kernel crashes during testing:
>
> When encountering a kernel crash during testing, is it advisable to
> continue testing without rebooting the system? Or is it preferable to
> reboot the system after each kernel crash and then resume testing?

Rebooting is best.

> Can the first kernel crash, whether it is an OOPS or any other type
> of crash, potentially lead to subsequent crashes of the same or different
> types? If so, should debugging efforts focus only on the first kernel
> crash, or should all subsequent crashes also be considered and
> addressed?

Yes - not all failures do that, but some will cause follow-on crashes;
looking at the first crash normally gives the most reliable idea
of what went wrong. But keep all the logs; anything might help you figure
it out.
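
As a rough illustration of "look at the first crash first", here is a
minimal Python sketch that finds the earliest crash-looking line in a saved
kernel log. The marker list and the function name first_crash_line are
assumptions made for this example, not an exhaustive or official set; extend
the markers to match what your driver actually prints.

```python
# Substrings that commonly mark the start of a kernel crash report.
# This list is an assumption for illustration, not an exhaustive set.
CRASH_MARKERS = (
    "Oops:",
    "BUG:",
    "Kernel panic - not syncing",
    "soft lockup",
    "hard LOCKUP",
)


def first_crash_line(log_text):
    """Return (line_number, line) for the earliest crash marker, or None."""
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in CRASH_MARKERS):
            return number, line
    return None


if __name__ == "__main__":
    import sys

    # Usage: python first_crash.py saved-dmesg.txt
    with open(sys.argv[1], errors="replace") as log_file:
        hit = first_crash_line(log_file.read())
    print("no crash marker found" if hit is None else "line %d: %s" % hit)
```

Everything logged after that line is still worth keeping, but treat it as
possibly a consequence of the first failure.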

> In the event that the system needs to be rebooted after a kernel
> crash, how can user space test utilities be informed that a kernel
> crash has occurred? Additionally, how can the system be configured to
> automatically reboot in the event of a kernel crash?

See Documentation/admin-guide/kernel-parameters.txt; there are
quite a few useful ones, in particular:
  oops=panic
makes the kernel panic on an oops, and when you combine it with
  panic=30
the panic then triggers a reboot after 30 seconds, so an oops
ends in an automatic reboot.
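
The same two settings are also visible at run time as the sysctls
kernel.panic_on_oops and kernel.panic. A test harness could check them
before starting a run; the sketch below is one way to do that, with the
function names being assumptions made for this example. The directory is a
parameter only so the code can be exercised against a scratch directory.

```python
from pathlib import Path


def read_panic_settings(sysctl_dir="/proc/sys/kernel"):
    """Read the run-time counterparts of oops=panic and panic=N."""
    base = Path(sysctl_dir)
    return {
        # 1 means an oops is escalated to a panic.
        "panic_on_oops": int((base / "panic_on_oops").read_text().strip()),
        # > 0: seconds before reboot; 0: wait forever; < 0: reboot at once.
        "panic": int((base / "panic").read_text().strip()),
    }


def will_reboot_on_oops(settings):
    """True if an oops leads to a panic and the panic leads to a reboot."""
    return settings["panic_on_oops"] != 0 and settings["panic"] != 0


if __name__ == "__main__":
    current = read_panic_settings()
    print(current, "reboot on oops:", will_reboot_on_oops(current))
```

Reading these files needs no special privileges; changing them (for example
with sysctl -w kernel.panic=30) requires root.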

You could also consider using a 'crash kernel' (kdump) - on a panic
the system lands in a fresh kernel that just saves a memory snapshot
that you can then try and debug.
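
Before relying on a crash kernel, a test harness might confirm that one is
actually loaded. On kernels built with kexec crash-dump support this is
reported in /sys/kernel/kexec_crash_loaded; the helper name below is an
assumption made for this sketch, and it returns None when the file cannot
be read.

```python
from pathlib import Path


def crash_kernel_loaded(sysfs_kernel_dir="/sys/kernel"):
    """Return True if a crash kernel is loaded, False if not, None if unknown."""
    flag = Path(sysfs_kernel_dir) / "kexec_crash_loaded"
    try:
        return flag.read_text().strip() == "1"
    except OSError:
        # File missing or unreadable, e.g. kernel built without kexec support.
        return None


if __name__ == "__main__":
    print("crash kernel loaded:", crash_kernel_loaded())
```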

Turning on a watchdog as well is good; some kernel bugs just hang
rather than giving a nice oops.
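
Once the machine reboots by itself, the user space test utility still has
to notice that it happened. One possible approach, which is a suggestion and
not something the kernel provides for this purpose, is to write a small
marker file before each test and compare the boot ID stored in it with the
current one at startup. The marker layout and all function names here are
assumptions made for this sketch.

```python
import json
import os
from pathlib import Path

BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"


def current_boot_id(path=BOOT_ID_PATH):
    """Return the random ID the kernel generates once per boot."""
    return Path(path).read_text().strip()


def begin_test(marker, test_name, boot_id):
    """Record which test is about to run, flushed to disk before it starts."""
    with open(marker, "w") as marker_file:
        json.dump({"test": test_name, "boot_id": boot_id}, marker_file)
        marker_file.flush()
        os.fsync(marker_file.fileno())


def end_test(marker):
    """Remove the marker after the test finished without taking the box down."""
    Path(marker).unlink(missing_ok=True)


def interrupted_test(marker, boot_id):
    """Return the test that was running when the machine went down, else None."""
    path = Path(marker)
    if not path.exists():
        return None
    record = json.loads(path.read_text())
    # A marker left over from a different boot means the test never finished.
    return record["test"] if record["boot_id"] != boot_id else None
```

A leftover marker only says the test was cut short by a reboot; it cannot
tell a panic from a watchdog reset or a power cut, so keep the saved logs
or the crash dump to find out which it was.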

> I would greatly appreciate any insights or best practices you can
> share regarding the handling of kernel crashes during testing. Your
> expertise and guidance on this matter would be invaluable to my
> testing efforts.
>
> Thank you very much for your time and assistance. I look forward to
> your response.

Good luck!

Dave

>
>
> --
> Thanks,
> Sekhar
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/