2009-03-02 16:22:00

by Tarkan Erimer

Subject: Re: Failover Kernel

Lubomir Rintel wrote:
>
> How is the backup kernel minimal? It is usually the very same kernel as
> the "primary" one. You can use the same initrd as well and do a full
> multiuser boot.
>
>
Kdump's backup kernel (in kdump terms, the "capture kernel") is built with a
minimal set of features and modules (SCSI drivers, network drivers, etc.)
so that it has a small memory footprint and only the resources needed to
handle the crash dump. So it's not a full replacement for the primary kernel.
Also,
the point is not to boot after a crash has occurred. The idea is for the
backup kernel to take control when a crash occurs, without any need to reboot.


2009-03-03 03:30:20

by David Newall

Subject: Re: Failover Kernel

Tarkan Erimer wrote:
> the point is not to boot after a crash has occurred. The idea is for the
> backup kernel to take control when a crash occurs, without any need to reboot.

It sounds like you want everything to just continue running. I don't
see how that can be done. All of those in-kernel tables and structures
would need to be migrated, and it follows, because there was a crash,
that any of them might have been corrupted. Worse, you want this to
save you when you try running a new kernel which crashes, and being a
new kernel, it follows that any of those structures could be different;
it might not be possible to create equivalent structures for different
kernel versions.

If you're at all concerned at keeping the computer running, and I think
that's your goal, then I think the best you can do is reset the
hardware, boot an alternate kernel and restart applications as appropriate.

2009-03-04 08:29:55

by Tarkan Erimer

Subject: Re: Failover Kernel

On 03/03/2009 05:29 AM, David Newall wrote:
> It sounds like you want everything to just continue running. I don't
>
Yes, exactly. The backup kernel would take control when a crash occurs,
without needing a reboot or halt.
> see how that can be done. All of those in-kernel tables and structures
> would need to be migrated, and it follows, because there was a crash,
> that any of them might have been corrupted. Worse, you want this to
> save you when you try running a new kernel which crashes, and being a
> new kernel, it follows that any of those structures could be different;
> it might not be possible to create equivalent structures for different
> kernel versions.
>
>
Yes, that's right, and it's the first obstacle to overcome. Maybe it
could be implemented like this:

- Primary kernel could be 2.6.x or 2.6.x.y (2.6.28 or 2.6.28.1)
- Backup kernel could only be a .y fix release of the same version, like 2.6.28.5

Since they're from the same version, this would rule out kernel API and
structure changes.
To let the backup kernel resume, the primary kernel could write a journal
of the things the backup needs, like process IDs and memory and process
state, in the same manner as journalling file systems (they journal what
they do so they can recover/resume after a crash or disaster).
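A minimal sketch of the journal idea, for illustration only. Everything in it is an assumption rather than an existing kernel interface: the fixed record layout (a process ID plus a numeric state), the hypothetical `append_record`/`replay` helpers, and the use of a CRC-32 per record. Replay stops at the first truncated or corrupted record, roughly the way a journalling file system ignores an incomplete transaction.

```python
import struct
import zlib

# Hypothetical record layout (assumption, not a kernel format):
# pid (signed 32-bit), state (unsigned 32-bit), then CRC-32 of those 8 bytes.
_PAYLOAD = struct.Struct("<iI")   # 8 bytes, little-endian, no padding
_RECORD = struct.Struct("<iII")   # payload + checksum = 12 bytes


def append_record(journal: bytearray, pid: int, state: int) -> None:
    """Append one checksummed record describing a process (hypothetical helper)."""
    payload = _PAYLOAD.pack(pid, state)
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    journal.extend(payload + struct.pack("<I", crc))


def replay(journal: bytes) -> list:
    """Return (pid, state) pairs up to the first truncated or corrupted record."""
    entries = []
    for off in range(0, len(journal), _RECORD.size):
        chunk = bytes(journal[off:off + _RECORD.size])
        if len(chunk) < _RECORD.size:
            break  # truncated tail: the writer stopped mid-record
        pid, state, crc = _RECORD.unpack(chunk)
        if zlib.crc32(chunk[:_PAYLOAD.size]) & 0xFFFFFFFF != crc:
            break  # damaged record: stop rather than trust it
        entries.append((pid, state))
    return entries
```

For example, appending records `(1, 0)` and `(42, 2)` and then flipping a byte inside the second record makes `replay` return only `[(1, 0)]`. The checksum only detects damage done after a record was written; it cannot show that the state the primary kernel recorded was correct in the first place.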

2009-03-06 01:10:35

by David Lang

Subject: Re: Failover Kernel

On Wed, 4 Mar 2009, Tarkan Erimer wrote:

> On 03/03/2009 05:29 AM, David Newall wrote:
>> It sounds like you want everything to just continue running. I don't
>>
> Yes, exactly. The backup kernel would take control when a crash occurs,
> without needing a reboot or halt.
>> see how that can be done. All of those in-kernel tables and structures
>> would need to be migrated, and it follows, because there was a crash,
>> that any of them might have been corrupted. Worse, you want this to
>> save you when you try running a new kernel which crashes, and being a
>> new kernel, it follows that any of those structures could be different;
>> it might not be possible to create equivalent structures for different
>> kernel versions.
>>
>>
> Yes, that's right, and it's the first obstacle to overcome. Maybe it
> could be implemented like this:
>
> - Primary kernel could be 2.6.x or 2.6.x.y (2.6.28 or 2.6.28.1)
> - Backup kernel could only be a .y fix release of the same version, like 2.6.28.5
>
> Since they're from the same version, this would rule out kernel API and
> structure changes.
> To let the backup kernel resume, the primary kernel could write a journal
> of the things the backup needs, like process IDs and memory and process
> state, in the same manner as journalling file systems (they journal what
> they do so they can recover/resume after a crash or disaster).

Wrong: kernel structures can change in any patch. They can even change
with different configuration options.
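To make the configuration point concrete, here is a small illustration that uses Python's `ctypes` to mimic C struct layout. The two structures and the `acct_bytes` field are hypothetical, standing in for kernel structures whose members only exist when a particular `CONFIG_*` option is enabled: the same source, built with and without the option, places later fields at different offsets, so one build cannot safely interpret the other's memory.

```python
import ctypes


# Hypothetical struct as built with the (imaginary) option enabled:
# acct_bytes stands in for a member guarded by #ifdef CONFIG_... in C.
class DemoTaskOptionOn(ctypes.Structure):
    _fields_ = [
        ("pid", ctypes.c_int),
        ("acct_bytes", ctypes.c_ulong),
        ("prio", ctypes.c_int),
    ]


# The same struct as built with the option disabled: the member disappears.
class DemoTaskOptionOff(ctypes.Structure):
    _fields_ = [
        ("pid", ctypes.c_int),
        ("prio", ctypes.c_int),
    ]


for label, cls in (("option=y", DemoTaskOptionOn), ("option=n", DemoTaskOptionOff)):
    # Field descriptors on a ctypes Structure expose their byte offset.
    print(f"{label}: sizeof={ctypes.sizeof(cls)}, offsetof(prio)={cls.prio.offset}")
```

On x86-64 Linux, where `c_int` is 4 bytes and `c_ulong` is 8, this places `prio` at offset 16 in a 24-byte struct with the option enabled and at offset 4 in an 8-byte struct without it.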

But even if they are the same version and configuration options, that
doesn't address the fact that you can't trust the in-kernel structures,
because they may have been damaged by whatever caused the crash.

David Lang

2009-03-09 12:35:54

by Tarkan Erimer

Subject: Re: Failover Kernel

On 03/06/2009 03:10 AM, David Lang wrote:
> Wrong: kernel structures can change in any patch. They can even change
> with different configuration options.
>
> But even if they are the same version and configuration options, that
> doesn't address the fact that you can't trust the in-kernel structures,
> because they may have been damaged by whatever caused the crash.
>
> David Lang

Sorry for the late reply; I was away for a while. Hmm... I understand. It
seems it's not really feasible. Thanks to everyone who replied to this thread.