Hi, I was just wondering whether it would be possible to switch kernels
without having to restart the entire system. Sort of a "live kernel
replacement". It goes along with the hot-swap-everything ideas. I
was thinking of something like:
- Take all the structs related to userspace memory and processes
- Save them to a reserved area of memory
- Halt the kernel, mostly
- Wipe kernel-space memory clean to avoid confusion
- Load new kernel into memory
- Replace all saved structures
- Start kernel running again
This seems like the easiest way to do it. The biggest problem is that there
would be roughly 30 seconds during which all processes would be frozen.
This could cause problems with TCP/IP connections timing out, say on a
web server, but it would be more manageable than a few minutes of downtime to
restart the machine. There is one other way I can think of, something like:
- Copy entire kernel memory to another reserved area of memory
- Start new kernel running as a "secondary kernel"
- Transfer control from "Primary kernel" to "Secondary Kernel"
- Load new kernel where the kernel was previously located
- Start new kernel running as a "Secondary Kernel" again
- Transfer control between kernels
- Kill and remove temporary kernel
This system could result in nearly zero downtime, but it would require more
memory, be more complicated, and need significant modifications to
allow a "Secondary Kernel" to be running. Anyway, I think this could be
a nice feature of the kernel in situations where zero downtime is required.
Yes, it might be a case of "creeping featurism"; if you think so, then
tell me. If you would be interested in helping with it, send me a message,
if there is any support for it. Please CC me on any messages; it would be
quite helpful, since I do not receive the mailing list due to its excessive
volume. If you don't, I will pick it up in the archives, but not as soon.
Thanks.
Colin
On Tue, Jul 10, 2001 at 02:42:12PM -0400, C. Slater wrote:
> Hi, I was just thinking about if it would be possible to switch
> kernels without having to restart the entire system.
Pre-Solaris 8, Sun was promising this.
> Sort of a "Live kernel replacement". It sort of goes along with
> the hot-swap-everything ideas. I was thinking something like
> - Take all the structs related to userspace memory and processes
> - Save them to a reserved area of memory
> - Halt the kernel, mostly
What about timing-critical things? You mention networking, but there
are others.
> - Wipe kernel-space memory clean to avoid confusion
> - Load new kernel into memory
> - Replace all saved structures
What if the layout of these changes, as it often does?
> - Start kernel running again
> This seems like the easiest way to do it. The biggest problem is
> that there would be somewhere about 30 seconds where all processes
> would be frozen.
It seems like a difficult-to-implement solution for little gain. Linux
can be booted _very_ quickly on modern machines, probably in about 15s
on most hardware. If you burn Linux into the ROM or use a
flash disk (or similar solution), you can have everything rebooted in
under five seconds.
The zflinux chips/machines probably boot in half that, maybe less (as
tested on a prototype many months ago).
--cw
C. Slater wrote:
> Hi, I was just thinking about if it would be possible to switch kernels
> without having to restart the entire system. Sort of a "Live kernel
> replacement". It sort of goes along with the hot-swap-everything ideas. I
I actually suggested the exact same thing back in 1998 (link to post in
archives: http://uwsg.iu.edu/hypermail/linux/kernel/9808.1/1282.html ),
but I never received much response. As I remember it, the emails I
received were along the lines of: "too much effort for too little gain,
use clustering instead". I would still be very interested in such a
feature, but, like back in 1998, this is still *way* out of my league to
try to implement (but I'd be happy to help in testing :).
Best regards,
Jesper Juhl
[email protected]
>
> > - Replace all saved structures
>
> > what if the layout of these changes as it often does?
>
> You would want to convert all structures into a neutral encoding scheme
> that would support transferring structures across versions. BER comes to
> mind, as it provides for an easy way to ignore stuff you don't understand
> and support multiple versions of the same object in a single encoding.
>
> However, this would be a truly massive task. And the big challenge would be
> what to do when an older kernel doesn't understand something essential. It
> could be simplified significantly by supporting live replacement only of
> kernels of the same version, but this seems to defeat much of the purpose.
>
> DS
I don't think that it would be possible to switch kernels when one was not
properly set up to do it, if that's what you mean. You could only switch
between kernels that have been compiled to support live switching.
I do see your point about the data structures changing. We would need to use
some format that all properly set up kernels could understand; then we would
only need to write enough to convert the structs to the middle format and
back when they change. I am not familiar with BER, but if it is suitable, it
may help.
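A minimal sketch of the tag-length-value idea that makes BER attractive here, written in Python for brevity. This is not real BER (ASN.1 BER has its own tag and length rules); the tag numbers and field names are made up for illustration and correspond to nothing in any kernel. It only shows how a reader can skip records it does not understand:

```python
import struct

# Hypothetical tag numbers, invented for this sketch.
TAG_PID = 1
TAG_PRIORITY = 2
TAG_NEW_FIELD = 99  # something only a newer kernel would know about


def encode(fields):
    """Encode {tag: bytes} as a sequence of tag-length-value records."""
    out = bytearray()
    for tag, value in fields.items():
        out += struct.pack(">HH", tag, len(value))  # 2-byte tag, 2-byte length
        out += value
    return bytes(out)


def decode(blob, known_tags):
    """Decode TLV records, skipping any tag the reader does not know."""
    fields, skipped, pos = {}, [], 0
    while pos < len(blob):
        tag, length = struct.unpack_from(">HH", blob, pos)
        pos += 4
        value = blob[pos:pos + length]
        pos += length  # the length prefix lets us step over unknown records
        if tag in known_tags:
            fields[tag] = value
        else:
            skipped.append(tag)
    return fields, skipped


# A "newer" writer emits three fields; an "older" reader knows only two.
blob = encode({
    TAG_PID: struct.pack(">I", 1234),
    TAG_PRIORITY: struct.pack(">b", -5),
    TAG_NEW_FIELD: b"opaque",
})
fields, skipped = decode(blob, known_tags={TAG_PID, TAG_PRIORITY})
```

Skipping is only safe when the unknown record is optional. If the skipped field is essential to the state, the reader has no way to know, which is exactly the objection raised above.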
Are you saying that swapping the kernels out altogether would be a massive
task, or that saving/restoring the data structures would be a massive task?
Colin
C. Slater wrote:
>I don't think that it would be possible to switch kernels when one was not
>properly set up to do it, if that's what you mean. You could only switch
>between kernels that have been compiled to support live switching.
>
>I do see your point with the data structures changing. We would need to use
>some format that all properly set up kernels could understand, then we would
>only need to write enough to convert the structs to the middle format and
>back when they change. I am not familiar with BER, but if it is suitable, it
>may help.
>
>Are you saying that swapping the kernels out altogether would be a massive
>task, or that saving/restoring the data structures would be a massive task.
>
> Colin
>
I remember that this topic was discussed at length 1 or 2 years ago on
linux-future and came to no conclusive end.
--
HomePage: http://www.enlightened-popo.net
-- This was sent by Djinn running Linux 2.4.5 --
"C. Slater" wrote:
> I don't think that it would be possible to switch kernels when one was not
> properly set up to do it, if that's what you mean. You could only switch
> between kernels that have been compiled to support live switching.
>
Sure.
> I do see your point with the data structures changing. We would need to use
> some format that all properly set up kernels could understand,
That seems completely out of the question. The structures a 2.4.7
kernel understands might be insufficient to express the setup
a future 2.6.9 kernel is using to do its stuff better. (And vice
versa, if future kernels drop a 2.4.7 feature deemed obsolete.
But what if that feature is in use when you decide to upgrade?)
You can easily deal with simple stuff like struct
rearrangement and type conversions, but what do you do when whole data
structures change completely?
Example: something changes from a representation with two linked lists to a
single tree or 4 hash tables. You'll have a very hard time inventing
a generic data format to deal with that kind of change. It might
happen. Look at the differences between the 2.2 and 2.4 VM, with the big
page cache change in early 2.3. And the dentry cache that suddenly appeared.
And of course the rules change too, from time to time.
Many releases have a list of "active pages". What kind exactly is that?
The rules may change: what do you do if the new kernel doesn't allow
one particular kind of page on that list, but the old running kernel
has a bunch?
These were just some made-up examples; I guess you'll run into a ton
of such issues. New releases aren't simply fixes and tweaks; there
are frequent design changes.
> Are you saying that swapping the kernels out altogether would be a massive
> task, or that saving/restoring the data structures would be a massive task.
All you need to swap kernel images is memory. Swapping structures
can't be done in a generic way; you'll need code that converts the
structures of one particular kernel release to those of a
particular other kernel. And I don't think you'll have the usual
kernel developers do that.
A "long-term uptime" distro might do this kind of work for a few
selected kernels, but I cannot imagine it happening for the regular
ones.
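To make the "one converter per pair of releases" point concrete, here is a toy sketch in Python. Everything in it is hypothetical: the registry, the release pair, and the "two lists become one structure" layout change are invented for illustration and do not describe any real kernel's data.

```python
# Hypothetical registry: one hand-written converter per (old, new) release pair.
CONVERTERS = {}


def converter(old, new):
    """Decorator that registers a state converter for one release pair."""
    def register(func):
        CONVERTERS[(old, new)] = func
        return func
    return register


@converter("2.4.6", "2.4.7")
def pages_2_4_6_to_2_4_7(state):
    # Made-up design change: two page lists are merged into one sorted
    # structure. Which list a page came from is lost in the process.
    return {"pages": sorted(state["active"] + state["inactive"])}


def convert(state, old, new):
    """Convert saved state, or fail if nobody wrote code for this pair."""
    try:
        return CONVERTERS[(old, new)](state)
    except KeyError:
        raise LookupError(f"no converter from {old} to {new}") from None
```

Calling `convert({"active": [3, 1], "inactive": [2]}, "2.4.6", "2.4.7")` returns `{"pages": [1, 2, 3]}`, while any pair without a registered converter raises `LookupError`. The number of converters grows with the number of release pairs someone wants to support, and each one has to decide what to do with information the new layout cannot hold.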
Helge Hafting
C. Slater ([email protected]) wrote :
> Hi, I was just thinking about if it would be possible to switch kernels
> without having to restart the entire system. Sort of a "Live kernel
> replacement". It sort of goes along with the hot-swap-everything ideas. I
> was thinking something like
> - Take all the structs related to userspace memory and processes
> - Save them to a reserved area of memory
> - Halt the kernel, mostly
> - Wipe kernel-space memory clean to avoid confusion
> - Load new kernel into memory
> - Replace all saved structures
> - Start kernel running again
>
> This seems like the easiest way to do it. The biggest problem is that there
> would be somewhere about 30 seconds where all processes would be frozen.
This is not a problem at all, because UNIX does not guarantee that
a process will get at least one CPU slice every X seconds.
( read : UNIX is not a real time system )
soft-suspend "freezes" processes for several hours anyway ...
Note that there is a patch for hot replacing a kernel, which is equivalent
to rebooting, but much faster :
Two Kernel Monte (Linux loading Linux on x86)
http://www.scyld.com/products/beowulf/software/monte.html
> This could cause problems with TCP/IP connections timing out say on a
> webserver, but it would be more manageable than a few minutes downtime to
> restart the machine.
[ rest snipped ]
--
David Balazic
--------------
"Be excellent to each other." - Bill & Ted
- - - - - - - - - - - - - - - - - - - - - -
>
> This is not a problem at all, because UNIX does not guarantee that
> a process will get at least one CPU slice every X seconds.
> ( read : UNIX is not a real time system )
>
> soft-suspend "freezes" processes for several hours anyway ...
>
> Note that there is a patch for hot replacing a kernel, which is equivalent
> to rebooting, but much faster :
> Two Kernel Monte (Linux loading Linux on x86)
> http://www.scyld.com/products/beowulf/software/monte.html
>
So if the Two Kernel Monte patch were combined with the
suspend/resume-to-swap patch, you could add some
transitions so that the code path does this:
1- Suspend->Monte
2- Monte->Load new Kernel
3- Load->Resume.
If it were just for very similar kernels, i.e. most
-pre and -ac kernels, it would probably work fine.
If not, then you could just do the Monte route.
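The decision described above can be sketched as follows, in Python. The notion of "very similar" used here (same release once a -preN or -acN suffix is stripped) is only an assumption made for this example, and `base_version` and `plan_switch` are hypothetical names; neither patch provides such an interface.

```python
import re


def base_version(release):
    """Strip a trailing -preN or -acN suffix: '2.4.6-ac2' -> '2.4.6'."""
    return re.sub(r"-(pre|ac)\d+$", "", release)


def plan_switch(running, target):
    """Return the ordered steps for switching from one kernel to another."""
    if base_version(running) == base_version(target):
        # Similar kernels: try to carry the suspended state across.
        return ["suspend", "monte", "load new kernel", "resume"]
    # Otherwise fall back to the plain Monte route: a fast reboot,
    # with no process state preserved.
    return ["monte", "load new kernel"]
```

For example, `plan_switch("2.4.6-ac2", "2.4.6-ac5")` yields the full suspend/resume path, while `plan_switch("2.4.6", "2.2.19")` yields only the Monte steps.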
Laramie
> This is not a problem at all, because UNIX does not guarantee that
> a process will get at least one CPU slice every X seconds.
> ( read : UNIX is not a real time system )
It is not a problem when a system is isolated from all other systems, but if
we do this while some program is in a TCP/IP session, like a web server, the
program will not be able to respond to an outside computer while
we are swapping and initializing kernels. The TCP connection will then time
out on the other computer's side. But this is still quite manageable
compared to a minute or two for a system to totally reboot itself.
> soft-suspend "freezes" processes for several hours anyway ...
Yes, so it will not be a problem, at least not with processes dying because
they did not get message X at time Y.
Unless we find some other way to do it, I think we will have to limit this
to switching between kernels with the same minor version. We probably
would not be able to swap between 2.4 and 2.6 anyway, though it depends on
what changes are made.
Colin
Followup to: <[email protected]>
By author: "Laramie Leavitt" <[email protected]>
In newsgroup: linux.dev.kernel
>
> So if the Two Kernel Monte patch was combined with the
> system suspend/resume in swap patch then you add some
> transitions so that the code path does this:
>
> 1- Suspend->Monte
> 2- Monte->Load new Kernel
> 3- Load->Resume.
>
> If it was just for very similar kernels, i.e. most
> -pre and -ac kernels it would probably work fine.
> If not, then you could just do the Monte route.
>
The problem is that "freezing" the kernel state and then
reconstructing it into a form USABLE BY ANOTHER KERNEL (not even
necessarily another kernel version) is unbelievably hard; furthermore,
it imposes severe constraints on the kinds of changes you're
allowed to make during your kernel development.
It's a bad idea, folks. Give it up.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
On Wed, 11 Jul 2001, Helge Hafting wrote:
> That seems completely out of question. The structures a 2.4.7
> kernel understands might be insufficient to express the setup
> a future 2.6.9 kernel is using to do its stuff better.
however, it might be handy if say you needed to upgrade a stable
kernel due to a bug fix or security update.
no?
regards,
--
Paul Jakma [email protected] [email protected]
PGP5 key: http://www.clubi.ie/jakma/publickey.txt
-------------------------------------------
Fortune:
I found Rome a city of bricks and left it a city of marble.
-- Augustus Caesar
On Wed, 11 Jul 2001, Paul Jakma wrote:
> On Wed, 11 Jul 2001, Helge Hafting wrote:
>
> > That seems completely out of question. The structures a 2.4.7
> > kernel understands might be insufficient to express the setup
> > a future 2.6.9 kernel is using to do its stuff better.
>
> however, it might be handy if say you needed to upgrade a stable
> kernel due to a bug fix or security update.
One thing which always surprises me in this discussion
(it comes up about once a year, it seems) is that
nobody participating in this discussion ever starts
writing any code for it.
Is this a feature which is only wanted by people who
don't want to code, or is this just a signal that the
amount of trouble involved just isn't worth it?
Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...
http://www.surriel.com/ http://distro.conectiva.com/
Send all your spam to [email protected] (spam digging piggy)
> One thing which always surprises me in this discussion
> (it comes up about once a year, it seems) is that
> nobody participating in this discussion ever starts
> writing any code for it.
> Is this a feature which is only wanted by people who
> don't want to code, or is this just a signal that the
> amount of trouble involved just isn't worth it?
> Rik
> --
Doesn't it make sense to decide on a feature set and method of
implementation _before_ you begin coding? Or does it make sense to just
start coding something that might never work or do what anybody wants?
When you decide to implement something, do you usually code before you
decide exactly what it is you're trying to implement and whether anybody
wants it? I certainly don't.
This isn't a very good example because this is a rather bad idea overall. But
if you think it's stupid and will never work, just say that. Kill with legal
blows, especially when you're right.
DS
Does it come up often? Well, I have a SourceForge project set up and am
currently only waiting on finalizing how it's going to be done. So we have
just about proved the first possibility wrong, and if you hear anything else
about this in a while, we will have proved the second wrong too. So, while
we are at it, I'll say that if anyone wants to help with it, email me. We
especially need people who either have ideas on how to do this or have a
good knowledge of the kernel, mainly memory, processes, and initialization.
Colin
In the future when Linux is more heavily used at the enterprise level
there will likely be upgrade/revert modules to allow such a transition to
take place.
-Kip
On Wed, 11 Jul 2001, Kip Macy wrote:
> In the future when Linux is more heavily used at the enterprise level
> there will likely be upgrade/revert modules to allow such a transition
> to take place.
Only if somebody takes the trouble to write them, which
isn't something I see happening in the near future.
Not only would this feature be a LOT of work, it would
(probably) also be very invasive all over the kernel.
OTOH, if the kernel was compiled with -g maybe it'd have
enough info to locate its data structures ?
regards,
Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...
http://www.surriel.com/ http://distro.conectiva.com/
Send all your spam to [email protected] (spam digging piggy)
Followup to: <[email protected]>
By author: Paul Jakma <[email protected]>
In newsgroup: linux.dev.kernel
>
> On Wed, 11 Jul 2001, Helge Hafting wrote:
>
> > That seems completely out of question. The structures a 2.4.7
> > kernel understands might be insufficient to express the setup
> > a future 2.6.9 kernel is using to do its stuff better.
>
> however, it might be handy if say you needed to upgrade a stable
> kernel due to a bug fix or security update.
>
> no?
>
No. You have no guarantee that the state or state mangler won't
propagate the bug into the new kernel, even if it has been fixed.
Since many, if not most, bug fixes or security upgrades are related to
state getting mucked up, this is a very serious thing.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
Colin Slater writes:
> Does it come up often? Well, I have a sourceforge project set up and am
> currently only waiting on finalizing how it's going to be done. So we have
> about proved the first possibility wrong, and if you ever hear anything else
> about this in a while, we will have proved the second wrong too. So, while
> we are at it, I'll say, that if anyone wants to help with it, email me. We
> especially need people that either have ideas on how to do this or have a
> good knowledge of the kernel, mainly memory, processes, and initialization.
Not to be overly negative: I don't intend this email as an insult, but rather
as a "shed a little light on the discussion" email. I would be _happy_ if
you actually succeed in your project, but your comments come out as follows:
a) we want this "sounds real good" feature
b) we don't know how we will do it, beyond some hand waving ideas
c) we want kernel experts who know what they are doing to help us
d) kernel experts who have replied so far (negatively) don't know what
they are talking about, so please butt out
e) you have "started coding" by setting up a sourceforge project
Note that you are talking about a VERY difficult problem, which is
not available on 99.9% of systems out there. Maybe on a few highly
specialized *nixes which were designed for this (Sequent or such),
and probably have extra hardware support to help along. I'm _pretty_
sure that Solaris and AIX and HP/UX do NOT do this, and don't you think
they would want to if it were easy? It would be easier than under
Linux from the perspective that their kernels change far less often,
and have relatively static interfaces.
The best proposal I've heard so far was to use MOSIX to do live job
migration between machines, and then upgrade the kernel like normal.
In the end, it is the jobs that are running on the kernel, and not
the kernel or the individual machine that are the most important. One
person pointed out that there is a single point of failure in the
MOSIX "stub" machine, which doesn't help you in the end (how do you
update the kernel there?). If you can figure a way to enhance MOSIX
to allow migrating the MOSIX "stub" processes to another machine, you
will have solved your problem in a much easier way, IMHO.
Note also that you need to look at the _specific_ reason why you want to
do live kernel upgrades, besides it "sounds real good". If you have such
tight uptime deadlines that you can't take 5 minutes of downtime to boot
a new kernel, then you are probably using a load balancing cluster anyways
in case of hardware failure, so live kernel updates are not needed here.
Note that all real-world high-availability systems I ever worked on
still allowed for SCHEDULED maintenance downtime, but highly frowned
upon UNSCHEDULED downtime. Even IBM's S/390 99.999% uptime numbers
exclude downtime for SCHEDULED outages, which are simply a fact of life.
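Rough arithmetic behind the 99.999% figure, as a small Python sketch. It assumes a 365-day year and, as stated above, counts only unscheduled downtime; the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability):
    """Minutes per year a system may be down at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


budget = downtime_budget_minutes(0.99999)  # about 5.26 minutes per year
share_of_one_reboot = 5.0 / budget         # a single 5-minute reboot
```

At five nines the whole yearly budget is a little over five minutes, so one unscheduled 5-minute reboot would consume roughly 95% of it. That is why such figures are only achievable when scheduled outages are excluded.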
Please prove everyone wrong by developing a way to do this, or even
showing a proof-of-concept (i.e. a user-space framework for translating
every kernel data structure from one kernel version to another, that
works across, say, a large fraction of the 2.2 kernel, or maybe from
2.4.0-test until 2.4.current). It doesn't have to be in-kernel (yet).
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
In article <[email protected]>,
Andreas Dilger <[email protected]> wrote:
>The best proposal I've heard so far was to use MOSIX to do live job
>migration between machines, and then upgrade the kernel like normal.
>In the end, it is the jobs that are running on the kernel, and not
>the kernel or the individual machine that are the most important. One
>person pointed out that there is a single point of failure in the
>MOSIX "stub" machine, which doesn't help you in the end (how do you
>update the kernel there?). If you can figure a way to enhance MOSIX
>to allow migrating the MOSIX "stub" processes to another machine, you
>will have solved your problem in a much easier way, IMHO.
If you then think of using VMWare or S/390 style methods of running multiple
copies of Linux on a single system you can now consider migrating processes
to a new kernel on the same system.
--
__O
Lineo - For Embedded Linux Solutions _-\<,_
PGP Fingerprint: 28 E2 A0 15 99 62 9A 00 (_)/ (_) 88 EC A3 EE 2D 1C 15 68
Stuart Lynne <[email protected]> http://www.fireplug.net 604-461-7532
On Wed, 11 Jul 2001, Kip Macy wrote:
>In the future when Linux is more heavily used at the enterprise level
>there will likely be upgrade/revert modules to allow such a transition to
>take place.
I use some of the largest UNIX supercomputers ever built (IBM SP, Cray T3E,
SV1, YMP, XMP, J90, SGI Origin). None of them can start a new kernel from an
earlier version. There are too many things that will fail:
Any network activity
Active disk I/O
Locked memory
File modification
File structures
Disk structures (yes they change...)
Clock Synchronization (SMP and cluster)
Shared memory (SMP and cluster)
semaphores (SMP and cluster)
login sessions
device status
shared disks and distributed file systems (cluster)
pipes
Before you even try switching kernels, first implement a process
checkpoint/restart. The process must be resumed after a boot using the same
kernel, with all I/O resumed. Now get it accepted into the kernel.
Anything else is just another name for "reboot using new kernel".
--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]
Any opinions expressed are solely my own.
Jesse Pollard wrote:
[why switching kernels is very hard, and...]
> Before you even try switching kernels, first implement a process
> checkpoint/restart. The process must be resumed after a boot
> using the same
> kernel, with all I/O resumed. Now get it accepted into the kernel.
Hear, hear! That would be a useful feature, maybe not for network servers,
but for pure number-crunching apps it would save people from having to write
all the state saving and recovery that is needed now for long-term
computations.
For bonus points, make it work for clusters to synchronously save and
restore state for the apps running on all the nodes at once...
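For illustration, this is roughly the kind of hand-written state saving and recovery a long-running computation needs today, sketched in Python. The function, the checkpoint file layout, and the `stop_after` parameter (which only simulates an interruption) are assumptions made for this example; it is application-level checkpointing, not the kernel-level checkpoint/restart being proposed.

```python
import os
import pickle


def run(total_steps, checkpoint_path, stop_after=None):
    """Sum of squares 0..total_steps-1, resumable from a checkpoint file."""
    # Recovery: pick up where a previous run left off, if it left a checkpoint.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "acc": 0}

    done = 0
    while state["step"] < total_steps:
        if stop_after is not None and done >= stop_after:
            return None  # simulated crash or reboot
        state["acc"] += state["step"] ** 2
        state["step"] += 1
        done += 1
        # State saving: write to a temp file, then rename, so an
        # interruption never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["acc"]
```

A first call such as `run(10, "job.ckpt", stop_after=4)` stops early and returns `None`; a second call `run(10, "job.ckpt")` resumes from step 4 and returns the full result. A kernel-level checkpoint would spare every application from carrying code like this, but it would also have to capture open files, pipes, and in-flight I/O, which this sketch ignores entirely.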
Torrey
-> Jesse Pollard <[email protected]> wrote:
> Before you even try switching kernels, first implement a process
> checkpoint/restart. The process must be resumed after a boot using the same
> kernel, with all I/O resumed. Now get it accepted into the kernel.
>
> Anything else is just another name for "reboot using new kernel".
Exactly. You may want to take a look at http://www.checkpointing.org
I will say that you are incredibly correct. Actually rather funny.
> Not to be overly negative, I don't intend this email as an insult, but rather
> as a "shed a little light" on the discussion email. I would be _happy_ if
> you actually succeed in your project, but your comments come out as follows:
> a) we want this "sounds real good" feature
But at least it sounds good.
> b) we don't know how we will do it, beyond some hand waving ideas
We don't. We would like to change that.
> c) we want kernel experts who know what they are doing to help us
Quite correct
> d) kernel experts who have replied so far (negatively) don't know what
> they are talking about, so please butt out
We would like any information that they have. I hope they do not butt out.
> e) you have "started coding" by setting up a sourceforge project
That line is hilarious to me. And you are right! I merely intended to show
that we are trying to go somewhere beyond a mailing list thread. To avoid
claiming anything more, I will say *trying* again.
> Note that you are talking about a VERY difficult problem, which is
> not available on 99.9% of systems out there. Maybe on a few highly
> specialized *nixes which were designed for this (Sequent or such),
> and probably have extra hardware support to help along. I'm _pretty_
> sure that Solaris and AIX and HP/UX do NOT do this, and don't you think
> they would want to if it were easy? It would be easier than under
> Linux from the perspective that their kernels change far less often,
> and have relatively static interfaces.
>
> The best proposal I've heard so far was to use MOSIX to do live job
> migration between machines, and then upgrade the kernel like normal.
> In the end, it is the jobs that are running on the kernel, and not
> the kernel or the individual machine that are the most important. One
> person pointed out that there is a single point of failure in the
> MOSIX "stub" machine, which doesn't help you in the end (how do you
> update the kernel there?). If you can figure a way to enhance MOSIX
> to allow migrating the MOSIX "stub" processes to another machine, you
> will have solved your problem in a much easier way, IMHO.
Unfortunately I have not heard this yet. I have not been able to look at the
list archives to see all of what has been posted there.
> Note also that you need to look at the _specific_ reason why you want to
> do live kernel upgrades, besides it "sounds real good". If you have such
> tight uptime deadlines that you can't take 5 minutes of downtime to boot
> a new kernel, then you are probably using a load balancing cluster anyways
> in case of hardware failure, so live kernel updates are not needed here.
>
> Note that all real-world high-availability systems I ever worked on
> still allowed for SCHEDULED maintenance downtime, but highly frowned
> upon UNSCHEDULED downtime. Even IBM's S/390 99.999% uptime numbers
> exclude downtime for SCHEDULED outages, which are simply a fact of life
> Please prove everyone wrong by developing a way to do this, or even
> showing a proof-of-concept (i.e. a user-space framework for translating
> every kernel data structures from one kernel version to another, that
> works across, say, a large fraction of the 2.2 kernel, or maybe from
> 2.4.0-test until 2.4.current). It doesn't have to be in-kernel (yet).
>
> Cheers, Andreas
> --
> Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
> \ would they cancel out, leaving him still hungry?"
> http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
Thanks for your insight. Will try.
Would anyone else like to point out some other task somewhat related
and have me do it? :-)
> > Before you even try switching kernels, first implement a process
> > checkpoint/restart. The process must be resumed after a boot
> > using the same
> > kernel, with all I/O resumed. Now get it accepted into the kernel.
>
> Hear, hear! That would be a useful feature, maybe not network servers,
> but for pure number crunching apps it would save people having to write
> all the state saving and recovery that is needed now for long term
> computations.
Get a computer with hibernation support. That's just about what it is.
>
> For bonus points, make it work for clusters to synchronously save and
> restore state for the apps running on all the nodes at once...
Bash script.
>
> Torrey
Hello all,
I believe that if such a project is to be undertaken, it first
needs to be designed, then coded. I agree that it is a difficult problem... As
for its feasibility, I'm unsure. Maybe the reason this topic comes up
here from time to time is because it hasn't been shown to be a bad
idea. It might be, but if we don't start somewhere, then we'll never
really know, and the debate will continue. Just my .02 cents.
Regards,
-Frank
On Thu, 12 Jul 2001 00:48:15 -0400 (EDT), Frank Davis
<[email protected]> wrote:
>Hello all,
> I believe that if such a project is to be undertaken, it first
>needs to be designed, then coded. I agree that it is a difficult problem... As
>for its feasibility, I'm unsure. Maybe the reason this topic comes up
>here from time to time is because it hasn't been shown to be a bad
>idea. It might be, but if we don't start somewhere, then we'll never
>really know, and the debate will continue. Just my .02 cents.
>Regards,
This topic comes up once or twice a year.
Usually this topic comes to a grinding halt when someone points out
that drivers can be created modular. They can be loaded and unloaded
without rebooting Linux. One project used that technique to
load/unload different schedulers. While this satisfies only part of
the need, it is usually enough to satisfy the tinker-er.
A more recent development is UML - User Mode Linux - where you can run
a nearly complete Linux image in user mode. That way you can fiddle
with file systems to your heart's content without rebooting the main
system. I suspect that will satisfy others.
john alvord
On Wed, Jul 11, 2001 at 11:12:12PM +0100, you [Paul Jakma] claimed:
> On Wed, 11 Jul 2001, Helge Hafting wrote:
>
> > That seems completely out of question. The structures a 2.4.7
> > kernel understands might be insufficient to express the setup
> > a future 2.6.9 kernel is using to do its stuff better.
>
> however, it might be handy if say you needed to upgrade a stable
> kernel due to a bug fix or security update.
>
> no?
<clueless>
In that case you might get away with a simpler approach. Perhaps you could
just replace the changed function(s) with new ones and scan the kernel for
calls to them. Each call should then be changed to point to the new
function. This might work provided the function interfaces don't change
(which might just be true for simple maintenance bug fixes and security
fixes.) It might even be useful for kernel development.
Of course this takes complex locking and the details are probably very
thorny.
I'm not sure if this is possible, IANAKH. But AFAIK this is roughly what
MSVC6.0 edit and continue does for userspace programs.
</clueless>
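
A toy illustration of the function-replacement idea above, written in Python rather than kernel C. It assumes every call goes through one level of indirection (a table called `dispatch`, a hypothetical stand-in for the patched call sites), so swapping in a fixed function with an unchanged interface is seen by all callers at once. All names are invented for the sketch, and it says nothing about calls that are already in progress, which is where the thorny locking details would live.

```python
import threading

# Hypothetical call table: callers look functions up here on every call
# instead of holding a direct reference.
dispatch = {}
dispatch_lock = threading.Lock()


def checksum_v1(data: bytes) -> int:
    """Buggy original: silently ignores the last byte."""
    return sum(data[:-1]) % 256


def checksum_v2(data: bytes) -> int:
    """Fixed version with the same interface (bytes in, int out)."""
    return sum(data) % 256


def call(name, *args):
    """Every call site goes through the table, so a patch is seen everywhere."""
    with dispatch_lock:
        func = dispatch[name]
    return func(*args)


def live_patch(name, new_func):
    """Swap one table entry under the lock; return the old function for rollback."""
    with dispatch_lock:
        old = dispatch[name]
        dispatch[name] = new_func
    return old


dispatch["checksum"] = checksum_v1
before = call("checksum", b"\x01\x02\x03")      # served by the buggy version
previous = live_patch("checksum", checksum_v2)  # no caller had to change
after = call("checksum", b"\x01\x02\x03")       # same call site, new code
```

The swap only works because the interface did not change; a fix that alters arguments or data layout would need every caller updated as well.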
-- v --
[email protected]
[email protected] (Rik van Riel) wrote on 11.07.01 in <Pine.LNX.4.33L.0107111913010.9899-100000@imladris.rielhome.conectiva>:
> One thing which always surprises me in this discussion
> (it comes up about once a year, it seems) is that
> nobody participating in this discussion ever starts
> writing any code for it.
>
> Is this a feature which is only wanted by people who
> don't want to code, or is this just a signal that the
> amount of trouble involved just isn't worth it?
Maybe it's a sign that the people who *would* be able to contribute have
all looked at the problem already (surely most people are annoyed how a
reboot interrupts everything), and have already concluded for themselves
that it's not possible with reasonable effort ... but there is a steady
influx of new people who don't understand enough of the problem and have
to ask.
What I'd *really* like (but don't see how to get there) would be a "save
system state, shutdown, change kernel and/or hardware, reboot, restore
state" system (where state is like "I'm logged in on this console, in this
current directory, and under X I have Netscape running and this page
displayed" but I don't care about the exact state of Squid or even if my
ISDN line is dialled in, because those "fix themselves").
I suspect to do this right would need a means of storing per-process state
controlled by the process (because only that process knows what needs to
be saved, and what can easily be reconstructed - for example, open file
descriptors to a place where we store cookies don't need to be saved, just
routinely reopened), and then every user-visible non-transient program
needs to implement it - and I don't see *that* happen in the next ten
years.
But it *does* have the advantage of not needing to save kernel-internal
state.
MfG Kai
Kai Henningsen wrote:
> What I'd *really* like (but don't see how to get there) would be a "save
> system state, shutdown, change kernel and/or hardware, reboot, restore
> state" system (where state is like "I'm logged in on this console, in this
> current directory, and under X I have Netscape running and this page
> displayed" but I don't care about the exact state of Squid or even if my
> ISDN line is dialled in, because those "fix themselves").
Consider OS/2 then. All Workplace Shell aware programs are supposed to
save state in this way. And yes - they do start up in the same state after
reboot if you want them to. Editors come up on the page you left, filesystem
folders come up, and so on.
> and then every user-visible non-transient program
> needs to implement it - and I don't see *that* happen in the next ten
> years.
Consider a patch for konqueror or a few other webpage/fs-view programs
and you'll go a long way - all in userspace.
Helge Hafting
"C. Slater" wrote:
>
> Unless we find some other way to do it, i think we will have to limit this
> to only switching between kernels with the same minor version. We probably
> would not be able to swap between 2.4 and 2.6 anyways, though it depends on
> what changes are made.
Minor versions won't help you. Different minor versions try to stay
interface-compatible with each other. But data structures not
exposed to interfaces can still be rewritten completely.
Lots of nice ideas and implementations have piled up for 2.5. Those
that prove immensely successful in 2.5 may get backported to 2.4
once they get enough testing. Try reading a few months worth of
kernel patches and you'll see that things change in stable kernels
too.
Helge Hafting
On Wed, 11 Jul 2001, C. Slater wrote:
>Would anyone else like to point out some other task somewhat related
>and have me do it? :-)
>
>> > Before you even try switching kernels, first implement a process
>> > checkpoint/restart. The process must be resumed after a boot
>> > using the same
>> > kernel, with all I/O resumed. Now get it accepted into the kernel.
>>
>> Hear, hear! That would be a useful feature, maybe not network servers,
>> but for pure number crunching apps it would save people having to write
>> all the state saving and recovery that is needed now for long term
>> computations.
>
>Get a computer with hibernation support. That's just about what it is.
Bzzzt, wrong answer. Hibernation stops the entire kernel. Checkpoint/restart
stops processes and saves the entire state of the process. Hibernation
just halts the processor.
>>
>> For bonus points, make it work for clusters to synchronously save and
>> restore state for the apps running on all the nodes at once...
>
>Bash script.
Doesn't work - remember, once the kernel is suspended it can't tell
another system that it has done so.
A full checkpoint/restart can potentially allow a process to migrate
from one node to another. It also allows other processing to be done
while the process is checkpointed:
a. how do you reconstruct a software raid 5 while the system
is "suspended"
b. how do you migrate to a different platform if the system is
suspended
Answer - you can't.
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]
Any opinions expressed are solely my own.
On Wed, Jul 11, 2001 at 05:44:45PM -0600, Andreas Dilger wrote:
> The best proposal I've heard so far was to use MOSIX to do live job
> migration between machines, and then upgrade the kernel like normal.
> In the end, it is the jobs that are running on the kernel, and not
> the kernel or the individual machine that are the most important. One
> person pointed out that there is a single point of failure in the
> MOSIX "stub" machine, which doesn't help you in the end (how do you
> update the kernel there?). If you can figure a way to enhance MOSIX
> to allow migrating the MOSIX "stub" processes to another machine, you
> will have solved your problem in a much easier way, IMHO.
Virtual machines a la VM are also nice for this. Build a HA cluster from
two VMs, then upgrade one after another. All that's required is HA stuff
as it already is available.
Ralf
>
> On Wed, 11 Jul 2001, C. Slater wrote:
> >Would anyone else like to point out some other task somewhat related
> >and have me do it? :-)
> >
> >> > Before you even try switching kernels, first implement a process
> >> > checkpoint/restart. The process must be resumed after a boot
> >> > using the same
> >> > kernel, with all I/O resumed. Now get it accepted into the kernel.
> >>
> >> Hear, hear! That would be a useful feature, maybe not network servers,
> >> but for pure number crunching apps it would save people having to write
> >> all the state saving and recovery that is needed now for long term
> >> computations.
> >
> >Get a computer with hibernation support. That's just about what it is.
>
> Bzzzt wrong anser. Hibernation stops the entire kernel. checkpoint restart
> stops processes, saves the entire state of the process. hibernation
> is just halt the processor.
Hibernation may not be.
I've just suspended to disk after the list line, pulled the power supplies,
taken the RAM chip out, shorted the pins to make really sure, then powered
back up.
Everything just resumed fine.
All I'd need to do kernel migration is a quick vi of the
disk file.
(well, almost)
Ralf Baechle <[email protected]>:
> On Wed, Jul 11, 2001 at 05:44:45PM -0600, Andreas Dilger wrote:
>
> > The best proposal I've heard so far was to use MOSIX to do live job
> > migration between machines, and then upgrade the kernel like normal.
> > In the end, it is the jobs that are running on the kernel, and not
> > the kernel or the individual machine that are the most important. One
> > person pointed out that there is a single point of failure in the
> > MOSIX "stub" machine, which doesn't help you in the end (how do you
> > update the kernel there?). If you can figure a way to enhance MOSIX
> > to allow migrating the MOSIX "stub" processes to another machine, you
> > will have solved your problem in a much easier way, IMHO.
>
> Virtual machines a la VM are also nice for this. Build a HA cluster from
> two VMs, then upgrade one after another. All that's required is HA stuff
> as it already is available.
That isn't even the same problem.
First, processes do not survive the upgrade.
Second, the upgrade must still be compatible with the host OS.
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]
Any opinions expressed are solely my own.
>
> >
> > On Wed, 11 Jul 2001, C. Slater wrote:
> > >Would anyone else like to point out some other task somewhat related
> > >and have me do it? :-)
> > >
> > >> > Before you even try switching kernels, first implement a process
> > >> > checkpoint/restart. The process must be resumed after a boot
> > >> > using the same
> > >> > kernel, with all I/O resumed. Now get it accepted into the kernel.
> > >>
> > >> Hear, hear! That would be a useful feature, maybe not network servers,
> > >> but for pure number crunching apps it would save people having to write
> > >> all the state saving and recovery that is needed now for long term
> > >> computations.
> > >
> > >Get a computer with hibernation support. That's just about what it is.
> >
> > Bzzzt wrong anser. Hibernation stops the entire kernel. checkpoint restart
> > stops processes, saves the entire state of the process. hibernation
> > is just halt the processor.
>
> Hibernation may not be.
> I've just suspended to disk after the list line, pulled the power supplies,
> taken the RAM chip out, shorted the pins to make really sure, then powered
> back up.
> Everything just resumed fine.
>
> All I'd need to do kernel migration is a quick vi of the
> disk file.
>
> (well, almost)
That sounds more like a memory dump to disk, and reload after power restored.
Either that or possibly a separate power supply for RAM (something like a
trickle discharge capacitor; I've read that some capacitors can hold a charge
for about 3 days. Whether that would work for a large RAM or not, I have no
idea).
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]
Any opinions expressed are solely my own.
On Thu, Jul 12, 2001 at 07:54:10AM -0500, Jesse Pollard wrote:
> > > On Wed, 11 Jul 2001, C. Slater wrote:
> > > >Would anyone else like to point out some other task somewhat related
> > > >and have me do it? :-)
> > > >
> > > >> > Before you even try switching kernels, first implement a process
> > > >> > checkpoint/restart. The process must be resumed after a boot
> > > >> > using the same
> > > >> > kernel, with all I/O resumed. Now get it accepted into the kernel.
> > > >>
> > > >> Hear, hear! That would be a useful feature, maybe not network servers,
> > > >> but for pure number crunching apps it would save people having to write
> > > >> all the state saving and recovery that is needed now for long term
> > > >> computations.
> > > >
> > > >Get a computer with hibernation support. That's just about what it is.
> > >
> > > Bzzzt wrong anser. Hibernation stops the entire kernel. checkpoint restart
> > > stops processes, saves the entire state of the process. hibernation
> > > is just halt the processor.
> >
> > Hibernation may not be.
> > I've just suspended to disk after the list line, pulled the power supplies,
> > taken the RAM chip out, shorted the pins to make really sure, then powered
> > back up.
> > Everything just resumed fine.
> >
> > All I'd need to do kernel migration is a quick vi of the
> > disk file.
> >
> > (well, almost)
> That sounds more like a memory dump to disk, and reload after power restored.
> Either that or possibly a separate power supply for RAM (something like a
> trickle discharge capacitor; I've read that some capacitors can hold a charge
> for about 3 days. Whether that would work for a large RAM or not, I have no
> idea).
It's a suspend to disk. Lots of laptops can do it and my Toshiba
Tecra 8100 can do it from the BIOS if I have a magic Windows partition with
an appropriate suspend file in it (which would be unencrypted, which would
be unacceptable - so I had to look for a Linux solution for the suspend
to disk problem).
Check out the swsusp project up at Source Forge
<http://sourceforge.net/projects/swsusp/>. It allows me to suspend
into the swap space by hitting Alt-SysRq-D. Great for changing
batteries on laptops (and, no, normal suspend does not survive a battery
change) but also REALLY GREAT for forensic security analysis of compromised
systems. I hit the console of a compromised system and hit Alt-SysRq-D
and it flushes the dirty buffers, dumps memory to swap (preserving all
my "volatiles") and then shuts down. I can snapshot the hard drive and
then restart the system where it left off for live running analysis. If
that gets screwed up, I can restore the image again and restart again from
the same spot again. I've also got all the memory and CPU state in that
disk image for "in-vitro" analysis by tools like Wietse's "The Coroner's
Toolkit".
But that doesn't solve ANY of the problems with changing the kernel
itself. Suspending and restoring the system is the easy part (and swsusp
still has some problems restoring X Windows). Restoring a system to
a different kernel is orders of magnitude worse, if not downright
impossible for all the reasons given over internal structures and
interfaces.
I would LOVE to have something like swsusp in the main line kernel,
however, just so I didn't have to convince IT departments to apply this
custom kernel patch to their production systems BEFORE they get their butts
kicked by some snot-nosed script kiddie. :-/
> -------------------------------------------------------------------------
> Jesse I Pollard, II
> Email: [email protected]
>
> Any opinions expressed are solely my own.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Michael H. Warfield | (770) 985-6132 | [email protected]
(The Mad Wizard) | (678) 463-0932 | http://www.wittsend.com/mhw/
NIC whois: MHW9 | An optimist believes we live in the best of all
PGP Key: 0xDF1DD471 | possible worlds. A pessimist is sure of it!
On Thu, Jul 12, 2001 at 07:23:06AM -0500, Jesse Pollard wrote:
> That isn't even the same problem.
Sure - the original problem is hard to solve so I suggest to cheat a bit :)
> First, processes do not survive the upgrade.
You care about services to continue or only want an entry for an uptime
contest?
Ralf
>
> On Thu, Jul 12, 2001 at 07:23:06AM -0500, Jesse Pollard wrote:
>
> > That isn't even the same problem.
>
> Sure - the original problem is hard to solve so I suggest to cheat a bit :)
>
> > First, processes do not survive the upgrade.
>
> You care about services to continue or only want an entry for an uptime
> contest?
Yes to the first, no to the second.
Processes need to continue if it takes days to arrive at a solution. If
the system DOES need to go down, then the process needs to be checkpointed.
After the outage, the process is resumed.
This is NOT easy. The last system that did it reliably (in the systems
I work with) is UNICOS 7. It did not try to save processes that had open
network connections (even NFS) or pipes. Between UNICOS 7 and 10, there was an
attempt to include pipes and sockets, provided both ends of the communication
were controlled by the same host (socket to a local daemon, both processes
in the pipe within the same batch job). This didn't work (well, it partly
worked: pipes seemed to work, but sockets didn't). During this time more and more
processes failed on restart, unless they were constrained to only single-process
events. Cluster systems - no chance. It seems impossible to force
a synchronous checkpoint across a cluster (well - theoretically possible).
The problem was that it may take 10-20 minutes to checkpoint a single process.
During that time the corresponding process on another node approaches the
checkpoint location, and fails due to a network timeout. The distributed batch
job dies.
I've seen some processes (single process now) take over half an hour
to checkpoint (120 MWords (64-bit words) = 960 MB being written to disk).
First it has to stop the process synchronously with all file activity (might
take 5 minutes for all buffers to complete). Then the kernel saves the active
process memory (the 960 MB - 5-10 minutes), then all outstanding I/O buffers and status
structures (scatter/gather, reformat, write - might take another 5 minutes).
During the entire time, the system would be doing other I/O for other processes
not being checkpointed (daemons, interactive logins, etc). When the process
reached 4-8 GB in size, stopping a batch stream could take over an hour.
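
The figures above can be checked with a little arithmetic. The sketch below assumes decimal megabytes and looks only at the memory-saving phase; the function names are made up for illustration.

```python
WORD_BYTES = 8      # one 64-bit word
MB = 10**6          # decimal megabyte, as the 960 MB figure implies


def checkpoint_size_mb(mwords):
    """Size in MB of a process image given in millions of 64-bit words."""
    return mwords * 10**6 * WORD_BYTES / MB


def implied_rate_mb_per_s(size_mb, minutes):
    """Sustained write rate implied by saving size_mb in the given time."""
    return size_mb / (minutes * 60)


def minutes_at(size_mb, rate_mb_per_s):
    """Time needed to save size_mb at a given sustained rate."""
    return size_mb / rate_mb_per_s / 60


size_mb = checkpoint_size_mb(120)            # 120 MWords of 8 bytes each
fast = implied_rate_mb_per_s(size_mb, 5)     # memory saved in 5 minutes
slow = implied_rate_mb_per_s(size_mb, 10)    # memory saved in 10 minutes
big_job = minutes_at(8000, slow)             # an 8 GB process at the slow rate
```

So 120 MWords is indeed 960 MB, the memory phase implies roughly 1.6 to 3.2 MB/s of sustained writes, and at the slower rate an 8 GB image needs about 83 minutes for memory alone, which fits the "over an hour" remark.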
During the outage, drivers could be updated, scheduling parameters altered,
and hardware fixed (RAID disk or CPU replacements): just low-level activity.
Anything that affected the file structure (ie changing dates, relocated
files, renamed files...) would cause the checkpointed process to fail to
restart.
The restart procedure had to allocate memory for I/O buffers (cache buffers),
reload them, reload the process private structures, verify that files remained
consistent with parameters in the private structures, reset file pointer
locations for any open files, reload pipe buffers. Then repeat for the
process at the other end of the pipe. After all pipes and processes are
reloaded (without any consistency errors) all processes involved would
be entered in the run queue.
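
The "verify that files remained consistent, then reset file pointers" step can be sketched in user space. This is only an illustration of the idea, not how UNICOS did it: the `FileRecord` type and both functions are hypothetical, and size plus modification time stand in for whatever parameters the real private structures held.

```python
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    """What a checkpoint remembers about one open file (illustrative only)."""
    path: str
    size: int
    mtime_ns: int
    offset: int


def record_open_file(f):
    """Capture what a restart needs to verify and reposition one file."""
    st = os.fstat(f.fileno())
    return FileRecord(f.name, st.st_size, st.st_mtime_ns, f.tell())


def restore_open_file(rec):
    """Reopen the file and seek to the saved offset, refusing if it changed."""
    try:
        st = os.stat(rec.path)
    except FileNotFoundError:
        raise RuntimeError(f"{rec.path}: renamed or removed since checkpoint")
    if (st.st_size, st.st_mtime_ns) != (rec.size, rec.mtime_ns):
        raise RuntimeError(f"{rec.path}: modified since checkpoint")
    f = open(rec.path, "rb")
    f.seek(rec.offset)
    return f
```

A restart would call `restore_open_file` for every saved record and give up on the whole process at the first error, which matches the behaviour described above where a renamed or re-dated file makes the restart fail.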
The architecture of the Cray YMP systems simplified a LOT of the activity.
1. The hardware did NOT support paging.
2. All data structures were contiguous in memory (excluding only the cache
buffers for pipes and disk).
3. All data structures contained only offset locations (relative to the
physical address of the process private data structure). The process
memory ALWAYS followed the process private data structure.
4. Buffer cache pointers were independent of the user process; only the
queue identifiers were needed in the process private space, not pointers
to the queue.
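
Point 3 is what makes such an image easy to save and reload: links stored as offsets from a base survive being copied to a different address. A minimal sketch, assuming a made-up node layout (`NODE`) and a non-empty list; nothing here reflects the real UNICOS structures.

```python
import struct

# Invented node layout: (value, offset of the next node from the block base).
# An offset of -1 marks the end of the list.
NODE = struct.Struct("<qq")


def build_list(values):
    """Pack a non-empty list into one contiguous block linked by offsets."""
    buf = bytearray(NODE.size * len(values))
    for i, value in enumerate(values):
        next_off = (i + 1) * NODE.size if i + 1 < len(values) else -1
        NODE.pack_into(buf, i * NODE.size, value, next_off)
    return buf


def walk(memory, base):
    """Follow the offsets of a block that currently lives at memory[base:]."""
    values = []
    off = 0
    while off != -1:
        value, off = NODE.unpack_from(memory, base + off)
        values.append(value)
    return values


block = build_list([10, 20, 30])
# "Reload" the block at a different address: no link needs fixing up.
memory = bytearray(64) + block
relocated = walk(memory, 64)
```

Had the nodes held absolute addresses instead, every link would have to be rewritten after the copy, which is roughly the problem a kernel full of ordinary pointers presents.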
Note: a process that was swapped out was really swapped out (all memory). It
looked like (from the documentation) it was a slightly simplified form of
a checkpoint file.
None of this applies to other Cray hardware (T3, SV1). The SV1 is most
similar to the YMP line, but because of the more "cluster" operations
I'm less familiar with how the checkpoint/restart works across the SV1.
The uptime contest is still lost because the system DID go down.
Process checkpoint/restart has been advertised for SGI IRIX systems,
but I've not seen it (first release didn't work if files were open).
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]
Any opinions expressed are solely my own.
On Wed, 11 Jul 2001, C. Slater wrote:
> Does it come up often? Well, I have a sourceforge project setup and am
> currently only waiting on finalizing how it's going to be done.
I hope you have fun waiting.
If you're really serious about this feature, however,
you may want to start looking into the technical
details behind your wish to get an idea of exactly
how much work it would be to implement this feature.
regards,
Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...
http://www.surriel.com/ http://distro.conectiva.com/
Send all your spam to [email protected] (spam digging piggy)
On Wed, 11 Jul 2001, C. Slater wrote:
> > a) we want this "sounds real good" feature
> But at least it sounds good.
And nothing wrong with that. It seems an excellent
opportunity to learn lots about every part of the
kernel.
> > b) we don't know how we will do it, beyond some hand waving ideas
> We don't. We would like to change that.
>
> > c) we want kernel experts who know what they are doing to help us
> Quite correct
I guess there are two things to do here:
(1) analyse the general idea of what you want to achieve,
breaking it down in sub-goals which may be achievable
(2) learn about how the kernel works, you may want to go to
http://kernelnewbies.org/
I won't have time to put into a project as huge and difficult
as upgrading the kernel "live", but I'll be around to try
and teach people about how the kernel works.
regards,
Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...
http://www.surriel.com/ http://distro.conectiva.com/
Send all your spam to [email protected] (spam digging piggy)
Rik van Riel writes:
> I won't have time to put in a project as huge and difficult
> as upgrading the kernel "live", but I'll be around to try
> and teach people about how the kernel works.
I think I see a business opportunity here.
Live upgrades require data structure conversion and other horrors.
You can't just write the code and expect it to maintain itself.
You'd need to rewrite half of it every time, for every patch level.
The 24x7 places might be willing to pay somebody to do this.
It's consulting work really. The customer says "I want to go
from 2.4.8 to 2.4.12", you say "OK, $320405 please.", and you
make a custom upgrade procedure for them.
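
To make "data structure conversion" concrete, here is a toy converter between two versions of one record. Both layouts (`TASK_V1`, `TASK_V2`) are invented for illustration and are not real kernel structures; the point is only that each pair of versions needs its own hand-written translation.

```python
import struct

# Hypothetical layouts, invented for this sketch.
TASK_V1 = struct.Struct("<HHi")     # pid (16-bit), state, priority
TASK_V2 = struct.Struct("<IHhiQ")   # pid (32-bit), state, pad, priority, start_time


def convert_v1_to_v2(raw, default_start_time=0):
    """Translate one old-layout record into the new layout.

    The new start_time field was never recorded by the old version, so the
    converter has to invent a value for it.
    """
    pid, state, priority = TASK_V1.unpack(raw)
    return TASK_V2.pack(pid, state, 0, priority, default_start_time)


old_record = TASK_V1.pack(1234, 1, -5)
new_record = convert_v1_to_v2(old_record)
```

Even this trivial case forces a decision about a field the old code never tracked. Multiply that by every structure that changed between two patch levels and the consulting bill above starts to look plausible.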
How often would a company that demands 24x7 uptime /want/ to upgrade their
kernel? It seems to me that when the decision has been made to take that
kind of a step in a production environment, someone has done lots of
tests with the new target kernel, so that even if they don't have the
extra hardware to bring up another server in parallel, the most downtime
that would be suffered would be the time it takes to do two boots (boot
the new kernel, find out it doesn't work, reboot the old one).
Not to discourage anyone, but is this really necessary, or is it something
to be worked on just to say that it can be done?
Just a random comment from someone who knows very little.
Regards,
Mike
On Thu Jul 12 12:23:31 2001 Albert D. Cahalan said...
> Rik van Riel writes:
>
> > I won't have time to put in a project as huge and difficult
> > as upgrading the kernel "live", but I'll be around to try
> > and teach people about how the kernel works.
>
> I think I see a business opportunity here.
>
> Live upgrades require data structure conversion and other horrors.
> You can't just write the code and expect it to maintain itself.
> You'd need to rewrite half of it every time, for every patch level.
>
> The 24x7 places might be willing to pay somebody to do this.
> It's consulting work really. The customer says "I want to go
> from 2.4.8 to 2.4.12", you say "OK, $320405 please.", and you
> make a custom upgrade procedure for them.
-> [email protected] (Kai Henningsen) wrote:
> [email protected] (Rik van Riel) wrote on 11.07.01 in <Pine.LNX.4.33L.0107111913010.9899-100000@imladris.rielhome.conectiva>:
> I suspect to do this right would need a means of storing per-process state
> controlled by the process (because only that process knows what needs to
> be saved, and what can easily be reconstructed - for example, open file
> descriptors to a place where we store cookies don't need to be saved, just
> routinely reopened), and then every user-visible non-transient program
> needs to implement it - and I don't see *that* happen in the next ten
> years.
This would be the easiest way to do it, in the sense that application authors take care of their own stuff, and kernel developers only need to define rules/interfaces.
One scheme is to define a new signal number (e.g., SIGCKPT). When we send the signal to the process, it checkpoints itself (saves everything it needs for a restart). Then we define another signal (e.g., SIGRSUM). When we send that signal to the process, it knows that it should resume from the last checkpoint. This is user-level checkpoint/restart, and there are already certain packages available (Condor, libckpt, etc).
If we want total transparency (i.e., applications don't need to be aware and everything is taken care of by the kernel), then the kernel needs substantial changes (I've written a kernel module to do this).
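
A minimal sketch of the user-level scheme described above. SIGCKPT and SIGRSUM do not exist, so SIGUSR1 and SIGUSR2 stand in for them (Unix only); the checkpoint file name, the `state` dictionary and the handler names are likewise assumptions made for the example. It is not how Condor or libckpt work internally.

```python
import json
import os
import signal
import tempfile

# SIGCKPT and SIGRSUM are hypothetical; the two user signals stand in for them.
SIGCKPT = signal.SIGUSR1
SIGRSUM = signal.SIGUSR2

CHECKPOINT_FILE = os.path.join(tempfile.gettempdir(), f"ckpt-{os.getpid()}.json")

# The only state this toy application considers worth saving.
state = {"iteration": 0, "partial_sum": 0}


def on_checkpoint(signum, frame):
    """Save the application's own idea of what it needs for a restart."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump(state, f)


def on_resume(signum, frame):
    """Go back to the last checkpointed state."""
    with open(CHECKPOINT_FILE) as f:
        state.update(json.load(f))


signal.signal(SIGCKPT, on_checkpoint)
signal.signal(SIGRSUM, on_resume)


def step():
    """One unit of the long-running computation."""
    state["iteration"] += 1
    state["partial_sum"] += state["iteration"]
```

From another shell, `kill -USR1 <pid>` would then trigger a checkpoint and `kill -USR2 <pid>` a rollback. Only the application decides what goes into `state`, which is the advantage of this approach, and also why every program would have to implement it.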
On Thu, 12 Jul 2001, Albert D. Cahalan wrote:
> I think I see a business opportunity here.
[snip technically risky idea]
> The 24x7 places might be willing to pay somebody to do this.
Unlikely. They need hardware redundancy anyway, so they'll
just upgrade their cluster node-by-node, without doing
risky and potentially data-corrupting things like live
kernel upgrades.
Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...
http://www.surriel.com/ http://distro.conectiva.com/
Send all your spam to [email protected] (spam digging piggy)
"Albert D. Cahalan" wrote:
> The 24x7 places might be willing to pay somebody to do this.
> It's consulting work really. The customer says "I want to go
> from 2.4.8 to 2.4.12", you say "OK, $320405 please.", and you
> make a custom upgrade procedure for them.
Speaking as someone who is working on what will eventually be a five 9's project
based on Linux, there is almost zero chance that we would make use of something
like this. Applications and kernels are tested together and verified together,
and the likelihood of changing either one and not the other one is very low (and
in fact they are shipped together as a single image).
We have hardware redundancy, and upgrades are controlled by the application,
since it knows exactly what state must be transferred and what the differences
are between versions. After all the state has been transferred we then do an IP
takeover so that the rest of the system knows to talk to the new side. At this
point we can test the new side for a while. If we're satisfied with how it's
performing, we can then take down the inactive side and upgrade it and then
bring it back into sync with the active side. If we don't like it, we can
always abort and switch back to the old version.
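
The upgrade sequence described above can be summarised as a small model. This is a sketch under heavy simplification: `Side`, `Pair` and the `acceptable` callback are invented names, state transfer is a dictionary copy, and the IP takeover is reduced to swapping two references.

```python
from dataclasses import dataclass, field


@dataclass
class Side:
    """One half of a redundant pair (illustrative only)."""
    name: str
    version: str
    state: dict = field(default_factory=dict)


class Pair:
    """Two redundant sides; `active` is the one the rest of the system talks to."""

    def __init__(self, active, standby):
        self.active, self.standby = active, standby

    def upgrade(self, new_version, acceptable):
        # 1. Upgrade the inactive side and transfer application state to it.
        self.standby.version = new_version
        self.standby.state = dict(self.active.state)
        # 2. Takeover: the upgraded side becomes the active one.
        self.active, self.standby = self.standby, self.active
        # 3. Trial period: the old side stays untouched until we are satisfied.
        if not acceptable(self.active):
            # Abort: switch back to the old version.
            self.active, self.standby = self.standby, self.active
            return False
        # 4. Commit: upgrade the old side and bring it back into sync.
        self.standby.version = new_version
        self.standby.state = dict(self.active.state)
        return True
```

For example, `Pair(Side("A", "1.0", {"calls": 7}), Side("B", "1.0")).upgrade("1.1", check)` ends with B active on 1.1 if `check` returns true, and with A still active on 1.0 otherwise. The kernel never has to change underneath a running process, which is the point being made.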
--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]
Torrey Hoffman wrote:
>
> Jesse Pollard wrote:
>
> [why switching kernels is very hard, and...]
>
> > Before you even try switching kernels, first implement a process
> > checkpoint/restart. The process must be resumed after a boot
> > using the same
> > kernel, with all I/O resumed. Now get it accepted into the kernel.
>
> Hear, hear! That would be a useful feature, maybe not network servers,
> but for pure number crunching apps it would save people having to write
> all the state saving and recovery that is needed now for long term
> computations.
There is a checkpointing and resuming lib at
ftp://gutemine.geo.uni-koeln.de/pub/chkpt/
I am not sure if it has been ported to Linux yet, but it might be worth
a look.
>
> For bonus points, make it work for clusters to synchronously save and
> restore state for the apps running on all the nodes at once...
>
> Torrey
bye,
Wilfried
> I've just suspended to disk after the list line, pulled the power supplies,
> taken the RAM chip out, shorted the pins to make really sure, then powered
> back up.
FYI: Taking the memory module out and shorting its pins together is a
great way to unnecessarily risk zapping your RAM with ESD, and a
terrible way to ensure that its contents are erased. When the DRAM is
not being accessed (by definition true when you remove power), the gate
capacitors that form the DRAM array are floating unconnected and cannot
be intentionally discharged. You just have to wait for good old leakage
to kill the bits. A minute should be more than enough.
>
> > I've just suspended to disk after the list line, pulled the power supplies,
> > taken the RAM chip out, shorted the pins to make really sure, then powered
> > back up.
>
> FYI: Taking the memory module out and shorting its pins together is a
> great way to unnecessarily risk zapping your RAM with ESD, and a
> terrible way to ensure that its contents are erased. When the DRAM is
> not being accessed (by definition true when you remove power), the gate
> capacitors that form the DRAM array are floating unconnected and cannot
> be intentionally discharged. You just have to wait for good old leakage
> to kill the bits. A minute should be more than enough.
I know, I observed antistatic precautions, and did wait a couple of minutes
(while making a coffee).
[email protected] (Helge Hafting) wrote on 12.07.01 in <[email protected]>:
> Kai Henningsen wrote:
>
> > What I'd *really* like (but don't see how to get there) would be a "save
> > system state, shutdown, change kernel and/or hardware, reboot, restore
> > state" system (where state is like "I'm logged in on this console, in this
> > current directory, and under X I have Netscape running and this page
> > displayed" but I don't care about the exact state of Squid or even if my
> > ISDN line is dialled in, because those "fix themselves").
>
> Consider OS/2 then. All workplace-shell aware programs are supposed to
> save state in this way.
The keyword is "supposed". Because I remember from my OS/2 days that most
didn't.
OTOH, Borland's DOS IDE does. It's a mixed bag.
> And yes - they do start up in the same state after
> reboot if you want to. Editors come up on the page you left, filesystem
> folders come up, and so on.
Most programs from IBM got it right, most others didn't, as far as I can
recall.
> > and then every user-visible non-transient program
> > needs to implement it - and I don't see *that* happen in the next ten
> > years.
>
> Consider a patch for konqueror or a few other webpage/fs-view programs
> and you'll go a long way - all in userspace.
Well, Netscape *can* sort of do it (for one window).
But how do I make it happen for bash? login? xdm? And so on ... anyway, I
simply don't have the time for such a project. I'm spread too thin as it
is.
MfG Kai
On 12-Jul-01 John Alvord wrote:
> On Thu, 12 Jul 2001 00:48:15 -0400 (EDT), Frank Davis
> <[email protected]> wrote:
>
>>Hello all,
>> I believe that if such a project is to be undertaken, it first
>>needs to be designed, then coded. I agree that it is a difficult problem... As
>>for its feasibility, I'm unsure. Maybe the reason this topic comes up
>>here from time to time is because it hasn't been shown to be a bad
>>idea. It might be, but if we don't start somewhere, then we'll never
>>really know, and the debate will continue. Just my .02 cents.
>>Regards,
>
> This topic comes up once or twice a year.
>
> Usually this topic comes to a grinding halt when someone points out
> that drivers can be created modular. They can be loaded and unloaded
> without rebooting Linux. One project used that technique to
> load/unload different schedulers. While this satisfies only part of
> the need, it is usually enough to satisfy the tinker-er.
One problem with this is many of the modules may be difficult to replace
because they are in use.
If someone did want to spend time on a project like this, one place they could
start would be to try to make some of the modules hot replaceable.
An example that pops to mind would be a SCSI driver:
1. Tell the kernel to stop sending it commands.
2. Wait for things in progress to complete.
3. Save whatever state you need to.
4. Remove the old driver.
5. Start up the new driver.
6. Start restoring state.
7. Reset the SCSI bus.
8. Reprobe for devices?
9. Finish restoring state.
10. Tell the kernel we are available.
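
The sequence above could be modelled along these lines. This is a toy user-space sketch, not kernel code: the `Driver` class and the `hot_replace` function are hypothetical names, and the bus reset and reprobe are only recorded in a log rather than performed.

```python
class Driver:
    """Hypothetical stand-in for a replaceable driver module."""

    def __init__(self, version):
        self.version = version
        self.accepting = True   # whether new commands may be submitted
        self.in_flight = []     # commands issued but not yet completed
        self.settings = {}      # state that must survive the replacement

    def submit(self, cmd):
        if not self.accepting:
            raise RuntimeError("driver is quiesced")
        self.in_flight.append(cmd)

    def drain(self):
        """Pretend every in-flight command completes; return them."""
        done = list(self.in_flight)
        self.in_flight.clear()
        return done


def hot_replace(old, new_version, log):
    """Replace `old` with a new driver, recording each step in `log`."""
    old.accepting = False            # stop sending commands to the old driver
    log.append("quiesce")
    old.drain()                      # wait for things in progress to complete
    log.append("drain")
    saved = dict(old.settings)       # save whatever state is needed
    log.append("save")
    log.append("remove old")         # a real system would unload the module here
    new = Driver(new_version)        # start up the new driver, still quiesced
    new.accepting = False
    log.append("start new")
    new.settings.update(saved)       # restore the saved state
    log.append("restore")
    log.append("reset bus")          # placeholder: no real bus in this sketch
    log.append("reprobe")            # placeholder: no real devices either
    new.accepting = True             # tell the caller we are available again
    log.append("resume")
    return new


steps = []
old = Driver("1.0")
old.settings["queue_depth"] = 32
old.submit("READ block 7")
new = hot_replace(old, "1.1", steps)
```

The hard part in practice is the state in the middle: anything the old driver holds that the new one lays out differently needs an explicit conversion, which this sketch sidesteps by copying a plain dictionary.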
This example was chosen not because I think the SCSI drivers are buggy. :)
It was chosen as a type of module that someone might want to replace, but couldn't
because it was in use (a file system mounted on it).
Maybe a network card would be easier to start with, with similar requirements.
Then you could hope all the patches will be for modules. :)
I also haven't looked at the code to see if it was possible. :)
On Thu, 12 Jul 2001, Rik van Riel wrote:
> On Thu, 12 Jul 2001, Albert D. Cahalan wrote:
>
> > I think I see a business opportunity here.
>
> [snip technically risky idea]
>
> > The 24x7 places might be willing to pay somebody to do this.
>
> Unlikely. They need hardware redundancy anyway, so they'll
> just upgrade their cluster node-by-node, without doing
> risky and potentially data-corrupting things like live
> kernel upgrades.
I see business in a different way: instead of ISP or ASP you provide a
backup cluster node where you can migrate your processes before rebooting.
Everything keeps on working, no magic involved.
So we can invent the CNP (Cluster Node Provider)
Pau
Hi!
> Would anyone else like to point out some other task somewhat related
> and have me do it? :-)
Ummm, I need someone to cook me lunch tomorrow ;-).
> > > Before you even try switching kernels, first implement a process
> > > checkpoint/restart. The process must be resumed after a boot
> > > using the same
> > > kernel, with all I/O resumed. Now get it accepted into the kernel.
> >
> > Hear, hear! That would be a useful feature, maybe not network servers,
> > but for pure number crunching apps it would save people having to write
> > all the state saving and recovery that is needed now for long term
> > computations.
>
> Get a computer with hibernation support. That's just about what it
> is.
No. Hibernation can be done (see sw_susp patches). This is per-process,
which is different. And you could implement that "live upgrade" in a similar
way. Checkpoint all. Reboot with new kernel. Restart all. That's close
enough to live upgrade.
(Ouch, what are you going to do with programs that behave differently
on different kernel releases? What if you have X using some kernel
driver that goes away in new release?)
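
For the number-crunching case mentioned in the quoted text, the application-level version of "checkpoint, reboot, restart" might look like the sketch below. It is not the kernel-level per-process checkpoint being discussed; it is the hand-written state saving that such a feature would replace. The `run` function, the `stop_after` parameter (which simulates the machine going down), and the JSON file layout are all assumptions made for illustration.

```python
import json
import os
import tempfile


def run(total_steps, checkpoint_path, stop_after=None):
    """Sum the integers 0..total_steps-1, checkpointing after every step.

    If a checkpoint file exists, resume from it. `stop_after` simulates
    the machine going down after that many steps in this run; in that
    case the function returns None instead of the result.
    """
    state = {"next": 0, "acc": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    done_this_run = 0
    while state["next"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # interrupted; progress is in the checkpoint file
        state["acc"] += state["next"]
        state["next"] += 1
        done_this_run += 1
        # Write to a temporary file and rename it, so an interruption
        # never leaves a half-written checkpoint behind.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, checkpoint_path)
    return state["acc"]


path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = run(10, path, stop_after=4)   # "machine goes down" after 4 steps
second = run(10, path)                # after the "reboot": resumes at step 4
```

Because the checkpoint holds only application data, it does not care which kernel is running when it is read back, which is exactly what a kernel-level checkpoint cannot promise.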
--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]
Hi!
> > That sounds more like a memory dump to disk, and reload after power restored.
> > Either that or possibly a separate power supply for RAM (something like a
> > trickle discharge capacitor; I've read that some capacitors can hold a charge
> > for about 3 days. Whether that would work for a large RAM or not, I have no
> > idea).
>
> It's a suspend to disk. Lots of Laptops can do it and my Toshiba
> Tecra 8100 can do it from the BIOS if I have a magic Windows partition with
> an appropriate suspend file in it (which would be unencrypted, which would
> be unacceptable - so I had to look for a Linux solution for the suspend
> to disk problem).
>
> Check out the swsusp project up at Source Forge
> <http://sourceforge.net/projects/swsusp/>. It allows me to suspend
> into the swap space by hitting Alt-SysRQ-D. Great for changing
> batteries on laptops (and, no, normal suspend does not survive a battery
> change) but also REALLY GREAT for forensic security analysis of compromised
> systems. I hit the console of a compromised system and hit Alt-SysRq-D
> and it flushes the dirty buffers, dumps memory to swap (preserving all
> my "volatiles") and then shuts down. I can snapshot the hard drive and
> then restart the system where it left off for live running analysis. If
> that gets screwed up, I can restore the image again and restart again from
> the same spot again. I've also got all the memory and CPU state in that
> disk image for "in-vitro" analysis by tools like Wietse's "The Coroner's
> Toolkit".
>
> But that doesn't solve ANY of the problems with changing the kernel
> itself. Suspending and restoring the system is the easy part (and swsusp
> still has some problems restoring X Windows). Restoring a system to
> a different kernel is orders of magnitude worse, if not downright
> impossible for all the reasons given over internal structures and
> interfaces.
>
> I would LOVE to have something like swsusp in the main line kernel,
> however, just so I didn't have to convince IT departments to apply this
> custom kernel patch to their production systems BEFORE they get their butts
> kicked by some snot-nosed script kiddie. :-/
Patience. swsusp is needed for ACPI S4 support. And I guess ACPI S4 is
good enough reason to push it to Linus.
Pavel
Hi!
> What I'd *really* like (but don't see how to get there) would be a "save
> system state, shutdown, change kernel and/or hardware, reboot, restore
> state" system (where state is like "I'm logged in on this console, in this
> current directory, and under X I have Netscape running and this page
> displayed" but I don't care about the exact state of Squid or even if my
> ISDN line is dialled in, because those "fix themselves").
Suspend-to-disk, change hardware, restore-from-disk, load necessary
modules seems quite easy to do with swsusp. It is very different from
suspend-to-disk, change kernel, restore-from-disk (which is guaranteed
to kill you if kernel changes size).
Pavel
> Suspend-to-disk, change hardware, restore-from-disk, load necessary
> modules seems quite easy to do with swsusp. It is very different from
> suspend-to-disk, change kernel, restore-from-disk (which is guaranteed
> to kill you if kernel changes size).
It works for most hw changes. I've used swsusp to replace a burned out 3c509
without rebooting 8)