LinuxLists.cc - Hot-patching

2005-09-20 22:08:04

Subject: Hot-patching

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Oftentimes distributions spin on a stable kernel, but occasionally
update it for security bugs. This then demands a reboot, or you sit on
a buggy kernel for however long.

These bugfixes don't typically change the exported binary interface of
the existing functions being corrected, and so it would be feasible to
halt all processors and execute an atomic codepath to switch the symbols
in the running kernel to point to the replacement functions from the old
ones. If big functions are split up into smaller ones, it shouldn't
matter either, as long as the interface stays the same for all existing
functions.

I'm curious, though: we're seeing a pattern of 2.6.x -> 2.6.x.y ->
for(;;) 2.6.x.++y; the incremental releases are pretty much bugfixes,
although they do pile up eventually. I believe it would be
feasible to set up an internal kernel function to atomically replace
existing symbols with new, updated ones; but I'm not sure, as I've never
looked that deep into the kernel. I'm thinking the following process
would do it though:

struct replace_syms {
    void *old_address;  /* Address to look up and replace,  */
                        /* e.g. &sys_fork                   */
    void *new_address;  /* New address to replace it with,  */
                        /* e.g. &new_sys_fork               */
};

- Load module
- Module calls atomic_replace_symbols(struct replace_syms changes[])
- atomic_replace_symbols() freezes all other processors (SMP)
- atomic_replace_symbols() disables preempt for itself
- atomic_replace_symbols() turns off interrupt handling
- atomic_replace_symbols() finds each old_address in the symbol table
and replaces with each corresponding new_address
- atomic_replace_symbols() sees {NULL,NULL} and decides it's done
- atomic_replace_symbols() turns interrupt handling back on
- atomic_replace_symbols() turns preempt back on
- atomic_replace_symbols() un-halts all the CPUs
- Module init is done
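
The steps above can be sketched in userspace C, as a rough illustration
only. Everything here is an assumption made for the sketch: the toy
symbol_table[], the sym_fn type (used instead of void * so the pointers
can be called portably), the stub functions old_fork(), new_fork() and
unrelated(), and the world_stopped flag that stands in for halting CPUs
and disabling preemption and interrupts. A real kernel has no single
table that every call goes through, so this shows the control flow of
the proposal, not a working kernel facility.

```c
#include <assert.h>
#include <stddef.h>

/* Callable stand-in for a kernel symbol address (assumption of this sketch). */
typedef int (*sym_fn)(void);

struct replace_syms {
    sym_fn old_address;   /* function currently installed */
    sym_fn new_address;   /* function to install in its place */
};

/* Hypothetical original, replacement, and untouched functions. */
int old_fork(void)  { return 1; }
int new_fork(void)  { return 2; }
int unrelated(void) { return 7; }

/* Toy symbol table: in this sketch every call is dispatched through it. */
static sym_fn symbol_table[] = { old_fork, unrelated };
#define NSYMS (sizeof symbol_table / sizeof symbol_table[0])

/* Stand-in for "other CPUs frozen, preemption and interrupts off". */
static int world_stopped;

void stop_world(void)   { world_stopped = 1; }
void resume_world(void) { world_stopped = 0; }

/* Applies changes[] up to the {NULL, NULL} terminator; returns slots swapped. */
int atomic_replace_symbols(const struct replace_syms changes[])
{
    int replaced = 0;

    stop_world();
    for (size_t i = 0; changes[i].old_address != NULL; i++) {
        for (size_t s = 0; s < NSYMS; s++) {
            if (symbol_table[s] == changes[i].old_address) {
                symbol_table[s] = changes[i].new_address;
                replaced++;
            }
        }
    }
    resume_world();
    return replaced;
}

/* Calls whatever function currently occupies the given slot. */
int call_symbol(size_t slot)
{
    assert(slot < NSYMS);
    assert(!world_stopped);   /* nothing else runs during a swap */
    return symbol_table[slot]();
}
```

With changes[] = { { old_fork, new_fork }, { NULL, NULL } }, call_symbol(0)
goes to old_fork() before atomic_replace_symbols(changes) and to
new_fork() afterwards, while call_symbol(1) is unaffected.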

Of course the module could never be removed. The pre-existing code
would continue to exist, and any CPU or thread executing in those
replaced functions would finish in the copy it started in; the new
copies would be used for future calls. Because it's very un-nice to try
to figure out when a copy of the function is no longer being used, it
would be infeasible to later try to remove the module.
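
To make the bookkeeping problem concrete, here is a minimal userspace
sketch of the kind of usage counting that would be needed before a copy
of a function could be discarded. All names (struct patched_fn,
init_copy(), enter_copy(), leave_copy(), retire_copy(), copy_unused())
are hypothetical, and the sketch glosses over the hard part: in a real
kernel every entry and exit path of the function, including callers that
sleep inside it, would have to be tracked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct patched_fn {
    atomic_int  in_flight;   /* callers currently executing this copy */
    atomic_bool retired;     /* set once new calls go to the other copy */
};

void init_copy(struct patched_fn *p)
{
    atomic_init(&p->in_flight, 0);
    atomic_init(&p->retired, false);
}

/* Called on entry to the tracked copy of the function. */
void enter_copy(struct patched_fn *p)
{
    atomic_fetch_add(&p->in_flight, 1);
}

/* Called on every exit path of the tracked copy. */
void leave_copy(struct patched_fn *p)
{
    int before = atomic_fetch_sub(&p->in_flight, 1);

    assert(before > 0);      /* every leave must match an earlier enter */
}

/* Called after the symbol swap: no new caller can reach this copy. */
void retire_copy(struct patched_fn *p)
{
    atomic_store(&p->retired, true);
}

/* The copy may be freed only when retired and nobody is still inside. */
bool copy_unused(struct patched_fn *p)
{
    return atomic_load(&p->retired) && atomic_load(&p->in_flight) == 0;
}
```

A caller that entered before retire_copy() keeps copy_unused() false
until it calls leave_copy(), which is exactly the window that makes
removal awkward.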

Besides getting rid of a pet peeve of mine (more rebooting than
absolutely necessary) and giving a way to continuously increase the size
of the running kernel with each bugfix, this has implications on servers
that don't want to reboot for whatever reason. For enterprise
applications, it would be possible to fix a kernel bug or security hole
that hasn't been triggered by loading a module with the bugfixes,
effectively hot-patching the kernel.

Physically updating the kernel would be complex; the distribution would
have to install a new kernel, and also install the corresponding update
modules to avoid a reboot. However, consider that the routine phases of
early boot "worked up until now": as long as the new modules are
installed and the hotfix is loaded early during boot, there's no real
risk of triggering whatever bug it fixes.

Implementing this could be done via a script which scans for changes in
two kernel trees and notes all functions that need to be individually
built into the module. These functions could be extracted from the
files and packed into a single module with wrapper code that facilitates
the hot-patching on load. Exactly how to fully automate this is
difficult. Once the scripts and facilities are made, however,
it would be fully up to distribution maintainers to split out, compile,
and package hot-patches.
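
The comparison step of such a script might look roughly like the sketch
below. It assumes some earlier pass has already reduced each tree to a
list of (function name, checksum) pairs; that pass, struct fn_digest,
find_fn() and changed_functions() are all hypothetical names for this
illustration. Functions that only exist in the new tree (for example,
helpers split out of a big function) are reported too, since they would
also have to go into the module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct fn_digest {
    const char   *name;     /* function name */
    unsigned long digest;   /* checksum of the function's source text */
};

/* Looks a function up by name; returns NULL if the tree does not have it. */
const struct fn_digest *find_fn(const struct fn_digest *tree, size_t n,
                                const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tree[i].name, name) == 0)
            return &tree[i];
    return NULL;
}

/*
 * Stores in out[] the names of functions that are new or changed in
 * new_tree relative to old_tree, and returns how many were stored
 * (at most out_cap).
 */
size_t changed_functions(const struct fn_digest *old_tree, size_t n_old,
                         const struct fn_digest *new_tree, size_t n_new,
                         const char **out, size_t out_cap)
{
    size_t count = 0;

    assert(out != NULL || out_cap == 0);
    for (size_t i = 0; i < n_new && count < out_cap; i++) {
        const struct fn_digest *prev =
            find_fn(old_tree, n_old, new_tree[i].name);

        if (prev == NULL || prev->digest != new_tree[i].digest)
            out[count++] = new_tree[i].name;
    }
    return count;
}
```

The lookup is a plain linear scan, which is fine for a sketch; a real
tool working on whole kernel trees would want a hash table or sorted
lists.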

Security concerns with this are minimal; to exploit any added attack
vectors, an attacker needs module loading permissions. This would mean
the attacker already has the system compromised. Further, there are
"modules" that load into the kernel and supply a fork bomb defuser and
trusted path execution; these modules change things like sys_fork() and
basic filesystem symbols in a similar manner, so it's demonstrable
that the "added attack vectors" can just be embedded into the module
anyway. At best, trying to "do it yourself" may produce a method that
won't quite work all the time, while using a supplied facility will give
a guarantee that it'll work.

- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

Creative brains are a valuable, limited resource. They shouldn't be
wasted on re-inventing the wheel when there are so many fascinating
new problems waiting out there.
-- Eric Steven Raymond
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDMIgUhDd4aOud5P8RAm3NAJwJBN7KHYAgD8NftZEqYrv6GRpSFgCfXwPK
44m+XbymTPaycZhuHIi8LeA=
=zrVt
-----END PGP SIGNATURE-----

2005-09-20 22:18:59

by Jesper Juhl

Subject: Re: Hot-patching

On 9/21/05, John Richard Moser <[email protected]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Oftentimes distributions spin on a stable kernel, but occasionally
> update it for security bugs. This then demands a reboot, or you sit on
> a buggy kernel for however long.
>

This has been discussed time and time again on this list and elsewhere
over the years. The most recent discussion I recall is the "[PATCH
x86_64] Live Patching Function on 2.6.11.7" thread which drew over 50
comments and got into a lot of corners - I'd suggest you go read it in
the archives.
Spend a little time searching and you'll find several other threads
about this in the lkml archives.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2005-09-20 22:21:47

by Valdis Klētnieks

Subject: Re: Hot-patching

On Tue, 20 Sep 2005 18:07:17 EDT, John Richard Moser said:

> These bugfixes don't typically change the exported binary interface of
> the existing functions being corrected, and so it would be feasible to
> halt all processors and execute an atomic codepath to switch the symbols
> in the running kernel to point to the replacement functions from the old
> ones. If big functions are split up into smaller ones, as long as the
> interface is the same for all existing functions, it shouldn't matter as
> well.

I believe telco switch software has been doing patch-on-the-fly for quite
a long time. It's a royal pain in the butt, especially if you have any
dynamic 'struct foo_ops' lurking.
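
One reading of the 'struct foo_ops' problem, sketched in userspace C:
an ops structure filled in at runtime holds its own copy of a function
address, so rewriting the symbol does not reach it. The names
foo_read_old(), foo_read_new(), foo_read_symbol, make_ops() and
hot_patch() are made up for this illustration, and the single pointer
variable is only a stand-in for a symbol-table entry.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*op_fn)(void);

int foo_read_old(void) { return 1; }   /* buggy original */
int foo_read_new(void) { return 2; }   /* hot-patched replacement */

/* Toy symbol-table slot that the hot-patch rewrites. */
static op_fn foo_read_symbol = foo_read_old;

struct foo_ops {
    op_fn read;
};

/* Built at runtime: copies the address out of the symbol slot. */
struct foo_ops make_ops(void)
{
    struct foo_ops ops = { foo_read_symbol };

    return ops;
}

/* The hot-patch only knows about the symbol slot. */
void hot_patch(void)
{
    foo_read_symbol = foo_read_new;
}

int call_via_symbol(void)
{
    return foo_read_symbol();
}

int call_via_ops(const struct foo_ops *ops)
{
    assert(ops != NULL && ops->read != NULL);
    return ops->read();
}
```

If make_ops() runs before hot_patch(), call_via_symbol() reaches
foo_read_new() afterwards, but call_via_ops() on the earlier structure
still reaches foo_read_old().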

And you can't just plop the code in either - let's say the fix includes "add
a state bit to the 'struct foo_ctl' to track XYZ". Now you need to think about
the fact that there's likely kmalloc'ed struct foo_ctl's already out there
that don't know about this bit. Hilarity ensues....
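
A small sketch of why that hurts, with made-up layouts (the fields of
struct foo_ctl_old and struct foo_ctl_new, and the helper
xyz_state_in_bounds(), are assumptions for illustration): objects
allocated with the old size simply have no room for the new member, so
patched code that touches it reads or writes past the allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Layout the running kernel used when it allocated its objects. */
struct foo_ctl_old {
    int refcount;
    int flags;
};

/* Layout the patched code expects: one extra member to track XYZ. */
struct foo_ctl_new {
    int refcount;
    int flags;
    int xyz_state;
};

static_assert(sizeof(struct foo_ctl_new) > sizeof(struct foo_ctl_old),
              "the patched layout is larger than the original");

/* Nonzero if xyz_state lies inside an object of alloc_size bytes. */
int xyz_state_in_bounds(size_t alloc_size)
{
    return alloc_size >=
           offsetof(struct foo_ctl_new, xyz_state) + sizeof(int);
}
```

xyz_state_in_bounds(sizeof(struct foo_ctl_old)) is false, so every
object allocated before the patch would need to be found and migrated,
or the patched code would have to tell old objects from new ones.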

2005-09-20 22:47:59

by Jesper Juhl

Subject: Re: Hot-patching

On 9/21/05, John Richard Moser <[email protected]> wrote:
[snip]
> Besides getting rid of a pet peeve of mine (more rebooting than
> absolutely necessary) and giving a way to continuously increase the size
> of the running kernel with each bugfix, this has implications on servers
> that don't want to reboot for whatever reason. For enterprise
> applications, it would be possible to fix a kernel bug or security hole
> that hasn't been triggered by loading a module with the bugfixes,
> effectively hot-patching the kernel.
>
[snip]

If you have uptime demands like that, I think a much better approach
would be to make sure the box is heavily firewalled, so that the
security of the host itself matters less. If there's no way to get to a
box in a way that enables you to actually exploit a security hole,
then it doesn't matter much that the hole is there at all.

Another option would be a clustered setup where you normally run the
app(s) on nodeA, nodeB ... nodeN, then when you need to upgrade you
move all running applications off of nodeA and upgrade it, move
everything off of nodeB and then upgrade that, repeat for each remaining
node, and finally redistribute the load properly again.
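
As a toy model of that procedure (struct node, its fields, and
rolling_upgrade() are invented for this sketch, and "load" is just a
count of applications), the loop drains one node at a time onto a
neighbour, marks it upgraded, and spreads the load evenly at the end:

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int load;       /* number of applications running on the node */
    int upgraded;   /* nonzero once the node runs the new kernel */
};

/* Drain one node at a time, upgrade it, then spread the load evenly again. */
void rolling_upgrade(struct node *nodes, size_t n)
{
    int total = 0;

    assert(n >= 2);   /* draining needs at least one other node */
    for (size_t i = 0; i < n; i++) {
        size_t target = (i + 1) % n;      /* any other node would do */

        nodes[target].load += nodes[i].load;
        nodes[i].load = 0;                /* node is idle: safe to reboot */
        nodes[i].upgraded = 1;
    }
    for (size_t i = 0; i < n; i++)
        total += nodes[i].load;
    for (size_t i = 0; i < n; i++)
        nodes[i].load = total / (int)n + (i < (size_t)(total % (int)n) ? 1 : 0);
}
```

The sketch ignores capacity: while one node is drained, the remaining
nodes have to be able to carry its load.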

--
Jesper Juhl <[email protected]>

2005-09-20 22:51:30

by John Richard Moser

Subject: Re: Hot-patching

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[email protected] wrote:
> On Tue, 20 Sep 2005 18:07:17 EDT, John Richard Moser said:
>
>
>>These bugfixes don't typically change the exported binary interface of
>>the existing functions being corrected, and so it would be feasible to
>>halt all processors and execute an atomic codepath to switch the symbols
>>in the running kernel to point to the replacement functions from the old
>>ones. If big functions are split up into smaller ones, as long as the
>>interface is the same for all existing functions, it shouldn't matter as
>>well.
>
>
> I believe telco switch software has been doing patch-on-the-fly for quite
> a long time. It's a royal pain in the butt, especially if you have any
> dynamic 'struct foo_ops' lurking.
>
> And you can't just plop the code in either - let's say the fix includes "add
> a state bit to the 'struct foo_ctl' to track XYZ". Now you need to think about
> the fact that there's likely kmalloc'ed struct foo_ctl's already out there
> that don't know about this bit. Hilarity ensues....

In which case you can't make a hot-patch, unless you like watching your
kernel take a crap.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDMJJDhDd4aOud5P8RAijUAJ9BPXQoVCeD5EzTXedBGaIz87SjsgCdEN21
6/2BUWcZ4HT4FwSFkWt5OEM=
=WjP8
-----END PGP SIGNATURE-----

2005-09-20 22:58:43

by John Richard Moser

Subject: Re: Hot-patching

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jesper Juhl wrote:
> On 9/21/05, John Richard Moser <[email protected]> wrote:
> [snip]
>
>>Besides getting rid of a pet peeve of mine (more rebooting than
>>absolutely necessary) and giving a way to continuously increase the size
>>of the running kernel with each bugfix, this has implications on servers
>>that don't want to reboot for whatever reason. For enterprise
>>applications, it would be possible to fix a kernel bug or security hole
>>that hasn't been triggered by loading a module with the bugfixes,
>>effectively hot-patching the kernel.
>>
>
> [snip]
>
> If you have uptime demands like that I think a much better approach
> would be to make sure the box is heavily firewalled so importance of
> the security of the host itself drops. If there's no way to get to a
> box in a way that enables you to actually exploit a security hole,
> then it doesn't matter much that the hole is there at all.

Yeah. Not always feasible though; let's say the bug manifests in
something Apache tells the kernel to do (there are quite a lot of
syscalls) based on stuff passed to CGI scripts. You can have firewalls
and everything, but slide in a "legitimate" port 80 or port 443 access
and BLAM.

Shell servers like compile farms are also interesting, if you want to
talk about firewalling not being all that great. That's of course if
you care about local attacks; personally if I have 10000 employees or
clients using a machine I don't want to trust them all to be nice.

>
> Another option would be a clustered setup where you normally run the
> app(s) on nodeA, nodeB ... nodeN, then when you need to upgrade you
> move all running applications off of nodeA and upgrade it, move
> everything off of nodeB and then upgrade that, repeat for nr of nodes,
> finally redistribute the load properly again.
>
>

Beautiful setup that, and surprisingly cost effective if 1) you can do
it yourself, and 2) you're using just 2 nodes. I'd prefer 3 nodes for a
minimal set-up of course, so if I upgrade one and another goes down I
still have a third. I'm obsessive about perfectly stable environments;
it has to be able to stand up to a bomb blast or the ending scene from
Hackers with all the blackhats in the world tearing ass at the system.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDMJPzhDd4aOud5P8RAhMNAJ9zQXu8qBenrVOpUhobqNoaht/svACgji8P
klO1Shq2h9o/dWb4iza1adw=
=OL8+
-----END PGP SIGNATURE-----

2005-09-20 23:07:23

by Jesper Juhl

Subject: Re: Hot-patching

On 9/21/05, John Richard Moser <[email protected]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Jesper Juhl wrote:
> > On 9/21/05, John Richard Moser <[email protected]> wrote:
> > [snip]
> >
> >>Besides getting rid of a pet peeve of mine (more rebooting than
> >>absolutely necessary) and giving a way to continuously increase the size
> >>of the running kernel with each bugfix, this has implications on servers
> >>that don't want to reboot for whatever reason. For enterprise
> >>applications, it would be possible to fix a kernel bug or security hole
> >>that hasn't been triggered by loading a module with the bugfixes,
> >>effectively hot-patching the kernel.
> >>
> >
> > [snip]
> >
> > If you have uptime demands like that I think a much better approach
> > would be to make sure the box is heavily firewalled so importance of
> > the security of the host itself drops. If there's no way to get to a
> > box in a way that enables you to actually exploit a security hole,
> > then it doesn't matter much that the hole is there at all.
>
> Yeah. Not always feasible though; let's say the bug manifests in
> something Apache tells the kernel to do (there's quite a lot of
> syscalls) based on stuff passed to CGI scripts. Firewalls and
> everything, but slide in a "legitimate" port 80 or port 443 access and BLAM.
>
> Shell servers like compile farms are also interesting, if you want to
> talk about firewalling not being all that great. That's of course if
> you care about local attacks; personally if I have 10000 employees or
> clients using a machine I don't want to trust them all to be nice.
>

Firewalls are not a panacea, no. But for many (not all) issues, good
firewalling can eliminate the immediate need to patch a server.

> >
> > Another option would be a clustered setup where you normally run the
> > app(s) on nodeA, nodeB ... nodeN, then when you need to upgrade you
> > move all running applications off of nodeA and upgrade it, move
> > everything off of nodeB and then upgrade that, repeat for nr of nodes,
> > finally redistribute the load properly again.
> >
>
> Beautiful setup that, and surprisingly cost effective if 1) you can do
> it yourself, and 2) you're using just 2 nodes. I'd prefer 3 nodes for a
> minimal set-up of course, so if I upgrade one and the other goes down I
> still have a third; I'm obsessive about perfectly stable environments,
> it has to be able to stand up to a bomb blast or the ending scene from
> Hackers with all the blackhats in the world tearing ass at the system.
>

A few links you may want to take a look at:

http://www.linuxvirtualserver.org/
http://www.linux-ha.org/
http://lcic.org/ha.html
http://openmosix.sourceforge.net/

--
Jesper Juhl <[email protected]>