2006-02-25 13:21:51

by Victor Porton

Subject: New reliability technique

A minute ago I invented a new reliability-enhancing technique.

In idle cycles (or periodically, at the expense of some performance) Linux can
calculate MD5 or CRC sums of _unused_ (free) memory areas and compare these
sums with previously calculated sums.

Additionally, this can be done for allocated memory if it is write-protected
before the first actual write. Moreover, any memory may be made
write-protected if it has not been written for, e.g., more than a second. (When
it is written, the kernel would unlock it and allow the write, by a technique
similar to how swap works.) If a checksum shows that write-protected memory has
been modified, this likewise indicates a bug.

If a sum does not match, that indicates a bug in the kernel or in the hardware.

I suggest adding a "Check free memory control sums" option to the config.

--
Victor Porton ([email protected]) - http://porton.ex-code.com
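
A minimal userspace sketch of the checksum idea proposed above, not kernel
code. It assumes a free region can be described by a start pointer and a
length; the names crc32_region, struct free_region, region_record and
region_verify are hypothetical and exist only for this illustration. The
checksum is a plain bitwise CRC-32, one of the two options the post mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); slow but table-free. */
static uint32_t crc32_region(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical bookkeeping for one free region and its recorded sum. */
struct free_region {
    const void *start;
    size_t len;
    uint32_t sum;
};

/* Record the checksum at the moment the region becomes free. */
static void region_record(struct free_region *r, const void *start, size_t len)
{
    assert(start != NULL && len > 0);
    r->start = start;
    r->len = len;
    r->sum = crc32_region(start, len);
}

/* Recompute later (e.g. when idle): 1 if unchanged, 0 if modified. */
static int region_verify(const struct free_region *r)
{
    return crc32_region(r->start, r->len) == r->sum;
}
```

A caller would run region_record() when a region is freed and region_verify()
from an idle loop; a return value of 0 means something wrote to memory that
was supposed to be unused.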


2006-02-25 13:26:28

by Jesper Juhl

Subject: Re: New reliability technique

On 2/25/06, Victor Porton <[email protected]> wrote:
> A minute ago I invented a new reliability enhancing technique.
>
> In idle cycles (or periodically in expense of some performance) Linux can
> calculate MD5 or CRC sums of _unused_ (free) memory areas and compare these
> sums with previously calculated sums.
>
> Additionally it can be done for allocated memory, if it will be write
> protected before the first actual write. Moreover, all memory may be made
> write-protected if it is not written e.g. more than a second. (When it
> is written kernel would unlock it and allow to write, by a techniqie like
> to how swap works.) If write-protected memory appears to be modified by
> a check sum, this likewise indicates a bug.
>
> If a sum is inequal, it would notice a bug in kernel or in hardware.
>
> I suggest to add "Check free memory control sums" in config.
>

Implement it then and send a patch.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-25 13:27:25

by Jesper Juhl

Subject: Re: New reliability technique

On 2/25/06, Jesper Juhl <[email protected]> wrote:
> On 2/25/06, Victor Porton <[email protected]> wrote:
> > A minute ago I invented a new reliability enhancing technique.
> >
> > In idle cycles (or periodically in expense of some performance) Linux can
> > calculate MD5 or CRC sums of _unused_ (free) memory areas and compare these
> > sums with previously calculated sums.
> >
> > Additionally it can be done for allocated memory, if it will be write
> > protected before the first actual write. Moreover, all memory may be made
> > write-protected if it is not written e.g. more than a second. (When it
> > is written kernel would unlock it and allow to write, by a techniqie like
> > to how swap works.) If write-protected memory appears to be modified by
> > a check sum, this likewise indicates a bug.
> >
> > If a sum is inequal, it would notice a bug in kernel or in hardware.
> >
> > I suggest to add "Check free memory control sums" in config.
> >
>
> Implement it then and send a patch.
>

But don't slab poisoning and the like already cover this ground somewhat?


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-25 19:52:12

by Avi Kivity

Subject: Re: New reliability technique

Jesper Juhl wrote:

>On 2/25/06, Jesper Juhl <[email protected]> wrote:
>
>
>>On 2/25/06, Victor Porton <[email protected]> wrote:
>>
>>
>>>A minute ago I invented a new reliability enhancing technique.
>>>
>>>In idle cycles (or periodically in expense of some performance) Linux can
>>>calculate MD5 or CRC sums of _unused_ (free) memory areas and compare these
>>>sums with previously calculated sums.
>>>
>>>Additionally it can be done for allocated memory, if it will be write
>>>protected before the first actual write. Moreover, all memory may be made
>>>write-protected if it is not written e.g. more than a second. (When it
>>>is written kernel would unlock it and allow to write, by a techniqie like
>>>to how swap works.) If write-protected memory appears to be modified by
>>>a check sum, this likewise indicates a bug.
>>>
>>>If a sum is inequal, it would notice a bug in kernel or in hardware.
>>>
>>>I suggest to add "Check free memory control sums" in config.
>>>
>>>
>>>
>>Implement it then and send a patch.
>>
>>
>>
>
>But, doesn't slab poisoning and the like already cover this ground somewhat?
>
>
>
No, they don't. They cover only a very small percentage of memory.

On the other hand, ECC memory and caches do this in hardware.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2006-02-25 19:56:29

by Jesper Juhl

Subject: Re: New reliability technique

On 2/25/06, Avi Kivity <[email protected]> wrote:
> Jesper Juhl wrote:
>
> >On 2/25/06, Jesper Juhl <[email protected]> wrote:
> >
> >
> >>On 2/25/06, Victor Porton <[email protected]> wrote:
> >>
> >>
> >>>A minute ago I invented a new reliability enhancing technique.
> >>>
> >>>In idle cycles (or periodically in expense of some performance) Linux can
> >>>calculate MD5 or CRC sums of _unused_ (free) memory areas and compare these
> >>>sums with previously calculated sums.
> >>>
> >>>Additionally it can be done for allocated memory, if it will be write
> >>>protected before the first actual write. Moreover, all memory may be made
> >>>write-protected if it is not written e.g. more than a second. (When it
> >>>is written kernel would unlock it and allow to write, by a techniqie like
> >>>to how swap works.) If write-protected memory appears to be modified by
> >>>a check sum, this likewise indicates a bug.
> >>>
> >>>If a sum is inequal, it would notice a bug in kernel or in hardware.
> >>>
> >>>I suggest to add "Check free memory control sums" in config.
> >>>
> >>>
> >>>
> >>Implement it then and send a patch.
> >>
> >>
> >>
> >
> >But, doesn't slab poisoning and the like already cover this ground somewhat?
> >
> >
> >
> No, they don't. They cover only a very small percentage of memory.
>

Ohh, ok, then it makes sense as a debug thing.

Let's see an implementation then.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-26 08:55:22

by Janos Farkas

Subject: Re: New reliability technique

> > >>On 2/25/06, Victor Porton <[email protected]> wrote:
> > >>>A minute ago I invented a new reliability enhancing technique.

> > >>>In idle cycles (or periodically in expense of some performance) Linux can
> > >>>calculate MD5 or CRC sums of _unused_ (free) memory areas and compare these
> > >>>sums with previously calculated sums.

On 2006-02-25 at 20:56:27, Jesper Juhl wrote:
> Ohh, ok, then it makes sense as a debug thing.
>
> Let's see an implementation then.

http://www.ussg.iu.edu/hypermail/linux/kernel/9701.1/0058.html

At least a variation of it, maybe not from a minute ago :)

2006-02-26 11:35:13

by Victor Porton

Subject: Re: New reliability technique


On 25-Feb-2006 Avi Kivity wrote:
> Jesper Juhl wrote:
...
>>>On 2/25/06, Victor Porton <[email protected]> wrote:
>>>
>>>
>>>>In idle cycles (or periodically in expense of some performance) Linux can
>>>>calculate MD5 or CRC sums of _unused_ (free) memory areas and compare these
>>>>sums with previously calculated sums.
>>>>
>>>>Additionally it can be done for allocated memory, if it will be write
>>>>protected before the first actual write. Moreover, all memory may be made
>>>>write-protected if it is not written e.g. more than a second. (When it
>>>>is written kernel would unlock it and allow to write, by a techniqie like
>>>>to how swap works.) If write-protected memory appears to be modified by
>>>>a check sum, this likewise indicates a bug.
>>>>
>>>>If a sum is inequal, it would notice a bug in kernel or in hardware.
>>>>
>>>>I suggest to add "Check free memory control sums" in config.

> No, they don't. They cover only a very small percentage of memory.
>
> On the other hand, ECC memory and caches do this in hardware.

Isn't it better to double-check (especially after such risky things as,
e.g., software suspend)?

We need to check not only for damaged hardware, but also for
kernel/module bugs. For this, ECC and cache reliability are useless.

--
Victor Porton ([email protected]) - http://porton.ex-code.com

2006-02-26 12:25:31

by Pekka Enberg

Subject: Re: New reliability technique

On 2/26/06, Victor Porton <[email protected]> wrote:
> Isn't it better to double check (especially after such risky things as
> e.g. software suspend)?
>
> We need to check not only for damaged hardware, but also for
> kernel/modules bugs. For this ECC and cache reliability is useless.

What kernel bugs do you want to catch with double-checking free
memory? For use-after-free, we already have slab poisoning.

Pekka
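
For readers unfamiliar with slab poisoning, here is a rough userspace sketch
of the principle, not the kernel's actual slab code. A freed object is filled
with a known byte pattern, and the pattern is checked before the object is
handed out again. The byte values mirror the kernel's POISON_FREE and
POISON_END constants; the helper names poison_object and poison_check are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b  /* fill byte for a freed object */
#define POISON_END  0xa5  /* marker stored in the last byte */

/* On free: overwrite the whole object with the poison pattern. */
static void poison_object(unsigned char *obj, size_t len)
{
    assert(obj != NULL && len > 0);
    memset(obj, POISON_FREE, len - 1);
    obj[len - 1] = POISON_END;
}

/*
 * On the next allocation: return the offset of the first byte that no
 * longer holds the expected poison value, or -1 if the object is intact.
 */
static long poison_check(const unsigned char *obj, size_t len)
{
    assert(obj != NULL && len > 0);
    for (size_t i = 0; i + 1 < len; i++)
        if (obj[i] != POISON_FREE)
            return (long)i;
    return obj[len - 1] == POISON_END ? -1 : (long)(len - 1);
}
```

A non-negative result from poison_check() means something wrote to the object
after it was freed, which is the use-after-free case mentioned above. Unlike a
checksum, the pattern also tells you which byte was hit.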

2006-02-26 12:30:40

by Nick Piggin

Subject: Re: New reliability technique

Pekka Enberg wrote:
> On 2/26/06, Victor Porton <[email protected]> wrote:
>
>>Isn't it better to double check (especially after such risky things as
>>e.g. software suspend)?
>>
>>We need to check not only for damaged hardware, but also for
>>kernel/modules bugs. For this ECC and cache reliability is useless.
>
>
> What kernel bugs do you want to catch with double-checking free
> memory? For use-after-free, we already have slab poisoning.
>

And for !slab, we unmap kernel virtual addresses with page debugging,
which seems like a better solution.

--
SUSE Labs, Novell Inc.