2008-11-26 16:13:54

by Pavel Machek

Subject: Document handling of bad memory


Document how to deal with bad memory reported with memtest.

Signed-off-by: Pavel Machek <[email protected]>

diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
new file mode 100644
index 0000000..df84162
--- /dev/null
+++ b/Documentation/bad_memory.txt
@@ -0,0 +1,45 @@
+March 2008
+Jan-Simon Moeller, [email protected]
+
+
+How to deal with bad memory e.g. reported by memtest86+ ?
+#########################################################
+
+There are three possibilities I know of:
+
+1) Reinsert/swap the memory modules
+
+2) Buy new modules (best!) or try to exchange the memory
+ if you have spare-parts
+
+3) Use BadRAM or memmap
+
+This Howto is about number 3) .
+
+
+BadRAM
+######
+BadRAM is the actively developed and available as kernel-patch
+here: http://rick.vanrein.org/linux/badram/
+
+For more details see the BadRAM documentation.
+
+memmap
+######
+
+memmap is already in the kernel and usable as kernel-parameter at
+boot-time. Its syntax is slightly strange and you may need to
+calculate the values by yourself!
+
+Syntax to exclude a memory area (see kernel-parameters.txt for details):
+memmap=<size>$<address>
+
+Example: memtest86+ reported here errors at address 0x18691458, 0x18698424 and
+ some others. All had 0x1869xxxx in common, so I chose a pattern of
+ 0x18690000,0xffff0000.
+
+With the numbers of the example above:
+memmap=64K$0x18690000
+ or
+memmap=0x10000$0x18690000
+

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2008-11-26 16:26:19

by Jan-Simon Möller

Subject: Re: Document handling of bad memory

On Wednesday 26 November 2008 17:15:21, Pavel Machek wrote:
>
> Document how to deal with bad memory reported with memtest.
>
> Signed-off-by: Pavel Machek <[email protected]>
Signed-off-by: Jan-Simon Möller <[email protected]>

> diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
[...]

Best regards,
Jan-Simon

2008-11-27 00:42:25

by Jiri Kosina

Subject: Re: Document handling of bad memory


[ [email protected] added, these should be the proper guys to
merge this ]

On Wed, 26 Nov 2008, Pavel Machek wrote:

>
> Document how to deal with bad memory reported with memtest.
>
> Signed-off-by: Pavel Machek <[email protected]>
>
> diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
> new file mode 100644
> index 0000000..df84162
> --- /dev/null
> +++ b/Documentation/bad_memory.txt
> @@ -0,0 +1,45 @@
> +March 2008
> +Jan-Simon Moeller, [email protected]
> +
> +
> +How to deal with bad memory e.g. reported by memtest86+ ?
> +#########################################################
> +
> +There are three possibilities I know of:
> +
> +1) Reinsert/swap the memory modules
> +
> +2) Buy new modules (best!) or try to exchange the memory
> + if you have spare-parts
> +
> +3) Use BadRAM or memmap
> +
> +This Howto is about number 3) .
> +
> +
> +BadRAM
> +######
> +BadRAM is the actively developed and available as kernel-patch
> +here: http://rick.vanrein.org/linux/badram/
> +
> +For more details see the BadRAM documentation.
> +
> +memmap
> +######
> +
> +memmap is already in the kernel and usable as kernel-parameter at
> +boot-time. Its syntax is slightly strange and you may need to
> +calculate the values by yourself!
> +
> +Syntax to exclude a memory area (see kernel-parameters.txt for details):
> +memmap=<size>$<address>
> +
> +Example: memtest86+ reported here errors at address 0x18691458, 0x18698424 and
> + some others. All had 0x1869xxxx in common, so I chose a pattern of
> + 0x18690000,0xffff0000.
> +
> +With the numbers of the example above:
> +memmap=64K$0x18690000
> + or
> +memmap=0x10000$0x18690000
> +
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>

--
Jiri Kosina
SUSE Labs

2008-11-28 09:01:18

by Rob Landley

Subject: Re: Document handling of bad memory

On Wednesday 26 November 2008 10:15:21 Pavel Machek wrote:
> Document how to deal with bad memory reported with memtest.
...
> +BadRAM
> +######
> +BadRAM is the actively developed and available as kernel-patch
> +here: http://rick.vanrein.org/linux/badram/

So the patch isn't worth merging, but documentation about the out-of-tree
patch is worth merging?

I'm not objecting, I'm just confused about what the merge criteria are...

Rob

2008-11-28 09:48:19

by Jan-Simon Möller

Subject: Re: Document handling of bad memory

On Friday 28 November 2008 10:00:26, Rob Landley wrote:
>
> So the patch isn't worth merging, but documentation about the out-of-tree
> patch is worth merging?
Good point.

IIRC we tried merging the patch, but without luck at the time. It was said that there is another method
(with an even <irony>better</irony> syntax) that could also handle this case, and that it would be better to
do some hacking so that the syntax gets parsed and reuses the functions of this already in-kernel method.
I don't know the status of this (guess: none).
What I do know: badmem worked really well here. (But in the meantime I bought new RAM.)

Best regards,
Jan-Simon

2008-11-28 12:17:20

by Pavel Machek

Subject: Re: Document handling of bad memory

On Fri 2008-11-28 03:00:26, Rob Landley wrote:
> On Wednesday 26 November 2008 10:15:21 Pavel Machek wrote:
> > Document how to deal with bad memory reported with memtest.
> ...
> > +BadRAM
> > +######
> > +BadRAM is the actively developed and available as kernel-patch
> > +here: http://rick.vanrein.org/linux/badram/
>
> So the patch isn't worth merging, but documentation about the out-of-tree
> patch is worth merging?

Well, why not. The patch is unnecessary, but for the poor souls hit
by bad memory, a one-line pointer can help...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-11-29 05:29:39

by Rob Landley

Subject: Re: Document handling of bad memory

On Friday 28 November 2008 06:18:38 Pavel Machek wrote:
> On Fri 2008-11-28 03:00:26, Rob Landley wrote:
> > So the patch isn't worth merging, but documentation about the out-of-tree
> > patch is worth merging?
>
> Well, why not. The patch is unnecessary, but for the poor souls hit
> by bad memory, a one-line pointer can help...
> Pavel

Define "unnecessary".

Rob

2008-11-29 06:51:20

by Andrew Morton

Subject: Re: Document handling of bad memory

On Fri, 28 Nov 2008 03:00:26 -0600 Rob Landley <[email protected]> wrote:

> On Wednesday 26 November 2008 10:15:21 Pavel Machek wrote:
> > Document how to deal with bad memory reported with memtest.
> ...
> > +BadRAM
> > +######
> > +BadRAM is the actively developed and available as kernel-patch
> > +here: http://rick.vanrein.org/linux/badram/
>
> So the patch isn't worth merging, but documentation about the out-of-tree
> patch is worth merging?
>
> I'm not objecting, I'm just confused about what the merge criteria are...
>

mm.. If someone finds it useful (and I assume that at least one person
would have found it useful, hence the effort to write the patch) then
why not?

(And yeah, yeah, someone might find a .gif of a parrot useful too. Go
do some work.)

2008-12-01 18:57:19

by Randy Dunlap

Subject: Re: Document handling of bad memory

On Wed, 26 Nov 2008 17:15:21 +0100 Pavel Machek wrote:

> Document how to deal with bad memory reported with memtest.
>
> Signed-off-by: Pavel Machek <[email protected]>
>
> diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
> new file mode 100644
> index 0000000..df84162
> --- /dev/null
> +++ b/Documentation/bad_memory.txt
> @@ -0,0 +1,45 @@
> +March 2008
> +Jan-Simon Moeller, [email protected]
> +
> +
> +How to deal with bad memory e.g. reported by memtest86+ ?
> +#########################################################
> +
> +There are three possibilities I know of:
> +
> +1) Reinsert/swap the memory modules
> +
> +2) Buy new modules (best!) or try to exchange the memory
> + if you have spare-parts
> +
> +3) Use BadRAM or memmap
> +
> +This Howto is about number 3) .

No space between 3) and '.'.

> +
> +
> +BadRAM
> +######
> +BadRAM is the actively developed and available as kernel-patch
> +here: http://rick.vanrein.org/linux/badram/
> +
> +For more details see the BadRAM documentation.
> +
> +memmap
> +######
> +
> +memmap is already in the kernel and usable as kernel-parameter at

a kernel parameter at

> +boot-time. Its syntax is slightly strange and you may need to

boot time.

> +calculate the values by yourself!

s/!/./

> +
> +Syntax to exclude a memory area (see kernel-parameters.txt for details):
> +memmap=<size>$<address>
> +
> +Example: memtest86+ reported here errors at address 0x18691458, 0x18698424 and

s/here //

> + some others. All had 0x1869xxxx in common, so I chose a pattern of
> + 0x18690000,0xffff0000.

What is the 0xffff0000 for? Needs explanation.
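
One plausible reading, assuming the pair follows the BadRAM convention of
address,mask: an address A counts as bad when (A & mask) == (address & mask),
so 0x18690000,0xffff0000 would describe the 64 KiB block starting at
0x18690000. A minimal Python sketch of that interpretation follows; the
function names and the 32-bit width are illustrative assumptions, not taken
from BadRAM or the kernel.

```python
def is_bad(addr, address, mask):
    """True if addr matches the (address, mask) pattern on every masked bit."""
    return (addr & mask) == (address & mask)


def mask_to_region(address, mask, width=32):
    """Convert an (address, mask) pattern into (base, size).

    Only works for masks made of leading ones followed by zeros, because
    only those describe one contiguous block. width is an assumed address
    width in bits.
    """
    full = (1 << width) - 1
    size = (~mask & full) + 1
    if size & (size - 1):
        raise ValueError("mask does not describe one contiguous block")
    return address & mask, size


base, size = mask_to_region(0x18690000, 0xFFFF0000)
print("memmap={:#x}${:#x}".format(size, base))
```

Under that reading, the mask is just another way of writing the size that
memmap= expects directly.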

> +
> +With the numbers of the example above:
> +memmap=64K$0x18690000
> + or
> +memmap=0x10000$0x18690000
> +

Please lose the last empty line.

and thanks for the patch/new file.

---
~Randy

2008-12-09 12:32:21

by Pavel Machek

Subject: Re: Document handling of bad memory


I cleaned the document up according to Randy (thanks!). I don't actually know
enough about DRAM error characteristics; I guess rounding the size of the
bad region up to the nearest 2^n makes sense.
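
The rounding can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the helper name memmap_exclude is
made up, and the 4 KiB minimum region size is an assumed page size, not
something the document specifies. It finds the smallest naturally aligned
power-of-two block that contains every reported address.

```python
def memmap_exclude(bad_addresses, min_size=0x1000):
    """Build a memmap=<size>$<address> string covering all bad addresses.

    min_size is an assumed lower bound (one 4 KiB page) and must be a
    power of two.
    """
    lo = min(bad_addresses)
    hi = max(bad_addresses)
    size = min_size
    while True:
        # Align the lowest bad address down to a multiple of size.
        base = lo & ~(size - 1)
        if hi < base + size:
            break
        # Block too small to reach the highest bad address: double it.
        size <<= 1
    return "memmap={:#x}${:#x}".format(size, base)


print(memmap_exclude([0x18691458, 0x18698424]))
```

For the two addresses from the example this yields the same 0x10000 region
at 0x18690000. Depending on the boot loader, the '$' may need escaping in
its configuration file.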

Signed-off-by: Pavel Machek <[email protected]>

diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
index df84162..a2a8703 100644
--- a/Documentation/bad_memory.txt
+++ b/Documentation/bad_memory.txt
@@ -14,12 +14,12 @@ There are three possibilities I know of:

3) Use BadRAM or memmap

-This Howto is about number 3) .
+This Howto is about number 3).


BadRAM
######
-BadRAM is the actively developed and available as kernel-patch
+BadRAM is the actively developed and available as a kernel patch
here: http://rick.vanrein.org/linux/badram/

For more details see the BadRAM documentation.
@@ -27,19 +27,20 @@ For more details see the BadRAM documentation.
memmap
######

-memmap is already in the kernel and usable as kernel-parameter at
-boot-time. Its syntax is slightly strange and you may need to
-calculate the values by yourself!
+memmap is already in the kernel and usable as a kernel parameter at
+boot time. Its syntax is slightly strange and you may need to
+calculate the values by yourself.

Syntax to exclude a memory area (see kernel-parameters.txt for details):
memmap=<size>$<address>

-Example: memtest86+ reported here errors at address 0x18691458, 0x18698424 and
+Example: memtest86+ reported errors at address 0x18691458, 0x18698424 and
some others. All had 0x1869xxxx in common, so I chose a pattern of
- 0x18690000,0xffff0000.
+ 0x18690000 and size of 0x10000. (Size needs to cover at least all
+ known bad places, and rounding to nearest power of 2 makes sense
+ 'just to be safe').

With the numbers of the example above:
memmap=64K$0x18690000
or
memmap=0x10000$0x18690000
-

--

(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-12-09 21:40:57

by Rob Landley

Subject: Re: Document handling of bad memory

On Tuesday 09 December 2008 06:31:52 Pavel Machek wrote:
> I cleaned the document up according to Randy (thanks!). I don't actually
> know enough about DRAM error characteristics; I guess rounding the size of
> the bad region up to the nearest 2^n makes sense.
>
> Signed-off-by: Pavel Machek <[email protected]>
>
> diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
...
> +This Howto is about number 3).
>
>
> BadRAM
> ######
> -BadRAM is the actively developed and available as kernel-patch
> +BadRAM is the actively developed and available as a kernel patch
> here: http://rick.vanrein.org/linux/badram/

Ok, once again: the point of this patch is to document an out of tree patch.

The out of tree patch is here:
http://rick.vanrein.org/linux/badram/software/BadRAM-2.6.27.1.patch

It has its own Documentation/badram.txt file and it patches
Documentation/memory.txt, as acknowledged here:

> For more details see the BadRAM documentation.
> @@ -27,19 +27,20 @@ For more details see the BadRAM documentation.
> memmap
> ######

Now what I don't understand is, why add something to the tree formalizing the
out-of-tree status of this other patch? Why not just merge it? If it's
interesting enough to have documentation about the patch in the tree, why is
the patch itself not interesting enough to merge? It's clearly got an active
maintainer, and has for years. (Is there something specific about it that
needs to be cleaned up?)

Adding this extra documentation to the badram patch sounds great. Merging the
badram patch into the linux kernel sounds useful; obviously _this_ patch is
inherently an expression of interest in it. Adding documentation about the
badram patch to the linux kernel tree but _not_ adding the badram patch itself
seems kind of crazy.

Would someone please explain the reasoning here? I don't understand it.

Rob

2008-12-09 23:09:58

by Pavel Machek

Subject: Re: Document handling of bad memory

On Tue 2008-12-09 15:40:41, Rob Landley wrote:
> On Tuesday 09 December 2008 06:31:52 Pavel Machek wrote:
> > I cleaned the document up according to Randy (thanks!). I don't actually
> > know enough about DRAM error characteristics; I guess rounding the size of
> > the bad region up to the nearest 2^n makes sense.
> >
> > Signed-off-by: Pavel Machek <[email protected]>
> >
> > diff --git a/Documentation/bad_memory.txt b/Documentation/bad_memory.txt
> ...
> > +This Howto is about number 3).
> >
> >
> > BadRAM
> > ######
> > -BadRAM is the actively developed and available as kernel-patch
> > +BadRAM is the actively developed and available as a kernel patch
> > here: http://rick.vanrein.org/linux/badram/
>
> Ok, once again: the point of this patch is to document an out of tree patch.

No; the point of this piece of documentation is to tell people how to
work _without_ that patch. Because it is simple enough.

> The out of tree patch is here:
> http://rick.vanrein.org/linux/badram/software/BadRAM-2.6.27.1.patch
>
> It has its own Documentation/badram.txt file and it patches
> Documentation/memory.txt, as acknowledged here:
>
> > For more details see the BadRAM documentation.
> > @@ -27,19 +27,20 @@ For more details see the BadRAM documentation.
> > memmap
> > ######
>
> Now what I don't understand is, why add something to the tree formalizing the
> out-of-tree status of this other patch? Why not just merge it? If it's

Take a look at that patch. It is seriously overengineered. This should
not need a config option, should not introduce a new page flag, etc.

We already have a perfectly working interface for excluding specific
addresses; maybe we need better documentation, and maybe the kernel
command-line interface should be changed to be more user friendly, but
we certainly don't want to take the badram patch.

This excerpt should be enough:

diff -pruN linux-2.6.27/include/linux/page-flags.h linux-2.6.27-new/include/linux/page-flags.h
--- linux-2.6.27/include/linux/page-flags.h 2008-10-10 03:43:53.000000000 +0530
+++ linux-2.6.27-new/include/linux/page-flags.h 2008-10-15 10:04:48.000000000 +0530
@@ -93,6 +93,9 @@ enum pageflags {
PG_mappedtodisk, /* Has blocks allocated on-disk */
PG_reclaim, /* To be reclaimed asap */
PG_buddy, /* Page is free, on buddy lists */
+#ifdef CONFIG_BADRAM
+ PG_badram, /* BadRam page */
+#endif
#ifdef CONFIG_IA64_UNCACHED_ALLOCATOR
PG_uncached, /* Page has been mapped as uncached */
#


Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html