Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757335Ab1FVLT6 (ORCPT ); Wed, 22 Jun 2011 07:19:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:36308 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756840Ab1FVLTz (ORCPT ); Wed, 22 Jun 2011 07:19:55 -0400 From: Stefan Assmann To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, tony.luck@intel.com, andi@firstfloor.org, mingo@elte.hu, hpa@zytor.com, rick@vanrein.org, rdunlap@xenotime.net, sassmann@kpanic.de Subject: [PATCH v2 3/3] Add documentation and credits for BadRAM Date: Wed, 22 Jun 2011 13:18:54 +0200 Message-Id: <1308741534-6846-4-git-send-email-sassmann@kpanic.de> In-Reply-To: <1308741534-6846-1-git-send-email-sassmann@kpanic.de> References: <1308741534-6846-1-git-send-email-sassmann@kpanic.de> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 17504 Lines: 441 Add Documentation/BadRAM.txt for in-depth information and update Documentation/kernel-parameters.txt. Signed-off-by: Stefan Assmann Acked-by: Tony Luck Acked-by: Andi Kleen --- CREDITS | 9 + Documentation/BadRAM.txt | 370 +++++++++++++++++++++++++++++++++++ Documentation/kernel-parameters.txt | 6 + 3 files changed, 385 insertions(+), 0 deletions(-) create mode 100644 Documentation/BadRAM.txt diff --git a/CREDITS b/CREDITS index dca6abc..d57d4af 100644 --- a/CREDITS +++ b/CREDITS @@ -2899,6 +2899,15 @@ S: 6 Karen Drive S: Malvern, Pennsylvania 19355 S: USA +N: Rick van Rein +E: rick@vanrein.org +W: http://rick.vanrein.org/ +D: Memory, the BadRAM subsystem dealing with defective RAM modules. +S: Haarlebrink 5 +S: 7544 WP Enschede +S: The Netherlands +P: 1024D/89754606 CD46 B5F2 E876 A5EE 9A85 1735 1411 A9C2 8975 4606 + N: Stefan Reinauer E: stepan@linux.de W: http://www.freiburg.linux.de/~stepan/ diff --git a/Documentation/BadRAM.txt b/Documentation/BadRAM.txt new file mode 100644 index 0000000..a98dcd2 --- /dev/null +++ b/Documentation/BadRAM.txt @@ -0,0 +1,370 @@ +INFORMATION ON USING BAD RAM MODULES +==================================== + +The BadRAM feature enables Linux to run on broken memory. The +resulting system will be stable and healthy, because the kernel +simply never allocates the faulty pages for use. This is how +to setup BadRAM if your memory is failing. + + +Introduction +------------ + +As RAM memory grows smaller, it also becomes harder to manufacture +chips that are perfect. Each single cell that is failing could cause +an entire memory module to fail. Even though manufacturers put in +extra cells to replace failed ones, it is still possible that the +sensitive small structures get damaged by an electrical discharge on +their pins. Such damage leads to problems in fixed locations of +the address space of a memory module, which is what theory predicts +and has been confirmed by years of experience with bad memory. + +It is not necessary for such a memory module to be discarded. All +pages of memory behave the same, and if only we skip the failing +pages we can continue to use the module for many more years. The +operating system kernel simply has to avoid using the blocks that +are damaged. This is easy to do in the part of the kernel where +memory pages are allocated. + + +Reasons for using BadRAM +------------------------ + +Chip manufacturing processes use lots of harsh chemicals, and the less +of these used, the better. Being able to make good use of partially +failed memory chips means that far less of those chemicals are needed +to provide storage. This reduces expenses and it is lighter on the +environment in which we live. + +This kernel feature clearly shows that Linux is "the flexible OS". +If something does not work, fix it. Also, share it with all the +others who could use it. After more than a decade of BadRAM, +the response has been purely positive, because it has helped real +people to solve real problems. + +One important use for this feature is with laptops that have their +memory soldered in. Such laptops would have to be discarded as a +whole, but with BadRAM in place they can continue to be used +without further restrictions. + +Finally, running a system on broken memory is just plain cool ;-) + + +Running example +--------------- + +To run this project, I was given two DIMMs, 32 MB each. One, that we +shall use as a running example in this text, contained 512 faulty bits, +spread over 1/4 of the address range in a regular pattern. This looks +a lot like the fauly pattern that many others have reported; the only +common other pattern is a single faulty spot. With such memory, a few +tricks with a thorough RAM tester and some binary calculations suffice +to write these fault patterns down in 2 longword numbers. The format +of these is hexadecimal, which is a condensed way of writing down the +binary patterns that make the hardware patterns recognisable. + +After being patched and invoked with the properly formatted description, +the kernel held back only the memory pages with faults, and never handed +them out for allocation. The allocation routines could therefore +progress as normally, without any adaptation. This is important, since +all the work is done at booting time. After booting, the kernel does +not have to do spend any time to implement BadRAM. + +As a result of this initial exercise, I gained 30 MB out of the 32 MB +DIMM that would otherwise have been thrown away. Of course, these +numbers scale up with larger memory modules, but the principle is +the same. + + +The structure of memory failures +-------------------------------- + +Memory chips are usually laid out in a roughly equal number of rows +and columns, making it a square of cells that each store one bit. +When addressing a bit, the processor sends the row and column in +separate phases, and then reads or writes its value. The rows and +columns are therefore visible on the outside of a chip. + +The connections of row and column lines to the outside world is +usually protected by a buffer. It can happen that a static +discharge damages such a buffer, causing an entire row or an +entire column to fail. This means that a series of bits become +unusable in a single page or in a regular pattern of pages, +depending on whether it was a row or column that got damaged. + +For this reason, BadRAM was designed to describe memory faults +in a pattern of address/mask pairs. An address locates an +error and a zero on the corresponding position in the mask +defines which bits in the address may be replaced with any +other value. This has shown to work as a tight description +of error patterns: it is very compact, but does not waste pages +that are good. + + +BadRAM's notation for memory faults +----------------------------------- + +Instead of manually providing all 512 errors in the running example +to the kernel, it's easier to use a pattern notation. Since the +regularity is based on address decoding software, which generally +takes certain bits into account and ignores others, we shall +provide a faulty address F, together with a bit mask M that +specifies which bits must be equal to F. In C code, an address A +is faulty if and only if + + (F & M) == (A & M) + +or alternately (closer to a hardware implementation): + + ~((F ^ A) & M) + +In the example 32 MB chip, I had the faulty addresses in 8MB-16MB: + + xxx42f4 ....0100.... + xxx62f4 ....0110.... + xxxc2f4 ....1100.... + xxxe2f4 ....1110.... + +The second column represents the alternating hex digit in binary form. +Apparently, the first and next to last binary digit can be anything, +so the binary mask for that part is 0101. The mask for the part after +this is 0xfff, and the part before should select anything in the range +8MB-16MB, or 0x00800000-0x01000000; this is done with a bitmask +0xff80xxxx. Combining these partial masks, we get: + + F=0x008042f4 M=0xff805fff + +That covers every fault in this DIMM; for more complicated failing +DIMMs, or for a combination of multiple failing DIMMs, it can be +necessary to set up a number of such F/M pairs. + + +Getting started +--------------- + +If you experience RAM trouble, first read Documentation/memory.txt +and try out the mem=4M trick to see if at least some initial parts +of your RAM work well. Note that 4 MB will not be able to hold a +modern desktop, so if you rely on that you would have to set the +limit higher (and accept that your sanity check is not as tight as +possible). + +The BadRAM routines halt the kernel in panic if the reserved area +of memory (containing kernel stuff) contains a faulty address. It +will only do that when supplied with the patterns below; this +initial check is merely to see if this is likely to happen. + + +Running a memory checker +------------------------ + +There is no memory checker built into the kernel, to avoid delays +at runtime or while booting. If you experience problems that may +be caused by RAM, run a good outside RAM checker. The Memtest86 +checker is a popular, free, high-quality checker. Many Linux +distributions include it as an alternate boot option, so you may +simply find it in your boot loader's boot menu. + + +The memory checker lists all addresses that have a fault. It will +do this for a given configuration of the DIMMs in your motherboard; +if you replace or move memory modules you may find other addresses. +In the running example's 32 MB chip, with the DIMM in slot #0 on +the motherboard, the errors were found in the 8MB-16MB range: + + xxx42f4 + xxx62f4 + xxxc2f4 + xxxe2f4 + +The error reported was a "sticky 1 bit", a memory bit that always +reads as "1" even if a "0" was just written to it. This is +probably caused by a damaged buffer on one of the rows or columns +in one of the memory chips. + +It would be a lot of work to collect the individual errors and +condense them into a pattern. That is why I patched the +Memtest86 (v2.3+) checker to directly print out the address/mask +pairs that are used by this kernel feature. All you would do is +select the BadRAM printout option at the start of the scan, and +then leave it running for hours and hours, until it has made at +least one pass. The patterns are printed each time a bit is +added, but each line contains all faults found up to that point, +so you would write down the last set of patterns printed, and +supply that as a boot option in your next run of a +BadRAM-capable Linux kernel. + +If you use this patch on an x86_64 architecture, your addresses are +twice as long. Fill up with zeroes in the address and with f's in +the mask. The latter example would thus become: + + mem=24M badram=0x0000000000f00000,0xfffffffffff00000 + +The patch applies the changes to both x86 and x86_64 code bases +at the same time. Patching but not compiling maps the entire +source tree at once, which makes more sense than splitting the +patch into an x86 and x86_64 branch, because those two branches +could not be applied at the same time because they would overlap. + + +Rebooting Linux +--------------- + +Once the fault patterns are known we simply restart Linux with +these F/M pairs as a parameter. If your normal boot options look +like + + root=/dev/sda1 ro + +you should now boot with options + + root=/dev/sda1 ro badram=0x008042f4,0xff805fff + +or perhaps by mentioning more F/M pairs in an order F0,M0,F1,M1,... +When you provide an odd number of arguments to badram, the default +mask 0xffffffff (meaning that only one address is matched) is +applied to the last address. + +If your bootloader is GRUB, you can supply this additional +parameter interactively during boot. This way, you can try them +before you edit /boot/grub/grub.conf to put them in forever. + +When the kernel now boots, it should not give any trouble with RAM. +Mind you, this is under the assumption that the kernel and its data +storage do not overlap an erroneous part. If they do, and the +kernel does not choke on it right away, BadRAM itself will stop the +system with a kernel panic. When the error is that low in memory, +you will need additional bootloader magic, to load the kernel at an +alternative address. + +Now look up your memory status with + + cat /proc/meminfo |grep HardwareCorrupted + +which prints a single line with information like + +HardwareCorrupted: 2048 kB + +The entry HardwareCorrupted: 2048k represents the loss of 2MB +of general purpose RAM due to the errors. Or, positively rephrased, +instead of throwing out 32MB as useless, you only throw out 2MB. +Note that 2048 kB equals 512 pages of 4kB. The size of a page is +defined by the processor architecture. + +If the system is stable (which you can test by compiling a few +kernels, and a few file finds in / or so) you can decide to add +the boot parameter to /boot/grub/grub.conf, in addition to any +other boot parameters that may already be there. For example, + + kernel /boot/vmlinuz root=/dev/sda1 ro + +would become + + kernel /boot/vmlinuz root=/dev/sda1 ro badram=0x008042f4,0xff805fff + +Depending on how helpful your Linux distribution is, you may +have to add this feature again after upgrading your kernel. If +your boot loader is GRUB, you can always do this manually if you +rebooted before you remembered to make that adaptation. + + +BadRAM classification +--------------------- + +This technique might start a lively market for "dead" RAM. It is +important to realise that some RAMs are more dead than others. So, +instead of just providing a RAM size, it is also important to know +the BadRAM class, which is defined as follows: + + A BadRAM class N means that at most 2^N bytes have a problem, + and that all problems with the RAMs are persistent: They + are predictable and always show up. + +The DIMM that serves as an example here was of class 9, since 512=2^9 +errors were found. Higher classes are worse, "correct" RAM is of class +-1 (or even less, at your choice). +Class N also means that the bitmask for your chip (if there's just one, +that is) counts N bits "0" and it means that (if no faults fall in the +same page) an amount of 2^N*PAGESIZE memory is lost, in the example on +an x86 architecture that would be 2^9*4k=2MB, which accounts for the +initial claim of 30MB RAM gained with this DIMM. + +Note that this scheme has deliberately been defined to be independent +of memory technology and of computer architecture. + + +Further Possibilities +--------------------- + +**Slab allocation support** + +It would be possible to use even more of the faulty RAMs by employing +them for slabs. The smaller allocation granularity of slabs makes it +possible to throw out just, say, 32 bytes surrounding an error. This +would mean that the example DIMM only caused a loss of 16kB instead +of 2MB, or scaled-up similar values for larger memory sizes. One +specific area that could benefit from this is the growing market +for embedded devices, which usually wants to meet tight budgets. + +It should be possible to make the slab allocator prefer pages with +broken memory, and allocate the faulty places in memory before the +other slabs are made available to the kernel. In the best possible +situation, this could reduce the loss of good RAM cells to zero! + +**Support for low-memory errors** + +To the best of my knowledge, boot loaders like GRUB cannot load +the Linux kernel in non-standard locations. This means that any +errors at low memory locations cannot be overcome with BadRAM. + +Anything that physically alters the memory layout can be used +to overcome such problems; this may be achieved through BIOS +settings, or by adding or swapping memory modules. + +A general solution could be to use a boot loader that can load +the Linux kernel (and its initial memory allocation) at other +memory addresses than are standard. + + +**Boot-time memory checking** + +Many suggestions have been made to insert a RAM checker at boot time; +since this would leave the time to do only very meager checking, it +is not a reasonable option; we already have a half-done BIOS check +doing that! + +**ECC RAM integration** + +It would be interesting to integrate this functionality with the +self-verifying nature of ECC RAM. These memories can even distinguish +between recoverable and unrecoverable errors! Such memory has been +handled in older operating systems by `testing' once-failed memory +blocks for a while, by placing only (reloadable) program code in it. + +I possess no faulty ECC modules to work this out, and there is no +general use for it either. + + +Names and Places +---------------- + +The home page of this project is on + http://rick.vanrein.org/linux/badram +This page also links to Nico Schmoigl's experimental extensions to +this patch (with debugging and a few other fancy things). + +In case you have experiences with the BadRAM software which differ from +the test reportings on that site, I hope you will mail me with that +new information. + +The BadRAM project is an idea and implementation by + Rick van Rein + Haarlebrink 5 + 7544 WP Enschede + The Netherlands + rick@vanrein.org +If you like it, a postcard would be much appreciated ;-) + + + Enjoy, + -Rick. diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index cc85a92..a12b27a 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -51,6 +51,7 @@ parameter is applicable: FB The frame buffer device is enabled. GCOV GCOV profiling is enabled. HW Appropriate hardware is enabled. + HWPOISON HWPOISON support is enabled. IA-64 IA-64 architecture is enabled. IMA Integrity measurement architecture is enabled. IOSCHED More than one I/O scheduler is enabled. @@ -373,6 +374,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. autotest [IA64] + badram= [HWPOISON] + Allows memory areas to be flagged as HWPOISON. + Format: ,[,...] + See Documentation/BadRAM.txt + baycom_epp= [HW,AX25] Format: , -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/