Date: Thu, 25 Dec 2008 13:04:07 -0500 (EST)
From: Justin Piszcz
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org
Cc: xfs@oss.sgi.com, smartmontools-support@lists.sourceforge.net, Alan Piszcz
Subject: mismatch_cnt, random bitflips, silent corruption(?), mdadm/sw raid[156]

I have many backups of data from the late 1990s onward, each of which has
been combined into a tar file and then encrypted with gpg; each is about 4
gigabytes so I can store them on DVD. For quicker access, I also kept them
on software RAID, always on either SW RAID5 or SW RAID6 using XFS. None of
the machines that have/had the data ever exhibited bad memory, and all pass
memtest OK. What is worrisome, however, is that on occasion the
mismatch_cnt would be some crazy number on different machines, 2000 or
5000. I checked all of the disks with SMART and they appear to be OK, and
when I run a check and repair against the RAID, the count usually goes
back down to zero.

However, in reality, are these "mismatches" really an indication of silent
data corruption? In my case, I have 199 of these backup files, and when I
tried to decrypt the data, I got these errors on several volumes:

========================================================================

gpg: block_filter 0x1c8e200: read error (size=10266,a->size=10266)
gpg: mdc_packet with invalid encoding
gpg: decryption failed: invalid packet
gpg: block_filter: pending bytes!
gpg: fatal: zlib inflate problem: invalid stored block lengths
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: fatal: zlib inflate problem: invalid block type
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: fatal: zlib inflate problem: invalid code lengths set
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: fatal: zlib inflate problem: invalid block type
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: fatal: zlib inflate problem: invalid block type
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: [don't know]: invalid packet (ctb=1f)
gpg: [don't know]: invalid packet (ctb=2d)
gpg: mdc_packet with invalid encoding
gpg: decryption failed: invalid packet
gpg: [don't know]: invalid packet (ctb=30)
gpg: fatal: zlib inflate problem: invalid stored block lengths
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: fatal: zlib inflate problem: invalid stored block lengths
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: fatal: zlib inflate problem: invalid code lengths set
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: fatal: zlib inflate problem: invalid stored block lengths
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: fatal: zlib inflate problem: invalid distance code
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: fatal: zlib inflate problem: invalid distance code
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: fatal: zlib inflate problem: invalid code lengths set
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768
gpg: fatal: zlib inflate problem: invalid block type
secmem usage: 2368/2496 bytes in 6/7 blocks of pool 3104/32768

========================================================================

This resulted in partial-sized, corrupted tarballs. I am decompressing
each of the 199 backups now to isolate which are affected, so I can come
up with a better count of how many files were hit.
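To flag the bad archives without extracting anything to disk, a loop along
these lines works (a sketch only; it assumes the *.tar.gpg files sit in
the current directory and that gpg can obtain the passphrase
non-interactively, e.g. from an agent):

for f in backup*.tar.gpg; do
    # Decrypt to stdout and have tar list the contents; if gpg hits a
    # corrupt packet it truncates the stream and tar exits non-zero.
    if gpg --quiet --decrypt "$f" 2>/dev/null | tar -tf - >/dev/null 2>&1; then
        echo "OK:  $f"
    else
        echo "BAD: $f"
    fi
done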
The total amount of data is:

832G in 199 files (4.18 GB/file average)

========================================================================

Bad files (out of the 199):

Thu Dec 25 05:44:16 EST 2008: Decompressing backup038.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 08:18:00 EST 2008: Decompressing backup103.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 08:34:39 EST 2008: Decompressing backup111.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 08:43:26 EST 2008: Decompressing backup115.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 08:54:39 EST 2008: Decompressing backup120.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 09:36:02 EST 2008: Decompressing backup137.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 09:36:39 EST 2008: Decompressing backup138.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 09:52:06 EST 2008: Decompressing backup145.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 10:10:14 EST 2008: Decompressing backup153.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 10:10:32 EST 2008: Decompressing backup154.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 10:36:50 EST 2008: Decompressing backup166.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 10:40:19 EST 2008: Decompressing backup168.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 11:20:00 EST 2008: Decompressing backup181.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 11:39:11 EST 2008: Decompressing backup187.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Thu Dec 25 12:05:27 EST 2008: Decompressing backup194.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

========================================================================

Example of one file:

-rw-r--r-- 1 root root  40M 2008-12-24 20:08 backup038.tar
-rw-r--r-- 1 root root 4.3G 2008-12-23 17:45 backup038.tar.gpg

========================================================================

I have restored the data off of DVD and confirmed that the copy that had
resided on disk must have suffered a bit-flip/corruption:

$ md5sum backup038.tar.gpg          (from dvd)
9958813aa22e4307f2101f87b8820bff  backup038.tar.gpg

$ md5sum backup038.tar.gpg          (from disk)
9437138a01fc2429ca2131f6a10295b5  backup038.tar.gpg

========================================================================

Splitting the file up, we can isolate how many corruptions occurred in it:

dvd$ split -b 100M backup038.tar.gpg
hdd$ split -b 100M backup038.tar.gpg

$ diff dvd.md5 ../hdd/hdd.md5
1c1
< 9dfcab90fd6590705ba4293994f10a85  xaa
---
> cce57047ac869ef3e603d4d6bd3579e9  xaa

========================================================================

Dig further:

dvd$ mv xaa dvd.xaa
hdd$ mv xaa hdd.xaa
$ split -b 1M dvd.xaa
$ split -b 1M hdd.xaa

$ diff dvd.md5 ../../hdd/xaa_split/hdd.md5
40c40
< 439e4291ea67240c0b083abe5f041319  xbn
---
> 95e548284fa78f750b50f3e6a8e1b847  xbn

========================================================================

Further:

$ split -a 100 -b 1k dvd.xbn
$ split -a 100 -b 1k hdd.xbn

$ diff dvd.md5 ../../../hdd/xaa_split/xbn_split/hdd.md5 | wc
    132     392   17960

========================================================================

Quite a big 'chunk' of the file suffered corruption:

$ diff dvd.md5 ../../../hdd/xaa_split/xbn_split/hdd.md5 | awk '{print $3}' | xargs -n1 ls | xargs du -ach
520K    total

========================================================================

dvd$ md5sum one_part_with_corruption
c131c2f7dea3c316cbcd2c26360e4b03  one_part_with_corruption

hdd$ md5sum one_part_with_corruption
35968f907c5e6f986d9c5a171dd0a7ac  one_part_with_corruption

Taking an octal dump and checking the ASCII characters:

dvd:
< 0000000 034   å   â 201   f   ®   ç   ¹   e   D 221   Ê   Á  \b   ÷   $
< 0000020   >   p   M   | 212 236 034   Ø  \b   ÿ   z   å   í  \a   È   i
< 0000040 212  \f   ç   ] 200   l   ª 234   Y 224   ³   é   û   ó   Ì
< 0000060 233   }   G   ä   W   : 236   b   °   } 215   ) 224   R   R   ¸
< 0000100 237   ¨   ¾ 227   Ú   1   ¤   O   ç   ý 003 177   $   e 032   Á
< 0000120   !   ^   j   Þ 235   8   5   a   J   ¡   ö   <   Æ   H 223   Õ

hdd:
> 0000000  \r   Æ   ¦ 033 034   å   â 201   f   ®   ç   ¹   e   D 221   Ê
> 0000020   Á  \b   ÷   $   >   p   M   | 212 236 034   Ø  \b   ÿ   z   å
> 0000040   í  \a   È   i 212  \f   ç   ] 200   l   ª 234   Y 224   ³   é
> 0000060   û   ó   Ì 233   }   G   ä   W   : 236   b   °   } 215   )
> 0000100 224   R   R   ¸ 237   ¨   ¾ 227   Ú   1   ¤   O   ç   ý 003 177
> 0000120   $   e 032   Á   !   ^   j   Þ 235   8   5   a   J   ¡   ö   <

The hdd image seems 'shifted'; see the sequence (201 .. f .. ® .. ç) on
the first line, which starts four bytes later in the hdd copy than in the
dvd copy.

========================================================================

So the question then becomes: where did the shift begin?
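cmp can answer that directly, without another round of splitting (a
sketch, assuming the two 100M pieces were kept as dvd.xaa and hdd.xaa):

# Report the offset of the first differing byte:
$ cmp dvd.xaa hdd.xaa
# List every differing byte as offset plus the two octal values; with a
# shift (rather than isolated flips) this list runs on to end-of-file:
$ cmp -l dvd.xaa hdd.xaa | head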
Way above we see the initial error began in xaa (the 100M piece) -> xbn
(the 1M piece). The preceding 1M piece, xbm, is still identical in both
copies:

dvd$ md5sum xbm
2524788769acf2b0f98346f488a58441  xbm

hdd$ md5sum xbm
2524788769acf2b0f98346f488a58441  xbm

So then we want to take an od of each copy of xbn and diff those. The
difference starts at the following segment:

dvd:
< 2006000   û   r 222 235   ©   z   8   d 035   ´ 230   # 207   Í   õ   u

hdd:
> 2006000   û   r 222 235   û   r 222 235   ©   z   8   d 035   ´ 230   #

========================================================================

So let's look at a few lines before it as well:

dvd:
2005720   '   0   Ã 001   ò   »   [   ( 037 024   . 211   M 023 001   Ã
2005740   V 021   \   Ð   Y   Ç   Ö 207 206   ¬   n   ¬ 234   z   ¯ 034
2005760   ¿ 206 002   +   w   Î 200   Î   3   è   À   K   $   ê   e   n
2006000   û   r 222 235   ©   z   8   d 035   ´ 230   # 207   Í   õ   u
2006020   R   ¯   Í 025   Á   ¼ 022   ö 024   ¯ 200   Q 217   Ã   ¯   £
2006040 203   ­   ô   c   3   í 207   #   ]   Ä   ô   H   (   ý   5 207
2006060   ë   C   C   a   @  \r 200   Ü   â   ¡  \r 204 230   U   æ   O

hdd:
2005720   '   0   Ã 001   ò   »   [   ( 037 024   . 211   M 023 001   Ã
2005740   V 021   \   Ð   Y   Ç   Ö 207 206   ¬   n   ¬ 234   z   ¯ 034
2005760   ¿ 206 002   +   w   Î 200   Î   3   è   À   K   $   ê   e   n
2006000   û   r 222 235   û   r 222 235   ©   z   8   d 035   ´ 230   #
2006020 207   Í   õ   u   R   ¯   Í 025   Á   ¼ 022   ö 024   ¯ 200   Q
2006040 217   Ã   ¯   £ 203   ­   ô   c   3   í 207   #   ]   Ä   ô   H
2006060   (   ý   5 207   ë   C   C   a   @  \r 200   Ü   â   ¡  \r 204

The same sequence still exists after the flip: starting at the 5th byte of
line 2006000, the hdd copy appears to repeat the first four bytes
(û r 222 235) and then continue with the dvd's data, i.e. four bytes were
duplicated and everything after them shifted.

========================================================================

On the linux-raid mailing list, this issue is often discussed:
http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07541.html

The solution seems to be to repair and keep on trucking along, which is
probably fine for a filesystem with hundreds or thousands of files; most
likely, you would never notice the corruption. With gpg, however, the
files obviously cannot tolerate any corruption, and it was only when I
tried to extract the data that I noticed this problem.
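The mismatch_cnt figures below are simply read out of sysfs after a scrub;
for reference, triggering one by hand looks like this (assuming the array
is /dev/md3 -- 'check' only counts mismatched stripes, 'repair' also
rewrites the parity/mirror so the count returns to zero on the next
check):

# echo check > /sys/block/md3/md/sync_action
# cat /proc/mdstat                        (wait for the check to finish)
# cat /sys/block/md3/md/mismatch_cnt
# echo repair > /sys/block/md3/md/sync_action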
========================================================================

== server 1 (965 chipset) ==

Root filesystem (RAID-1):
Sat Nov 15 11:39:03 EST 2008: cat /sys/block/md2/md/mismatch_cnt
Sat Nov 15 11:39:03 EST 2008: 128

Storage array (RAID-6):
Wed Nov 26 10:14:21 EST 2008: cat /sys/block/md3/md/mismatch_cnt
Wed Nov 26 10:14:21 EST 2008: 1208

== server 2 (P35 chipset) ==

Swap (RAID-1): I understand this is 'normal' for swap:
Fri Jan 18 22:51:47 EST 2008: cat /sys/block/md0/md/mismatch_cnt
Fri Jan 18 22:51:47 EST 2008: 4096
Fri May 30 21:00:07 EDT 2008: cat /sys/block/md0/md/mismatch_cnt
Fri May 30 21:00:07 EDT 2008: 896
Fri Nov 14 23:30:10 EST 2008: cat /sys/block/md0/md/mismatch_cnt
Fri Nov 14 23:30:10 EST 2008: 384

== server 3 (also 965 chipset) ==

Root filesystem (RAID-1):
Fri May 18 20:50:05 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Fri May 18 20:50:05 EDT 2007: 128
Sat May 26 04:40:09 EDT 2007: cat /sys/block/md2/md/mismatch_cnt
Sat May 26 04:40:09 EDT 2007: 128

Swap (RAID-1): Again, normal, but a very high mismatch_cnt:
Fri Oct  5 20:50:05 EDT 2007: cat /sys/block/md0/md/mismatch_cnt
Fri Oct  5 20:50:05 EDT 2007: 27904

Storage array (RAID-5):
Fri Apr  4 22:00:09 EDT 2008: cat /sys/block/md3/md/mismatch_cnt
Fri Apr  4 22:00:09 EDT 2008: 512
Fri May 23 22:00:10 EDT 2008: cat /sys/block/md3/md/mismatch_cnt
Fri May 23 22:00:10 EDT 2008: 256

== server 4 (Intel 875 chipset, RAID-5) ==

# grep mismatches\ found *
daemon.log:Jan  3 11:27:27 p26 mdadm: RebuildFinished event detected on md device /dev/md0, component device mismatches found: 11896
daemon.log:Nov 14 12:25:59 p26 mdadm[1956]: RebuildFinished event detected on md device /dev/md0, component device mismatches found: 11664
daemon.log:Nov 14 18:19:01 p26 mdadm[1956]: RebuildFinished event detected on md device /dev/md0, component device mismatches found: 11664
syslog:Jan  3 11:27:27 p26 mdadm: RebuildFinished event detected on md device /dev/md0, component device mismatches found: 11896
syslog:Nov 14 12:25:59 p26 mdadm[1956]: RebuildFinished event detected on md device /dev/md0, component device mismatches found: 11664
syslog:Nov 14 18:19:01 p26 mdadm[1956]: RebuildFinished event detected on md device /dev/md0, component device mismatches found: 11664

========================================================================

Options/alternatives for maintaining data integrity?

1. Obviously DVD and/or LTO tape (multiple copies of data at rest):
   http://epoka.dk/media/Linear_Tape_Open_(LTO)_Ultrium_Data_Cartridges.pdf
   For LTO-2/LTO-3:
     Uncorrected bit error rate:  1x10^-17  1x10^-17
     Undetected bit error rate:   1x10^-27  1x10^-27

2. ZFS, but on Linux it only runs in user space.

3. Keep an md5sum for each file on the filesystem and run daily checks?
   (A sketch of this is in the P.S. below.)

========================================================================

Other options? How do others maintain data integrity? Just not worry
about it until you have to and rely on backups.. or?

Justin.
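P.S. For option 3, something like the following is what I have in mind (a
minimal sketch; the manifest location and the daily scheduling are only
placeholders):

# Once, after writing the backups, record a checksum manifest:
cd /backups && md5sum backup*.tar.gpg > manifest.md5

# Daily, e.g. from cron; --quiet prints only files whose checksum changed:
cd /backups && md5sum -c --quiet manifest.md5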