From: Andreas Dilger
Subject: Re: [PATCH][18/28] e2fsprogs-tests-f_random_corruption.patch
Date: Sat, 02 Feb 2008 01:46:27 -0700
Message-ID: <20080202084627.GS31694@webber.adilger.int>
References: <20080202075943.GB23836@webber.adilger.int>
In-reply-to: <20080202075943.GB23836@webber.adilger.int>
To: "Theodore Ts'o", linux-ext4@vger.kernel.org

The f_random_corruption test enables a random subset of filesystem
features, picks one of the valid filesystem block and inode sizes and a
random device size, and creates a new filesystem with those parameters.

The test can be disabled by setting the environment variable
F_RANDOM_CORRUPTION=skip. By default the test script runs only once, but
setting the LOOP_COUNT variable allows it to run multiple times.

If the script is running as root, the filesystem is mounted and populated
with file data to allow a more useful test filesystem to be generated.
In some cases the kernel may not support the requested filesystem features
and the filesystem cannot be mounted. This is not considered a test failure.

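The parameter selection and the two environment controls described above
can be sketched in isolation as follows. This is only an illustration
under stated assumptions, not the test script itself: it reuses the
variable names from the patch (BLK_SIZES, INODE_SIZES, SIZE, LOOP_COUNT,
F_RANDOM_CORRUPTION), requires bash for arrays and $RANDOM, and only
prints the chosen values instead of creating a filesystem.

```shell
#!/bin/bash
# Sketch only: prints the randomly chosen parameters, creates nothing.
BLK_SIZES=(1024 2048 4096)
INODE_SIZES=(128 256 512 1024)

LOOP_COUNT=${LOOP_COUNT:-1}
COUNT=0

while [ "$COUNT" -lt "$LOOP_COUNT" ]; do
	# F_RANDOM_CORRUPTION=skip leaves the loop before doing any work
	[ "$F_RANDOM_CORRUPTION" = "skip" ] && echo "skipped" && break

	# Index each array with RANDOM modulo the number of elements
	BLK_SIZE=${BLK_SIZES[RANDOM % ${#BLK_SIZES[@]}]}
	INODE_SIZE=${INODE_SIZES[RANDOM % ${#INODE_SIZES[@]}]}

	# Image size in KB: 192000 plus two draws from RANDOM (0..32767 each)
	SIZE=$(( 192000 + RANDOM + RANDOM ))

	echo "run $COUNT: -b $BLK_SIZE -I $INODE_SIZE, ${SIZE}kB image"
	COUNT=$((COUNT + 1))
done
```

Running it as "LOOP_COUNT=3 bash sketch.sh" (sketch.sh being a
hypothetical file name) prints three parameter sets, while
"F_RANDOM_CORRUPTION=skip bash sketch.sh" prints only "skipped".
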
The resulting filesystem is corrupted both with random data and by
shifting data from one part of the device to another, and is then repaired
by e2fsck. In some rare cases the random corruption is severe enough that
the filesystem is not recoverable (e.g. a small filesystem with no backup
superblock has bad superblock corruption), but in most cases "e2fsck -fy"
should be able to fix all errors in some way.

After e2fsck has repaired the filesystem, it is optionally mounted (if the
environment variable MOUNT_AFTER_CORRUPTION=yes is set) and the test files
created in the filesystem are deleted. This verifies that the fixes in the
filesystem are sufficient for the kernel to use the filesystem without
error. Since there is some possibility of the kernel oopsing if there is
a filesystem bug, this part of the test is not enabled by default.

Signed-off-by: Andreas Dilger
Signed-off-by: Kalpak Shah

Index: e2fsprogs-1.40.4/tests/f_random_corruption/script
===================================================================
--- /dev/null
+++ e2fsprogs-1.40.4/tests/f_random_corruption/script
@@ -0,0 +1,277 @@
+# This is to make sure that if this test fails other tests can still be run
+# instead of doing an exit. We break before the end of the loop.
+export LOOP_COUNT=${LOOP_COUNT:-1}
+export COUNT=0
+
+while [ $COUNT -lt $LOOP_COUNT ]; do
+[ "$F_RANDOM_CORRUPTION" = "skip" ] && echo "skipped" && break
+
+# choose block and inode sizes randomly
+BLK_SIZES=(1024 2048 4096)
+INODE_SIZES=(128 256 512 1024)
+
+SEED=$(head -1 /dev/urandom | od -N 1 | awk '{ print $2 }')
+RANDOM=$SEED
+
+IMAGE=${IMAGE:-$TMPFILE}
+DATE=`date '+%Y%m%d%H%M%S'`
+ARCHIVE=$IMAGE.$DATE
+SIZE=${SIZE:-$(( 192000 + RANDOM + RANDOM ))}
+FS_TYPE=${FS_TYPE:-ext3}
+BLK_SIZE=${BLK_SIZES[(( $RANDOM % ${#BLK_SIZES[*]} ))]}
+INODE_SIZE=${INODE_SIZES[(( $RANDOM % ${#INODE_SIZES[*]} ))]}
+DEF_FEATURES="sparse_super,filetype,dir_index"
+FEATURES=${FEATURES:-$DEF_FEATURES}
+MOUNT_OPTS="-o loop"
+MNTPT=$test_dir/temp
+OUT=$test_name.log
+FAILED=$test_name.failed
+OKFILE=$test_name.ok
+
+# Do you want to try and mount the filesystem?
+MOUNT_AFTER_CORRUPTION=${MOUNT_AFTER_CORRUPTION:-"no"}
+# Do you want to remove the files from the mounted filesystem?
+# Ideally use it only in test environment.
+REMOVE_FILES=${REMOVE_FILES:-"no"}
+
+# In KB
+CORRUPTION_SIZE=${CORRUPTION_SIZE:-64}
+CORRUPTION_ITERATIONS=${CORRUPTION_ITERATIONS:-5}
+
+MKFS=../misc/mke2fs
+E2FSCK=../e2fsck/e2fsck
+FIRST_FSCK_OPTS="-fyv"
+SECOND_FSCK_OPTS="-fyv"
+
+# Let's check if the image (SIZE is in KB) can fit in the current filesystem.
+BASE_DIR=`dirname $IMAGE`
+BASE_AVAIL_BLOCKS=`df -P -k $BASE_DIR | awk '/%/ { print $4 }'`
+
+if (( BASE_AVAIL_BLOCKS < SIZE )); then
+	echo "$BASE_DIR does not have enough space to accommodate test image."
+	echo "Skipping test...."
+	break;
+fi
+
+# Let's have a journal more times than not.
+HAVE_JOURNAL=$((RANDOM % 12 ))
+if (( HAVE_JOURNAL == 0 )); then
+	FS_TYPE="ext2"
+	HAVE_JOURNAL=""
+else
+	HAVE_JOURNAL="-j"
+fi
+
+# Experimental features should not be used too often.
+LAZY_BG=$(( $RANDOM % 12 ))
+if (( LAZY_BG == 0 )); then
+	FEATURES=$FEATURES,lazy_bg
+fi
+
+# meta_bg and resize_inode features should not be enabled simultaneously
+META_BG=$(( $RANDOM % 12 ))
+if (( META_BG == 0 )); then
+	FEATURES=$FEATURES,meta_bg
+else
+	FEATURES=$FEATURES,resize_inode
+fi
+
+modprobe ext4 2> /dev/null
+modprobe ext4dev 2> /dev/null
+
+# If ext4 is present in the kernel then we can play with ext4 options
+EXT4=`grep ext4 /proc/filesystems`
+if [ -n "$EXT4" ]; then
+	USE_EXT4=$((RANDOM % 2 ))
+	if (( USE_EXT4 == 1 )); then
+		FS_TYPE="ext4dev"
+	fi
+fi
+
+if [ "$FS_TYPE" = "ext4dev" ]; then
+	UNINIT_GROUPS=$((RANDOM % 12 ))
+	if (( UNINIT_GROUPS == 0 )); then
+		FEATURES=$FEATURES,uninit_groups
+	fi
+	EXPAND_EISIZE=$((RANDOM % 12 ))
+	if (( EXPAND_EISIZE == 0 )); then
+		FIRST_FSCK_OPTS="$FIRST_FSCK_OPTS -E expand_extra_isize"
+	fi
+fi
+
+MKFS_OPTS="$HAVE_JOURNAL -b $BLK_SIZE -I $INODE_SIZE -O $FEATURES"
+
+NUM_BLKS=$(((SIZE * 1024) / BLK_SIZE))
+
+log()
+{
+	[ "$VERBOSE" ] && echo "$*"
+	echo "$*" >> $OUT
+}
+
+error()
+{
+	log "$*"
+	echo "$*" >> $FAILED
+}
+
+unset_vars()
+{
+	unset IMAGE DATE ARCHIVE FS_TYPE SIZE BLK_SIZE MKFS_OPTS MOUNT_OPTS
+	unset E2FSCK FIRST_FSCK_OPTS SECOND_FSCK_OPTS OUT FAILED OKFILE
+}
+
+cleanup()
+{
+	[ "$1" ] && error "$*" || error "Error occurred..."
+	umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+	cp $OUT $OUT.$DATE
+	echo " failed"
+	echo "*** This appears to be a bug in e2fsprogs ***"
+	echo "Please contact linux-ext4@vger.kernel.org for further assistance."
+	echo "Include $OUT.$DATE, and save $ARCHIVE locally for reference."
+	unset_vars
+	break;
+}
+
+echo -n "Random corruption test for e2fsck:"
+# Truncate the output log file
+rm -f $FAILED $OKFILE
+> $OUT
+
+get_random_location()
+{
+	total=$1
+
+	tmp=$(((RANDOM * 32768) % total))
+
+	# Try and have more corruption in metadata at the start of the
+	# filesystem.
+	if ((tmp % 3 == 0 || tmp % 5 == 0 || tmp % 7 == 0)); then
+		tmp=$((tmp % 32768))
+	fi
+
+	echo $tmp
+}
+
+make_fs_dirty()
+{
+	from=`get_random_location $NUM_BLKS`
+
+	# Number of blocks to write garbage into should be within fs and should
+	# not be too many.
+	num_blks_to_dirty=$((RANDOM % $1))
+
+	# write garbage into the selected blocks
+	[ ! -c /dev/urandom ] && return
+	log "writing ${num_blks_to_dirty}kB random garbage at offset ${from}kB"
+	dd if=/dev/urandom of=$IMAGE bs=1kB seek=$from conv=notrunc \
+		count=$num_blks_to_dirty >> $OUT 2>&1
+}
+
+
+touch $IMAGE
+log "Format the filesystem image..."
+log
+# Write some garbage blocks into the filesystem to make sure e2fsck has to do
+# a more difficult job than checking blocks of zeroes.
+log "Copy some random data into filesystem image...."
+make_fs_dirty 32768
+log "$MKFS $MKFS_OPTS -F $IMAGE $NUM_BLKS >> $OUT"
+$MKFS $MKFS_OPTS -F $IMAGE $NUM_BLKS >> $OUT 2>&1
+if [ $? -ne 0 ]; then
+	zero_size=`grep -c "Device size reported to be zero" $OUT`
+	short_write=`grep -c "Attempt to write block from filesystem resulted in short write" $OUT`
+
+	if (( zero_size != 0 || short_write != 0 )); then
+		echo "mkfs failed due to device size of 0 or a short write. This is harmless and need not be reported."
+	else
+		cleanup "mkfs failed - internal error during operation. Aborting random regression test..."
+	fi
+fi
+
+if [ `id -u` = 0 ]; then
+	mkdir -p $MNTPT
+	if [ $? -ne 0 ]; then
+		log "Failed to create or find mountpoint...."
+	else
+		mount -t $FS_TYPE $MOUNT_OPTS $IMAGE $MNTPT 2>&1 | tee -a $OUT
+		if [ ${PIPESTATUS[0]} -ne 0 ]; then
+			log "Unable to mount file system - skipped"
+		else
+			df -h $MNTPT >> $OUT
+			df -i $MNTPT >> $OUT
+			log "Copying data into the test filesystem..."
+
+			cp -r ../ $MNTPT >> $OUT 2>&1
+			sync
+			umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+		fi
+	fi
+else
+	log "skipping mount test for non-root user"
+fi
+
+log "Corrupt the image by moving around blocks of data..."
+log
+for (( i = 0; i < $CORRUPTION_ITERATIONS; i++ )); do
+	from=`get_random_location $NUM_BLKS`
+	to=`get_random_location $NUM_BLKS`
+
+	log "Moving ${CORRUPTION_SIZE}kB from block ${from}kB to ${to}kB"
+	dd if=$IMAGE of=$IMAGE bs=1k count=$CORRUPTION_SIZE conv=notrunc skip=$from seek=$to >> $OUT 2>&1
+
+	# more corruption by overwriting blocks from within the filesystem.
+	make_fs_dirty $CORRUPTION_SIZE
+done
+
+# Copy the image for reproducing the bug.
+cp --sparse=always $IMAGE $ARCHIVE >> $OUT 2>&1
+
+log "First pass of fsck..."
+$E2FSCK $FIRST_FSCK_OPTS $IMAGE >> $OUT 2>&1
+RET=$?
+
+# Run e2fsck for the second time and check if the problem gets solved.
+# After we can report error with pass1.
+[ $((RET & 1)) == 0 ] || log "The first fsck corrected errors"
+[ $((RET & 2)) == 0 ] || error "The first fsck wants a reboot"
+[ $((RET & 4)) == 0 ] || error "The first fsck left uncorrected errors"
+[ $((RET & 8)) == 0 ] || error "The first fsck reports an operational error"
+[ $((RET & 16)) == 0 ] || error "The first fsck reports there was a usage error"
+[ $((RET & 32)) == 0 ] || error "The first fsck reports it was cancelled"
+[ $((RET & 128)) == 0 ] || error "The first fsck reports a library error"
+
+log "---------------------------------------------------------"
+
+log "Second pass of fsck..."
+$E2FSCK $SECOND_FSCK_OPTS $IMAGE >> $OUT 2>&1
+RET=$?
+[ $((RET & 1)) == 0 ] || cleanup "The second fsck corrected errors!"
+[ $((RET & 2)) == 0 ] || cleanup "The second fsck wants a reboot"
+[ $((RET & 4)) == 0 ] || cleanup "The second fsck left uncorrected errors"
+[ $((RET & 8)) == 0 ] || cleanup "The second fsck reports an operational error"
+[ $((RET & 16)) == 0 ] || cleanup "The second fsck reports a usage error"
+[ $((RET & 32)) == 0 ] || cleanup "The second fsck reports it was cancelled"
+[ $((RET & 128)) == 0 ] || cleanup "The second fsck reports a library error"
+
+[ -f $FAILED ] && cleanup
+
+if [ "$MOUNT_AFTER_CORRUPTION" = "yes" ]; then
+	mount -t $FS_TYPE $MOUNT_OPTS $IMAGE $MNTPT 2>&1 | tee -a $OUT
+	[ ${PIPESTATUS[0]} -ne 0 ] && log "Unable to mount file system - skipped"
+
+	[ "$REMOVE_FILES" = "yes" ] && rm -rf $MNTPT/* >> $OUT
+	umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+fi
+
+rm -f $ARCHIVE
+
+# Report success
+echo " ok"
+echo "Succeeded..." > $OKFILE
+
+unset_vars
+
+COUNT=$((COUNT + 1))
+done
Index: e2fsprogs-1.40.4/tests/Makefile.in
===================================================================
--- e2fsprogs-1.40.4.orig/tests/Makefile.in
+++ e2fsprogs-1.40.4/tests/Makefile.in
@@ -24,6 +24,8 @@ test_script: test_script.in Makefile
 	@chmod +x test_script
 
 check:: test_script
+	@echo "Removing remnants of earlier tests..."
+	$(RM) -f *~ *.log *.new *.failed *.ok test.img2*
 	@echo "Running e2fsprogs test suite..."
 	@echo " "
 	@./test_script
@@ -63,7 +65,7 @@ testend: test_script ${TDIR}/image
 	@echo "If all is well, edit ${TDIR}/name and rename ${TDIR}."
 
 clean::
-	$(RM) -f *~ *.log *.new *.failed *.ok test.img test_script
+	$(RM) -f *~ *.log *.new *.failed *.ok test.img* test_script
 
 distclean:: clean
 	$(RM) -f Makefile

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
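
The exit-status checks in the script above treat the e2fsck return value
as a bitmask, so several conditions can be reported by a single run. A
minimal standalone sketch of that decoding is shown here, with the bit
meanings taken from the checks in the patch. The helper name
describe_fsck_status is hypothetical and not part of the patch.

```shell
# describe_fsck_status is a hypothetical helper, not part of the patch.
# It prints one line for every bit that is set in an e2fsck exit status.
describe_fsck_status()
{
	ret=$1

	if [ "$ret" -eq 0 ]; then
		echo "no errors"
		return 0
	fi

	[ $((ret & 1)) -eq 0 ]   || echo "errors corrected"
	[ $((ret & 2)) -eq 0 ]   || echo "reboot wanted"
	[ $((ret & 4)) -eq 0 ]   || echo "errors left uncorrected"
	[ $((ret & 8)) -eq 0 ]   || echo "operational error"
	[ $((ret & 16)) -eq 0 ]  || echo "usage error"
	[ $((ret & 32)) -eq 0 ]  || echo "cancelled"
	[ $((ret & 128)) -eq 0 ] || echo "library error"
	return 0
}

# Example: status 5 has bits 1 and 4 set
describe_fsck_status 5
```

In the test itself, a first pass that only reports "errors corrected" is
expected, while any bit set after the second pass is treated as a failure.
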