2005-04-27 11:47:13

by Philipp Matthias Hahn

Subject: RFH: ext3 on EVMS on SW-RAID1 problem

Hello and help!

One of our university fileservers has been showing strange problems since
last Friday. Syslog shows the following messages:
attempt to access beyond end of device
dm-8: rw=0, want=8589934592, limit=262142
The strange thing: If I mount a disk-image of that volume via loop,
everything works fine!
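
To put the two numbers in the message side by side, here is a rough sketch of the arithmetic. It assumes that the kernel counts both "want" and "limit" in 512-byte sectors; the variable names are only illustrative.

```shell
# Assumption: "want" and "limit" are counted in 512-byte sectors.
limit=262142
want=8589934592
sector=512

# Size of the device in bytes.
limit_bytes=$((limit * sector))

# End offset of the failing request, in TiB.
want_tib=$((want * sector / 1024 / 1024 / 1024 / 1024))

echo "device size: $limit_bytes bytes"
echo "requested end offset: $want_tib TiB"

# Check whether the requested sector is exactly 2^33.
[ "$want" -eq $((1 << 33)) ] && echo "want is exactly 2^33 sectors"
```

Under that assumption the limit corresponds to the byte count dd reports for the volume, while the request ends about 4 TiB into a device of roughly 128 MiB.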

The server was running Debian sarge with an unpatched 2.6.11.6 then, but
is running 2.6.11.7 now and still shows the same problem.
EVMS is version 2.5.2-1 and DevMapper is version 1.01.00-4.

moradin:/var/tmp# dd if=/dev/evms/bsp2005 of=/var/tmp/bsp2005.e3
262142+0 records in
262142+0 records out
134216704 bytes transferred in 5.012082 seconds (26778633 bytes/sec)
moradin:/var/tmp# mount -o loop /var/tmp/bsp2005.e3 /mnt
moradin:/var/tmp# stat /mnt/i386-gnu-linux/tools/lib/gcc-lib/mips-linux/3.3.2/include/stddef.h
File: `/mnt/i386-gnu-linux/tools/lib/gcc-lib/mips-linux/3.3.2/include/stddef.h'
Size: 12695 Blocks: 28 IO Block: 4096 regular file
Device: 700h/1792d Inode: 24840 Links: 1
Access: (0640/-rw-r-----) Uid: ( 1000/ pmhahn) Gid: (19992/ bsp2005)
Access: 2005-04-27 13:06:16.000000000 +0200
Modify: 2005-04-13 15:21:57.000000000 +0200
Change: 2005-04-22 08:35:09.000000000 +0200
moradin:/var/tmp# md5sum /mnt/i386-gnu-linux/tools/lib/gcc-lib/mips-linux/3.3.2/include/stddef.h
3a5f8185367677ce39f9f8d2a72a2705 /mnt/i386-gnu-linux/tools/lib/gcc-lib/mips-linux/3.3.2/include/stddef.h
moradin:~# umount /mnt

moradin:~# mount /dev/evms/bsp2005 /mnt
moradin:~# stat /mnt/i386-gnu-linux/tools/lib/gcc-lib/mips-linux/3.3.2/include/stddef.h
File: `/mnt/i386-gnu-linux/tools/lib/gcc-lib/mips-linux/3.3.2/include/stddef.h'
Size: 12695 Blocks: 28 IO Block: 4096 regular file
Device: fd08h/64776d Inode: 24840 Links: 1
Access: (0640/-rw-r-----) Uid: ( 1000/ pmhahn) Gid: (19992/ bsp2005)
Access: 2005-04-27 13:06:16.000000000 +0200
Modify: 2005-04-13 15:21:57.000000000 +0200
Change: 2005-04-22 08:35:09.000000000 +0200
moradin:~# md5sum /mnt/i386-gnu-linux/tools/lib/gcc-lib/mips-linux/3.3.2/include/stddef.h
error processing /mnt/i386-gnu-linux/tools/lib/gcc-lib/mips-linux/3.3.2/include/stddef.h: failed in buffer_read(fd): mdfile: Input/output error
moradin:~# umount /mnt

bsp2005 is an ext3 filesystem from which a snapshot, bsp2005_snap, has
been created. Both live in an LVM region, which sits on a
software RAID1 built from two partitions on two SCSI disks:
lvm/svs/bsp2005#origin#
  lvm/svs/bsp2005
    md/md0
      sda4
        sda
      sdb4
        sdb

Is something wrong with this setup, or is this a known problem? Since the
same solution worked without problems last year, I'm very confused
by this strange error behaviour.

BYtE
Philipp
--
/ / (_)__ __ ____ __ Philipp Hahn
/ /__/ / _ \/ // /\ \/ /
/____/_/_//_/\_,_/ /_/\_\ [email protected]


2005-04-28 03:09:03

by Chris Adams

Subject: Re: RFH: ext3 on EVMS on SW-RAID1 problem

Once upon a time, Philipp Matthias Hahn <[email protected]> said:
>One of our university fileservers has been showing strange problems since
>last Friday. Syslog shows the following messages:
> attempt to access beyond end of device
> dm-8: rw=0, want=8589934592, limit=262142
>The strange thing: If I mount a disk-image of that volume via loop,
>everything works fine!
>
>The server was running Debian sarge with an unpatched 2.6.11.6 then, but
>is running 2.6.11.7 now and still shows the same problem.
>EVMS is version 2.5.2-1 and DevMapper is version 1.01.00-4.

I see a similar problem under recent Fedora Core 3 kernels with LVM2.
It appears when I create a snapshot of a volume. See Red Hat's
Bugzilla:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=152162

Here are the exact steps I used to reproduce the problem (which also
results in file corruption, even when reading from the non-snapshot
volume), using a scratch partition, /dev/sda8:

########################################################################
# create the software RAID as a 2 device mirror with 1 missing
mdadm -C -l 1 -n 2 /dev/md0 /dev/sda8 missing

# create the LVM setup
pvcreate /dev/md0
vgcreate lvtest /dev/md0
lvcreate -L100m -n test lvtest

# make a filesystem and put some data on it
mke2fs -j /dev/lvtest/test
mount /dev/lvtest/test /mnt
cp --preserve=all -r /boot/* /mnt/
umount /mnt
blockdev --flushbufs /dev/lvtest/test

# now mount it, create a snapshot, and see the result
mount /dev/lvtest/test /mnt
lvcreate -s -L10m -n snap /dev/lvtest/test
diff -ur /boot /mnt
########################################################################

The output I got from diff was:

diff: /mnt/System.map-2.6.10-1.766_FC3: Input/output error

and I got a bunch of messages like:

attempt to access beyond end of device
dm-4: rw=0, want=8300006146, limit=204800
Buffer I/O error on device dm-4, logical block 4150003072

from the kernel. These only seem to appear sometimes - other times I
get file corruption (although the corruption appears to be
block-aligned).
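
As a rough cross-check of the numbers in those messages, the sketch below assumes 512-byte sectors and a 1 KiB filesystem block size. The block size is not stated above; it is only a guess that happens to make the figures line up.

```shell
# Assumptions: 512-byte sectors; 1 KiB filesystem blocks (2 sectors each).
limit=204800
want=8300006146
block=4150003072
sectors_per_block=2

# Size of the logical volume according to "limit".
size_mib=$((limit * 512 / 1024 / 1024))
echo "volume size: $size_mib MiB"

# Sector range covered by the logical block named in the Buffer I/O error.
start=$((block * sectors_per_block))
end=$((start + sectors_per_block))
echo "block $block spans sectors $start..$end"

# Check that the end of that block is the "want" value from the kernel.
[ "$end" -eq "$want" ] && echo "end of block matches want"
```

With those assumptions the limit is the 100 MiB volume created by lvcreate, and the failing request is for the single logical block named in the Buffer I/O error. That block number is far beyond anything a 100 MiB filesystem can contain.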

If I then do:

########################################################################
lvremove /dev/lvtest/snap
umount /mnt
blockdev --flushbufs /dev/lvtest/test
mount /dev/lvtest/test /mnt
diff -ur /boot /mnt
########################################################################

It compares with no errors.
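
When repeating the test, a small filter can pick these messages out of the kernel log. This is only a sketch: check_bounds is a made-up helper name, and the pattern assumes the log lines look exactly like the "dm-N: rw=..., want=..., limit=..." lines quoted above.

```shell
# check_bounds (hypothetical helper): read kernel log lines on stdin and
# report every dm request line of the form shown above, printing how far
# past the end of the device the request reached.
check_bounds() {
    sed -n 's/^.*\(dm-[0-9]*\): rw=[0-9]*, want=\([0-9]*\), limit=\([0-9]*\).*$/\1 \2 \3/p' |
    while read -r dev want limit; do
        echo "$dev: want exceeds limit by $((want - limit)) sectors"
    done
}
```

It could be run as "dmesg | check_bounds" after the diff; lines that do not match the pattern are ignored.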

--
Chris Adams <[email protected]>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.