From: Sander Eikelenboom
Subject: Re: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
Date: Thu, 5 Jan 2012 21:04:34 +0100
Message-ID: <193489064.20120105210434@eikelenboom.it>
References: <217150909.20120105113759@eikelenboom.it> <197607646.20120105142107@eikelenboom.it> <6FC155DD-80C1-4088-B745-6B74D9D5AA48@mit.edu> <4910694144.20120105171428@eikelenboom.it> <20120105181535.GB26382@thunk.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
To: Ted Ts'o
Return-path:
In-Reply-To: <20120105181535.GB26382@thunk.org>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

Thursday, January 5, 2012, 7:15:35 PM, you wrote:

> On Thu, Jan 05, 2012 at 05:14:28PM +0100, Sander Eikelenboom wrote:
>>
>> OK spoke too soon, i have been able to trigger it again:
>> - copying files from LV to the same LV without the snapshot went OK
>> - copying from the RO snapshot of a LV to the same LV gave the error while copying the file again:

> OK. Originally, you said you did this:

> 1) fsck -v -p -f the filesystem
> 2) mount the filesystem
> 3) Try to copy a file
> 4) filesystem will be mounted RO on error (see below)
> 5) fsck again, journal will be recovered, no other errors
> 6) start at 1)

> Was this with a read-only snapshot always being in existence
> through all of these five steps? When was the RO snapshot created?

> If a RO snapshot has to be there in order for this to happen, then
> this is almost certainly a device-mapper regression. (dm-devel folks,
> this is a problem which apparently occurred when the user went from
> v3.1.5 to v3.2, so this looks like a 3.2 regression.)
> - Ted

Well, it seems to consist of 2 issues on a machine booted with a 3.2.0 kernel:

1) - It only seems to trigger with a snapshot of the LV present
   - Just tested whether the snapshot being mounted RO really matters; it doesn't.
   - It can also be triggered if the snapshot is mounted RW
   - It can also be triggered when the snapshot is not mounted at all (by just copying some files on the filesystem itself)

   So that seems to be a device mapper issue

2) BUT: after the error triggered by 1:
   - After removing the snapshot with lvremove,
   - umounting the filesystem on the LV
   - fsck'ing the filesystem without errors (apart from the journal recovery)
   - rebooting the machine again with the 3.2.0 kernel
   - mounting the filesystem on the LV
   - removing the partially copied files
   - trying to copy files from the filesystem on the LV to the same filesystem, without a snapshot of the LV present
   - it fails with the exact same error, remounting the filesystem RO.

   then
   - umounting the filesystem on the LV
   - fsck'ing the filesystem without errors (apart from the journal recovery)
   - rebooting the machine with a 3.1.5 kernel
   - mounting the filesystem on the LV
   - removing the partially copied files
   - trying to copy files from the filesystem on the LV to the same filesystem, without a snapshot of the LV present
   - no problems, files copied OK

   then
   - rebooting into 3.2.0 again
   - mounting the filesystem on the LV
   - removing the completely copied files
   - trying to copy files from the filesystem on the LV to the same filesystem, without a snapshot of the LV present
   - no problems, files copied OK

   SO
   - it keeps on failing on 3.2.0, even when the snapshot is gone and the system is rebooted; after 3.1.5 is booted once, everything seems to be OK again .... even under 3.2.0
   - that seems more like a filesystem thing?

I double-checked and performed all these steps again.
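As an aside for anyone comparing the two occurrences: below is a minimal Python sketch, purely illustrative and not part of the kernel or e2fsprogs, that pulls the group number and both counts out of the ext4_mb_generate_buddy lines from this thread. The names BuddyMismatch, parse_buddy_error and SAMPLE_LOG are made up for the example, and reading the two numbers as free-cluster counts (block bitmap versus group descriptor) is an assumption about what the message means.

```python
import re
from typing import NamedTuple, Optional

# Matches the mballoc consistency error as it appears in the logs in this thread.
BUDDY_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"ext4_mb_generate_buddy:\d+: group (?P<group>\d+), "
    r"(?P<bitmap>\d+) clusters in bitmap, (?P<gd>\d+) in gd"
)


class BuddyMismatch(NamedTuple):
    """One reported mismatch (hypothetical helper type for this sketch)."""
    device: str
    group: int
    bitmap_free: int  # count taken from the bitmap (assumed: free clusters)
    gd_free: int      # count taken from the group descriptor (assumed: free clusters)

    @property
    def delta(self) -> int:
        # Positive when the group descriptor claims more than the bitmap shows.
        return self.gd_free - self.bitmap_free


def parse_buddy_error(line: str) -> Optional[BuddyMismatch]:
    """Return the parsed mismatch, or None if the line is some other message."""
    m = BUDDY_ERROR.search(line)
    if m is None:
        return None
    return BuddyMismatch(m["device"], int(m["group"]), int(m["bitmap"]), int(m["gd"]))


# The two error lines reported in this thread, plus one unrelated line.
SAMPLE_LOG = [
    "[  220.748928] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: "
    "group 1687, 32254 clusters in bitmap, 32258 in gd",
    "[ 2357.655783] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: "
    "group 1861, 32254 clusters in bitmap, 32258 in gd",
    "[ 2357.656056] Aborting journal on device dm-2-8.",
]

if __name__ == "__main__":
    for line in SAMPLE_LOG:
        hit = parse_buddy_error(line)
        if hit is not None:
            print(f"{hit.device} group {hit.group}: bitmap={hit.bitmap_free} "
                  f"gd={hit.gd_free} delta={hit.delta}")
```

Both reports show the same pair of counts, a difference of 4 clusters, but in different groups (1687 and 1861).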
--
Sander

>>
>> [ 2357.655783] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1861, 32254 clusters in bitmap, 32258 in gd
>> [ 2357.656056] Aborting journal on device dm-2-8.
>> [ 2357.718473] EXT4-fs (dm-2): Remounting filesystem read-only
>> [ 2357.736680] EXT4-fs error (device dm-2) in ext4_da_write_end:2532: IO failure
>> [ 2357.738328] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 7615 pages, ino 4079617; err -30
>> [ 2716.125010] EXT4-fs error (device dm-2): ext4_put_super:818: Couldn't clean up the journal
>>
>> Attached are 4x output from dumpe2fs
>> - dumpe2fs-xen_images-3.2.0                       Made just after boot
>> - dumpe2fs-xen_images-3.2.0-afterfsck             Made after doing a fsck -v -p -f on the unmounted LV
>> - dumpe2fs-xen_images-3.2.0-aftererror            Made after the error occurred on the mounted LV
>> - dumpe2fs-xen_images-3.2.0-aftererror-afterfsck  Made after the error occurred, and after a subsequent fsck -v -p -f on the unmounted LV
>> - dumpe2fs-xen_images-3.1.5                       Made after booting into 3.1.5 after all of the above
>>
>> Oh yes also did a badblock scan to rule that out, and it seems the numbers stay the same.
>> e2fsck 1.41.12 (17-May-2010) (from debian squeeze)
>>
>> --
>> Sander
>>
>> >>
>> >> --
>> >> Sander
>> >>
>> >> This is a forwarded message
>> >> From: Sander Eikelenboom
>> >> To: "Theodore Ts'o"
>> >> Date: Thursday, January 5, 2012, 11:37:59 AM
>> >> Subject: can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
>> >>
>> >> ===8<==============Original message text===============
>> >>
>> >> I'm having some troubles with an ext4 filesystem on LVM, it seems bricked and fsck doesn't seem to find and correct the problem.
>> >>
>> >> Steps:
>> >> 1) fsck -v -p -f the filesystem
>> >> 2) mount the filesystem
>> >> 3) Try to copy a file
>> >> 4) filesystem will be mounted RO on error (see below)
>> >> 5) fsck again, journal will be recovered, no other errors
>> >> 6) start at 1)
>> >>
>> >> I think the way i bricked it is:
>> >> - make a lvm snapshot from that lvm logical disk
>> >> - mount that lvm snapshot as RO
>> >> - try to copy a file from that mounted RO snapshot to a different dir on the lvm logical disk the snapshot is from.
>> >> - it fails and i can't recover (see above)
>> >>
>> >> Is there a way to recover from this ?
>> >>
>> >> [ 220.748928] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
>> >> [ 220.749415] Aborting journal on device dm-2-8.
>> >> [ 220.771633] EXT4-fs error (device dm-2): ext4_journal_start_sb:327: Detected aborted journal
>> >> [ 220.772593] EXT4-fs (dm-2): Remounting filesystem read-only
>> >> [ 220.792455] EXT4-fs (dm-2): Remounting filesystem read-only
>> >> [ 220.805118] EXT4-fs (dm-2): ext4_da_writepages: jbd2_start: 9680 pages, ino 4079617; err -30
>> >> serveerstertje:/mnt/xen_images/domains/production# cd /
>> >> serveerstertje:/# umount /mnt/xen_images/
>> >> serveerstertje:/# fsck -f -v -p /dev/serveerstertje/xen_images
>> >> fsck from util-linux-ng 2.17.2
>> >> /dev/mapper/serveerstertje-xen_images: recovering journal
>> >>
>> >> 277 inodes used (0.00%)
>> >> 5 non-contiguous files (1.8%)
>> >> 0 non-contiguous directories (0.0%)
>> >> # of inodes with ind/dind/tind blocks: 41/41/3
>> >> Extent depth histogram: 69/28/2
>> >> 51890920 blocks used (79.18%)
>> >> 0 bad blocks
>> >> 41 large files
>> >>
>> >> 199 regular files
>> >> 53 directories
>> >> 0 character device files
>> >> 0 block device files
>> >> 0 fifos
>> >> 0 links
>> >> 16 symbolic links (16 fast symbolic links)
>> >> 0 sockets
>> >> --------
>> >> 268 files
>> >> serveerstertje:/#
>> >>
>> >> System:
>> >> - Kernel 3.2.0
>> >> - Debian Squeeze with:
>> >> ii e2fslibs   1.41.12-4stable1  ext2/ext3/ext4 file system libraries
>> >> ii e2fsprogs  1.41.12-4stable1  ext2/ext3/ext4 file system utilities
>> >>
>> >> ===8<===========End of original message text===========
>> >>
>> >> --
>> >> Best regards,
>> >> Sander mailto:linux@eikelenboom.it
>>
>> --
>> Best regards,
>> Sander mailto:linux@eikelenboom.it

--
Best regards,
Sander mailto:linux@eikelenboom.it