Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758393AbXFSJYj (ORCPT ); Tue, 19 Jun 2007 05:24:39 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754727AbXFSJY3 (ORCPT ); Tue, 19 Jun 2007 05:24:29 -0400
Received: from s2.ukfsn.org ([217.158.120.143]:43232 "EHLO mail.ukfsn.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754145AbXFSJY1 (ORCPT ); Tue, 19 Jun 2007 05:24:27 -0400
Message-ID: <4677A0C7.4000306@dgreaves.com>
Date: Tue, 19 Jun 2007 10:24:23 +0100
From: David Greaves
User-Agent: Mozilla-Thunderbird 2.0.0.0 (X11/20070601)
MIME-Version: 1.0
To: David Chinner , Tejun Heo
Cc: David Robinson , LVM general discussion and development , "'linux-kernel@vger.kernel.org'" , xfs@oss.sgi.com, linux-pm , LinuxRaid , "Rafael J. Wysocki"
Subject: Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
References: <46744065.6060605@dgreaves.com> <4674645F.5000906@gmail.com> <46751D37.5020608@dgreaves.com> <4676390E.6010202@dgreaves.com> <20070618145007.GE85884050@sgi.com> <4676D97E.4000403@dgreaves.com>
In-Reply-To: <4676D97E.4000403@dgreaves.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6155
Lines: 174

David Greaves wrote:
> I'm going to have to do some more testing...
done

> David Chinner wrote:
>> On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:
>>> David Greaves wrote:
>>> So doing:
>>> xfs_freeze -f /scratch
>>> sync
>>> echo platform > /sys/power/disk
>>> echo disk > /sys/power/state
>>> # resume
>>> xfs_freeze -u /scratch
>>>
>>> Works (for now - more usage testing tonight)
>>
>> Verrry interesting.
> Good :)
Now, not so good :)

>> What you were seeing was an XFS shutdown occurring because the free space
>> btree was corrupted.
>> IOWs, the process of suspend/resume has resulted
>> in either bad data being written to disk, the correct data not being
>> written to disk or the cached block being corrupted in memory.
> That's the kind of thing I was suspecting, yes.
>
>> If you run xfs_check on the filesystem after it has shut down after a resume,
>> can you tell us if it reports on-disk corruption? Note: do not run xfs_repair
>> to check this - it does not check the free space btrees; instead it simply
>> rebuilds them from scratch. If xfs_check reports an error, then run xfs_repair
>> to fix it up.
> OK, I can try this tonight...

This is on 2.6.22-rc5

So I hibernated last night and resumed this morning.
Before hibernating I froze and sync'ed. After resume I thawed it. (Sorry Dave)

Here are some photos of the screen during resume. This is not 100% reproducible -
it seems to occur only if the system is shut down for 30mins or so.

Tejun, I wonder if error handling during resume is problematic? I got the same
errors in 2.6.21. I have never seen these (or any other libata) errors other
than during resume.

http://www.dgreaves.com/pub/2.6.22-rc5-resume-failure.jpg
(hard to read, here's one from 2.6.21:
http://www.dgreaves.com/pub/2.6.21-resume-failure.jpg)

I _think_ I've only seen the xfs problem when a resume shows these errors.

Ok, to try and cause a problem I ran a make and got this back at once:

make: stat: Makefile: Input/output error
make: stat: clean: Input/output error
make: *** No rule to make target `clean'. Stop.
make: stat: GNUmakefile: Input/output error
make: stat: makefile: Input/output error

I caught the first dmesg this time:

Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file fs/xfs/xfs_btree.c.
Caller 0xc01b58e1
 [] show_trace_log_lvl+0x1a/0x30
 [] show_trace+0x12/0x20
 [] dump_stack+0x15/0x20
 [] xfs_error_report+0x4f/0x60
 [] xfs_btree_check_sblock+0x56/0xd0
 [] xfs_alloc_lookup+0x181/0x390
 [] xfs_alloc_lookup_le+0x16/0x20
 [] xfs_free_ag_extent+0x51/0x690
 [] xfs_free_extent+0xa4/0xc0
 [] xfs_bmap_finish+0x119/0x170
 [] xfs_itruncate_finish+0x23a/0x3a0
 [] xfs_inactive+0x482/0x500
 [] xfs_fs_clear_inode+0x34/0xa0
 [] clear_inode+0x57/0xe0
 [] generic_delete_inode+0xe5/0x110
 [] generic_drop_inode+0x167/0x1b0
 [] iput+0x5f/0x70
 [] do_unlinkat+0xdf/0x140
 [] sys_unlink+0x10/0x20
 [] syscall_call+0x7/0xb
 =======================
xfs_force_shutdown(dm-0,0x8) called from line 4258 of file fs/xfs/xfs_bmap.c. Return address = 0xc021101e
Filesystem "dm-0": Corruption of in-memory data detected. Shutting down filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)

so I cd'ed out of /scratch and umounted.

I then tried the xfs_check.

haze:~# xfs_check /dev/video_vg/video_lv
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_check. If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
haze:~# mount /scratch/
haze:~# umount /scratch/
haze:~# xfs_check /dev/video_vg/video_lv

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'xfs_db'

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767bc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'syslogd'

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767cc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:

ugh. Try again

haze:~# xfs_check /dev/video_vg/video_lv
haze:~#

whilst running a top reported this as roughly the peak memory usage:
8759 root 18 0 479m 474m 876 R 2.0 46.9 0:02.49 xfs_db
so it looks like it didn't run out of memory (machine has 1Gb).

Dave, I ran xfs_check -v... but I got bored when it reached 122M of bz2
compressed output with no sign of stopping... still got it if it's any use...

lots of:
setting block 0/0 to sb
setting block 0/1 to freelist
setting block 0/2 to freelist
setting block 0/3 to freelist
setting block 0/4 to freelist
setting block 0/75 to btbno
setting block 0/346901 to free1
setting block 0/346903 to free1
setting block 0/346904 to free1
setting block 0/346905 to free1

and stuff like this
inode 128 mode 040777 fmt extents afmt extents nex 1 anex 0 nblk 1 sz 4096
inode 128 nlink 39 is dir
inode 128 extent [0,7,1,0]

I then rebooted and ran a repair which didn't show any damage.

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/