Date: Wed, 11 Jun 2014 22:35:08 -0700
Subject: [Regression] 3.15 mmc related ext4 corruption with qemu-system-arm
From: John Stultz
To: Ulf Hansson, Chris Ball, Peter Maydell
Cc: Johan Rudholm, Russell King - ARM Linux, "Theodore Ts'o", lkml

I've been seeing some ext4 corruption with recent kernels under
qemu-system-arm. The issue seems to crop up shortly after booting,
about 50% of the time, after shutting down uncleanly (terminating
qemu).

The ext4/mmc related dmesg details are:

[    3.206809] mmci-pl18x mb:mmci: mmc0: PL181 manf 41 rev0 at 0x10005000 irq 41,42 (pio)
[    3.268316] mmc0: new SDHC card at address 4567
[    3.281963] mmcblk0: mmc0:4567 QEMU! 2.00 GiB
[    3.315699]  mmcblk0: p1 p2 p3 p4 < p5 p6 >
...
[   11.806169] EXT4-fs (mmcblk0p5): Ignoring removed nomblk_io_submit option
[   11.904714] EXT4-fs (mmcblk0p5): recovery complete
[   11.905854] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Opts: nomblk_io_submit,errors=panic
...
[   91.558824] EXT4-fs error (device mmcblk0p5): ext4_mb_generate_buddy:756: group 1, 2252 clusters in bitmap, 2284 in gd; block bitmap corrupt.
[   91.560641] Aborting journal on device mmcblk0p5-8.
[   91.562589] Kernel panic - not syncing: EXT4-fs (device mmcblk0p5): panic forced after error
[   91.562589]
[   91.563486] CPU: 0 PID: 1 Comm: init Not tainted 3.15.0-rc1 #560
[   91.564616] [] (unwind_backtrace) from [] (show_stack+0x11/0x14)
[   91.565154] [] (show_stack) from [] (dump_stack+0x59/0x7c)
[   91.565666] [] (dump_stack) from [] (panic+0x67/0x178)
[   91.566147] [] (panic) from [] (ext4_handle_error+0x69/0x74)
[   91.566659] [] (ext4_handle_error) from [] (__ext4_grp_locked_error+0x6b/0x160)
[   91.567223] [] (__ext4_grp_locked_error) from [] (ext4_mb_generate_buddy+0x1b1/0x29c)
[   91.567860] [] (ext4_mb_generate_buddy) from [] (ext4_mb_init_cache+0x219/0x4e0)
[   91.568473] [] (ext4_mb_init_cache) from [] (ext4_mb_init_group+0xbb/0x138)
[   91.569021] [] (ext4_mb_init_group) from [] (ext4_mb_good_group+0xf3/0xfc)
[   91.569659] [] (ext4_mb_good_group) from [] (ext4_mb_regular_allocator+0x153/0x2c4)
[   91.570250] [] (ext4_mb_regular_allocator) from [] (ext4_mb_new_blocks+0x2fd/0x4e4)
[   91.570868] [] (ext4_mb_new_blocks) from [] (ext4_ext_map_blocks+0x965/0x10bc)
[   91.571444] [] (ext4_ext_map_blocks) from [] (ext4_map_blocks+0xfb/0x36c)
[   91.571992] [] (ext4_map_blocks) from [] (mpage_map_and_submit_extent+0x99/0x5f0)
[   91.572614] [] (mpage_map_and_submit_extent) from [] (ext4_writepages+0x2b9/0x4e8)
[   91.573201] [] (ext4_writepages) from [] (do_writepages+0x19/0x28)
[   91.573709] [] (do_writepages) from [] (__filemap_fdatawrite_range+0x3d/0x44)
[   91.574265] [] (__filemap_fdatawrite_range) from [] (filemap_flush+0x23/0x28)
[   91.574854] [] (filemap_flush) from [] (ext4_rename+0x2f9/0x3e4)
[   91.575360] [] (ext4_rename) from [] (vfs_rename+0x183/0x45c)
[   91.575911] [] (vfs_rename) from [] (SyS_renameat2+0x22b/0x26c)
[   91.576460] [] (SyS_renameat2) from [] (SyS_rename+0x1f/0x24)
[   91.576961] [] (SyS_rename) from [] (ret_fast_syscall+0x1/0x5c)

Bisecting this points to: e7f3d22289e4307b3071cc18b1d8ecc6598c0be4
(mmc: mmci: Handle CMD irq before DATA irq).
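For reference, the bisection itself was the usual workflow, roughly the
following (the endpoints here are approximate, and since the corruption
only hits about half the time, I did a few boot/kill-qemu cycles at each
step before marking it):

  $ git bisect start
  $ git bisect bad v3.15        # corruption reproduces here
  $ git bisect good v3.14       # no corruption seen here
  # then, at each step: build, boot under qemu-system-arm, terminate
  # qemu, boot again a few times, and mark the result:
  $ git bisect good             # or: git bisect bad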
I guess this shouldn't be surprising, as I saw problems with that patch
earlier in the 3.15-rc cycle: https://lkml.org/lkml/2014/4/14/824

However, that discussion petered out (possibly my fault for not
following up) without settling whether it was an issue with the patch
or an issue with qemu. Then the original problem disappeared for me,
which I figured was due to an upstream fix, but I'm now guessing it
coincided with me updating my system and getting qemu v2.0 (whereas
previously I was on 1.5):

$ qemu-system-arm -version
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.1), Copyright (c) 2003-2008 Fabrice Bellard

While the previous behavior was annoying and kept my emulated
environments from booting, this one, while rarer and more subtle, eats
the disks, which is much more painful for my testing.

Unfortunately, reverting the change (manually, as it no longer reverts
cleanly) doesn't seem to completely avoid the issue, so the bisection
may have gone slightly astray (though it is interesting that it landed
on the same commit I had trouble with earlier). So I'll back-track and
double check some of the last few "good" results, to validate I didn't
just luck into three good boots accidentally. I'll also review my
revert in case I missed something subtle in doing it manually.

Anyway, if anyone has thoughts on how to better chase this down and
debug it, I'd appreciate it! I can also provide reproduction
instructions, with a pre-built Linaro Android disk image and a
hand-built kernel, if anyone wants to debug this themselves.

thanks
-john
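P.S.: Until I write up proper reproduction instructions, my qemu
invocation looks roughly like the following. Treat it as a sketch: the
machine type, memory size and image/kernel paths are approximations of
my setup rather than an exact recipe.

  $ qemu-system-arm -M vexpress-a9 -m 1024 \
        -kernel zImage -dtb vexpress-v2p-ca9.dtb \
        -append "console=ttyAMA0" \
        -sd linaro-android.img \
        -serial stdio

The ext4 partition that gets corrupted lives on the emulated PL181 SD
card (mmcblk0p5 in the log above); the corruption shows up after
terminating qemu and booting the same image again.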