From: Sandon Van Ness <sandon@van-ness.com>
Subject: Re: linux-ext4@vger.kernel.org --- ext4 going read-only/journal abort
 when raid controller resets itself
Date: Sat, 24 Dec 2011 00:22:30 -0800
Message-ID: <4EF58BC6.9010302@van-ness.com>
References: <4EF58ABF.8020404@van-ness.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
To: linux-ext4@vger.kernel.org
In-Reply-To: <4EF58ABF.8020404@van-ness.com>
Sender: linux-ext4-owner@vger.kernel.org

Sorry, I didn't send this in plain text the first time, so I had to start a
new email and somehow messed up the subject. It should have been:

ext4 going read-only/journal abort when raid controller resets itself

On 12/24/2011 12:18 AM, Sandon Van Ness wrote:
> Most of our machines run ext3, and we have seen the card get reset on
> ext3 without the filesystem ever going read-only, which it now always
> does on ext4. The I/O stalls for a few minutes until the hang is
> detected and the controller is reset, after which the machine recovers
> on ext3, JFS and other filesystems. On ext4, however, the journal is
> aborted and the filesystem goes read-only:
>
> Has anyone seen something like this?
>
> [605458.429395] scsi cmnd aborted, scsi_cmnd(0xffff88041c4dac80), cmnd[0x28,0x 0,0xcb,0x 4,0x13,0x40,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 1.
> [605458.444011] arcmsr1: executing eh bus reset .....num_resets = 0, num_aborts = 114
> [605458.451724] arcmsr1: executing hw bus reset .....
> [605480.472827] arcmsr1: waiting for hw bus reset return, retry=1
> [605500.486408] arcmsr1: waiting for hw bus reset return, retry=2
> [605520.516017] Areca RAID Controller1: F/W V1.49 2010-12-02 & Model ARC-1222
> [606900.929121] EXT4-fs error (device sdb1): mb_free_blocks:1397: group 39137block 1282445473:freeing already freed block (bit 4257)
> [606900.941415] Aborting journal on device sdb1-8.
> [606900.941561] EXT4-fs error (device sdb1) in ext4_setattr:5462: Readonly filesystem
> [606900.955051] Aborting journal on device sdb1-8.
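For anyone reading the arcmsr "scsi cmnd aborted" lines: the cmnd[...] bytes look like the start of a standard SCSI CDB (0x28 is READ(10), 0x2a is WRITE(10), 0x88 is READ(16)), with single hex digits padded by a space. A rough Python sketch that decodes opcode, LBA and transfer length from the bytes as they appear in the log; it assumes the standard big-endian CDB field layout, and the function names are made up for illustration. Only twelve bytes are printed, so for a 16-byte READ(16)/WRITE(16) the transfer length cannot be recovered.

```python
# Hypothetical helpers for decoding the CDB bytes in arcmsr abort messages.
# Assumption: the printed bytes are the leading bytes of a standard SCSI CDB.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x88: "READ(16)", 0x8A: "WRITE(16)"}


def parse_cmnd(text):
    """Turn '0x28,0x 0,0xcb,...' into ints; the driver pads digits with spaces."""
    return [int(tok.replace(" ", ""), 16) for tok in text.split(",")]


def decode_cdb(cdb):
    """Return (command name, LBA, transfer length in blocks or None)."""
    op = cdb[0]
    name = OPCODES.get(op, f"opcode 0x{op:02x}")
    if op in (0x28, 0x2A):
        # 10-byte CDB: LBA in bytes 2-5, transfer length in bytes 7-8.
        lba = int.from_bytes(bytes(cdb[2:6]), "big")
        length = int.from_bytes(bytes(cdb[7:9]), "big")
        return name, lba, length
    if op in (0x88, 0x8A):
        # 16-byte CDB: LBA in bytes 2-9; the length field (bytes 10-13)
        # is cut off in a 12-byte dump, so it is not reported.
        lba = int.from_bytes(bytes(cdb[2:10]), "big")
        return name, lba, None
    return name, None, None
```

For the first trace above, the aborted command decodes to READ(10) at LBA 0xcb041340 with a transfer length of 8 blocks.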
>
> Seconds after the card is reset and recovers, the journal is aborted
> and the filesystem goes read-only. Here is another case where it
> happens even before the card is reset:
>
> [574763.342694] scsi cmnd aborted, scsi_cmnd(0xffff8803d5fff1c0), cmnd[0x2a,0x 0,0x 0,0x 0,0x a,0xd0,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 1.
> [574763.357267] scsi cmnd aborted, scsi_cmnd(0xffff8800712a1480), cmnd[0x2a,0x 0,0x 0,0x 0,0x a,0xf0,0x 0,0x 0,0x 8,0x 0,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 1.
> --------------------------SNIP----------------------------------------
> [584376.272002] scsi cmnd aborted, scsi_cmnd(0xffff8802407f63c0), cmnd[0x88,0x 0,0x 0,0x 0,0x 0,0x 2,0x62,0x44,0xe7,0x28,0x 0,0x 0], scsi_id = 0x 0, scsi_lun = 0x 1.
> [584376.286524] arcmsr1: executing eh bus reset .....num_resets = 2, num_aborts = 497
> [584376.612598] arcmsr1: wait 'abort all outstanding command' timeout
> [587971.898239] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:731: group 4216017413 blocks in bitmap, 17416 in gd
> [587971.908788] Aborting journal on device sdb1-8.
> [587972.072416] EXT4-fs (sdb1): Remounting filesystem read-only
> [587972.072513] EXT4-fs error (device sdb1): ext4_journal_start_sb:260: Detected aborted journal
> [587972.072518] EXT4-fs (sdb1): Remounting filesystem read-only
> [587972.092489] EXT4-fs error (device sdb1) in ext4_reserve_inode_write:5657: Journal has aborted
> [587974.432255] EXT4-fs error (device sdb1) in ext4_evict_inode:210: Journal has aborted
> [587974.443945] EXT4-fs (sdb1): ext4_da_writepages: jbd2_start: 778 pages, ino 76182150; err -30
> [587974.446615] EXT4-fs (sdb1): ext4_da_writepages: jbd2_start: 9223372036854775773 pages, ino 88873701; err -30
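The bracketed numbers are printk timestamps in seconds since boot, and the trailing "err -30" on the ext4_da_writepages lines is a negated errno value. A minimal Python sketch for pulling both out of pasted dmesg lines, assuming the usual "[seconds.microseconds]" prefix; the helper names are hypothetical.

```python
import errno
import re

# printk prefix such as "[587971.898239] ..."; short uptimes are space-padded
# inside the brackets, so leading whitespace is allowed.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")
# ext4 reports failures such as "...; err -30" with a negated errno value.
ERR_RE = re.compile(r"\berr (-\d+)\b")


def timestamp(line):
    """Return the printk timestamp (seconds since boot), or None."""
    m = TS_RE.match(line)
    return float(m.group(1)) if m else None


def error_name(line):
    """Map an 'err -N' in the line to its errno symbol, e.g. -30 -> 'EROFS'."""
    m = ERR_RE.search(line)
    if m is None:
        return None
    code = -int(m.group(1))
    return errno.errorcode.get(code, str(code))


def gap(earlier, later):
    """Seconds elapsed between two timestamped dmesg lines."""
    return timestamp(later) - timestamp(earlier)
```

On Linux, -30 maps to EROFS (read-only file system), which lines up with the "Remounting filesystem read-only" messages. By the timestamps, the first EXT4-fs error in this trace is logged about 3595 s after the arcmsr abort timeout; in the first trace it comes about 1380 s after the controller firmware banner.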
>
> Any ideas why ext4 has this behavior when ext3 did not?