2012-06-06 05:24:17

by Ming Lei

[permalink] [raw]
Subject: ext4 corruption during unexpected power cycle in the middle of writing

I ran the power cycle test during the middle of file writing and after bootup, I ran force fsck and found two errors (If I run fsck -p -v I don't see the errors). From what I saw I think it is file system meta data corruption. Fsck can repair it but each time I ran the same test and I hit the same issue.

I don't think it is relevant but want to point out that sda6 shares the same drive as another partition on sda(sda3) is used for the raid6 array for /var.

The same issue was found whenever barrier is on or off, and the disk drive write cache is enabled or disabled. The test result shown below is when barrier is on and disk write cache is disabled.

I use kernel version 2.6.32SL6 version. I also see the same issue on 2.6.9 based kernel on the same hardware with ext3 file system.

My question is:
1) Is the issue caused from something unique in my box? Configuration error?
2) Is it possible my version of fsck reported false errors?
3) Is this a known issue? ? Is it a kernel bug?
4) How do I find what's wrong?
Thanks.

[admin@chief-csa81 ~]# uname -a
Linux chief-csa81 2.6.32 #1 SMP Tue May 29 00:36:37 PDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[admin@chief-csa81 ~]# mount
/dev/md1 on /var type ext3 (rw,relatime,user_xattr,acl,barrier=1,data=ordered)
/dev/sda6 on /data type ext4 (rw,nodiratime,relatime,barrier=1,data=ordered)
[admin@chief-csa81 ~]# sdparm /dev/sda
??? /dev/sda: ATA?????? WDC WD5003ABYX-0? 1S01
Read write error recovery mode page:
? AWRE??????? 1? [cha: n]
? ARRE??????? 0? [cha: n]
? PER? ???????0? [cha: n]
Caching (SBC) mode page:
? WCE???????? 0? [cha: y]
? RCD???????? 0? [cha: n]
Control mode page:
? SWP???????? 0? [cha: n]
Informational exceptions control mode page:
? EWASC?????? 0? [cha: n]
? DEXCPT????? 0? [cha: n]
? MRIE??????? 6? [cha: n]
[admin@chief-csa81 ~]# cbt -p /data -x 1337 -s 400GB
---------------------------------------------------
Build info
?? Version: v2.2.5
?? Time: May 29 2012 01:51:41
Runtime info
?? Sysname:? Linux
?? Nodename: chief-csa81
?? Release:? 2.6.32
?? Version:? #1 SMP Tue May 29 00:36:37 PDT 2012
?? Machine:? x86_64
?? Time: Tue Jun? 5 15:02:55 2012
Invocation
??????? cbt -p /data -x 1337 -s 400GB
Configuration
? General Options:
????? -r,--report-after-bytes?? = 500MB
????? -R,--report-after-seconds = 0
? Output Size/Dimensions:
????? -p,--output-path????????? = /data
????? -s,--output-size????????? = 400GB
????? -n,--max-streams????????? = 1
????? -S,--output-file-size???? = 18GB
????? -w,--write-size?????????? =
???????? --oow-max-distance???? =
? Data Options:
???????? --data-mode??????????? = s
????? -x,--random-seed????????? = 1337
????? -z,--lz-factor??????????? = 3.5425
????? -D,--ddup-factor????????? = 1.029
????? -c,--churn-pct??????????? = 0
????? -X,--churn-pct-rand-seed? = 1338933774
????? -A,--ddup-fanout-algorithm = CLUSTERED_RUNS
???????? --ddup-mean-run-size-in-segments = 1.3
???????? --ddup-mean-unique-run-size-in-segments = 1200
???????? --ddup-run-size-multiple = 64
???????? --ddup-min-distance-from-last-ddup = 120KB
???????? --ddup-min-distance-from-cursor = 160MB
???????? --ddup-ave-dist-for-nearby-segs = 120KB
???????? --ddup-random-jump-every-n-ddup-runs = 12.6344
? Segmentation Options:
???????? --type???????????????? = rk
???????? --rk-seg-size-min????? = 512
???????? --rk-seg-size-max????? = 32KB
???????? --rk-seg-size-ave????? = 8KB
???????? --rk-seg-size-dist???? = exponential
??? Number of cut points (-C) being used: 3045800
---------------------------------------------------
Number of files to write: 23

New_file: 0, stream 0,??? /data/cbt_output.0, size: 18GB
? Total: 2.284 sec (0:00:02), 500.001 MB, 218.926 MBps - Curr: 2.284 sec, 500.001 MB, 218.926 MBps - Tue Jun? 5 15:02:57 2012
? Total: 4.822 sec (0:00:04), 1000.002 MB, 207.397 MBps - Curr: 2.538 sec, 500.001 MB, 197.022 MBps - Tue Jun? 5 15:03:00 2012
? Total: 8.275 sec (0:00:08), 1500.004 MB, 181.263 MBps - Curr: 3.454 sec, 500.002 MB, 144.776 MBps - Tue Jun? 5 15:03:03 2012
====================== ?Power cycle during the writing ============================

========== on next boot, first fsck found 2 errors and then next fsck passed ===============================
[admin@chief-csa81 ~]# fsck.ext4 -v -f -y /dev/sda6
e2fsck 1.42 (29-Nov-2011)
/dev/sda6: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (118366120, counted=76269471).
Fix? yes

Free inodes count wrong (30081013, counted=30081004).
Fix? yes

/dev/sda6: ***** FILE SYSTEM WAS MODIFIED *****

????? 20 inodes used (0.00%)
????? 0 non-contiguous files (0.0%)
?????? 0 non-contiguous directories (0.0%)
???????? # of inodes with ind/dind/tind blocks: 0/0/0
???????? Extent depth histogram: 3/9
44035281 blocks used (36.60%)
?????? 0 bad blocks
????? 10 large files

?????? 9 regular files
?????? 2 directories
?????? 0 character device files
?????? 0 block device files
?????? 0 fifos
?????? 0 links
?????? 0 symbolic links (0 fast symbolic links)
?????? 0 sockets
--------
????? 11 files

[admin@chief-csa81 ~]# fsck.ext4 -v -f -y /dev/sda6
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

????? 20 inodes used (0.00%)
?????? 0 non-contiguous files (0.0%)
?????? 0 non-contiguous directories (0.0%)
???????? # of inodes with ind/dind/tind blocks: 0/0/0
???????? Extent depth histogram: 3/9
44035281 blocks used (36.60%)
?????? 0 bad blocks
????? 10 large files

?????? 9 regular files
?????? 2 directories
?????? 0 character device files
?????? 0 block device files
?????? 0 fifos
?????? 0 links
?????? 0 symbolic links (0 fast symbolic links)
?????? 0 sockets
--------
????? 11 files


====== Hardware Info =====================================
The box has 2 Quad-Core AMD Opteron(tm) Processor 2378 stepping 02 and LSI sata controller.

Fusion MPT base driver 4.28.00.00
Copyright (c) 1999-2010 LSI Corporation
Fusion MPT SPI Host driver 4.28.00.00
Fusion MPT SAS Host driver 4.28.00.00
ACPI: PCI Interrupt Link [LNEC] enabled at IRQ 17
mptsas 0000:02:00.0: PCI INT A -> Link[LNEC] -> GSI 17 (level, low) -> IRQ 17
mptbase: ioc0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total memory = 8114680 kB
mptbase: ioc0: Initiating bringup
ioc0: LSISAS1068E B2: Capabilities={Initiator}
scsi0 : ioc0: LSISAS1068E B2, FwRev=011f0000h, Ports=1, MaxQ=277, IRQ=17
mptsas: ioc0: add expander: num_phys 12, sas_addr (0x3000000403f)
mptsas: ioc0: add expander: num_phys 12, sas_addr (0x3000000427f)
mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 37, phy 0, sas_addr 0x30000004030
scsi target0:0:0: mptsas: ioc0: add device: fw_channel 0, fw_id 37, phy 0, sas_addr 0x30000004030scsi 0:0:0:0: Direct-Access???? ATA????? WDC WD5003ABYX-0 1S01 PQ: 0 ANSI: 5
scsi 0:0:0:0: mptscsih: ioc0: qdepth=64, tagged=1, simple=1, ordered=0, scsi_level=6, cmd_que=1
sdev dma_alignment 511
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 0:0:0:0: [sda] Attached SCSI disk


2012-06-06 05:31:36

by Eric Sandeen

[permalink] [raw]
Subject: Re: ext4 corruption during unexpected power cycle in the middle of writing

On 6/6/12 12:24 AM, Ming Lei wrote:
> I ran the power cycle test during the middle of file writing and after bootup, I ran force fsck and found two errors (If I run fsck -p -v I don't see the errors). From what I saw I think it is file system meta data corruption. Fsck can repair it but each time I ran the same test and I hit the same issue.
>
> I don't think it is relevant but want to point out that sda6 shares the same drive as another partition on sda(sda3) is used for the raid6 array for /var.
>
> The same issue was found whenever barrier is on or off, and the disk drive write cache is enabled or disabled. The test result shown below is when barrier is on and disk write cache is disabled.
>
> I use kernel version 2.6.32SL6 version. I also see the same issue on 2.6.9 based kernel on the same hardware with ext3 file system.
>
> My question is:
> 1) Is the issue caused from something unique in my box? Configuration error?
> 2) Is it possible my version of fsck reported false errors?

Sort of. You got:

> Free blocks count wrong (118366120, counted=76269471).
> Fix? yes
>
> Free inodes count wrong (30081013, counted=30081004).
> Fix? yes

Those are the superblock counters, which aren't journaled - only the bg counters are logged via the journal, IIRC.

They aren't false... they are just expected due to the design I'm afraid.

If you had mounted/unmounted/fsck'd you wouldn't have seen errors, because at mount time the superblock gets updated from all of the individual bg counters in ext4_fill_super:

/*
* The journal may have updated the bg summary counts, so we
* need to update the global counters.
*/

> 3) Is this a known issue? ? Is it a kernel bug?

yes. Not really. ;)

> 4) How do I find what's wrong?

I think this is by design, though maybe a little unfortunate in that it is unexpected to get fsck errors on a journaling filesystem after a crash...

I ran into this same thing when doing recovery testing for > 16T filesystems.

-Eric


2012-06-06 05:44:48

by Ming Lei

[permalink] [raw]
Subject: RE: ext4 corruption during unexpected power cycle in the middle of writing

Is this behavior documented somewhere?

-----Original Message-----
From: Eric Sandeen [mailto:[email protected]]
Sent: Tuesday, June 05, 2012 10:32 PM
To: Ming Lei
Cc: [email protected]
Subject: Re: ext4 corruption during unexpected power cycle in the middle of writing

On 6/6/12 12:24 AM, Ming Lei wrote:
> I ran the power cycle test during the middle of file writing and after bootup, I ran force fsck and found two errors (If I run fsck -p -v I don't see the errors). From what I saw I think it is file system meta data corruption. Fsck can repair it but each time I ran the same test and I hit the same issue.
>
> I don't think it is relevant but want to point out that sda6 shares the same drive as another partition on sda(sda3) is used for the raid6 array for /var.
>
> The same issue was found whenever barrier is on or off, and the disk drive write cache is enabled or disabled. The test result shown below is when barrier is on and disk write cache is disabled.
>
> I use kernel version 2.6.32SL6 version. I also see the same issue on 2.6.9 based kernel on the same hardware with ext3 file system.
>
> My question is:
> 1) Is the issue caused from something unique in my box? Configuration error?
> 2) Is it possible my version of fsck reported false errors?

Sort of. You got:

> Free blocks count wrong (118366120, counted=76269471).
> Fix? yes
>
> Free inodes count wrong (30081013, counted=30081004).
> Fix? yes

Those are the superblock counters, which aren't journaled - only the bg counters are logged via the journal, IIRC.

They aren't false... they are just expected due to the design I'm afraid.

If you had mounted/unmounted/fsck'd you wouldn't have seen errors, because at mount time the superblock gets updated from all of the individual bg counters in ext4_fill_super:

/*
* The journal may have updated the bg summary counts, so we
* need to update the global counters.
*/

> 3) Is this a known issue? ? Is it a kernel bug?

yes. Not really. ;)

> 4) How do I find what's wrong?

I think this is by design, though maybe a little unfortunate in that it is unexpected to get fsck errors on a journaling filesystem after a crash...

I ran into this same thing when doing recovery testing for > 16T filesystems.

-Eric


2012-06-06 18:55:34

by Theodore Ts'o

[permalink] [raw]
Subject: Re: ext4 corruption during unexpected power cycle in the middle of writing

On Wed, Jun 06, 2012 at 12:31:34AM -0500, Eric Sandeen wrote:
> Sort of. You got:
>
> > Free blocks count wrong (118366120, counted=76269471).
> > Fix? yes
> >
> > Free inodes count wrong (30081013, counted=30081004).
> > Fix? yes
>
> Those are the superblock counters, which aren't journaled - only the bg counters are logged via the journal, IIRC.

This has been partially addressed in e2fsprogs 1.42.3:

commit 2788cc879bbe667d28277e1d660b7e56514e5b30
Author: Andreas Dilger <[email protected]>
Date: Tue Apr 24 16:22:48 2012 -0400

e2fsck: quiet harmless inode/blocks errors

Don't consider only an error in the superblock summary as incorrect.
The kernel does not update this field except at unmount time, so
don't print errors during a "-n" run if there is nothing else wrong.
Any other unfixed errors will themselves mark the filesystem invalid.

Signed-off-by: Andreas Dilger <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>

So e2fsck -n will silence the warnings, and e2fsck -p/-y will just fix
them automatically, although they are still getting printed. Maybe I
should just mark these messages as one where they just print the fact
that we are adjusting the free blocks count, and avoid using the word
"wrong" --- at least if the filesystem has a journal --- so the error
message isn't as scary/misleading.

- Ted