2005-05-26 09:43:27

by Olivier Guerrier

Subject: Fake ext3 corruption on raid5 in 2.6 .9 smp

Linux version 2.6.11.9-data1-20050520 (root@bimp) (gcc version 3.3.5 (Debian 1:3.3.5-12)) #1 SMP Fri May 20 17:31:52 UTC 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007ffec000 (usable)
BIOS-e820: 000000007ffec000 - 000000007ffef000 (ACPI data)
BIOS-e820: 000000007ffef000 - 000000007ffff000 (reserved)
BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
1151MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f7e90
On node 0 totalpages: 524268
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 225280 pages, LIFO batch:16
HighMem zone: 294892 pages, LIFO batch:16
DMI 2.3 present.
ACPI: RSDP (v000 ASUS ) @ 0x000f85e0
ACPI: RSDT (v001 ASUS A7M266-D 0x30303031 MSFT 0x31313031) @ 0x7ffec000
ACPI: FADT (v001 ASUS A7M266-D 0x30303031 MSFT 0x31313031) @ 0x7ffec100
ACPI: BOOT (v001 ASUS A7M266-D 0x30303031 MSFT 0x31313031) @ 0x7ffec040
ACPI: MADT (v001 ASUS A7M266-D 0x30303031 MSFT 0x31313031) @ 0x7ffec080
ACPI: DSDT (v001 ASUS A7M266-D 0x00001000 MSFT 0x0100000b) @ 0x00000000
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:6 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 6:6 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
Using ACPI for processor (LAPIC) configuration information
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: ASUS Product ID: PROD00000000 APIC at: 0xFEE00000
I/O APIC #2 Version 17 at 0xFEC00000.
Enabling APIC mode: Flat. Using 1 I/O APICs
Processors: 2
Allocating PCI resources starting at 80000000 (gap: 80000000:7ec00000)
Built 1 zonelists
Kernel command line: root=/dev/md0
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 1599.881 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 2075420k/2097072k available (2055k kernel code, 20868k reserved, 566k data, 216k init, 1179568k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 3137.53 BogoMIPS (lpj=1568768)
Security Framework v1.0.0 initialized
SELinux: Disabled at boot.
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: 0383fbff c1cbfbff 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383fbff c1cbfbff 00000000 00000000 00000000 00000000 00000000
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
CPU: After all inits, caps: 0383fbff c1cbfbff 00000000 00000020 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: AMD Athlon(TM) MP 1900+ stepping 02
per-CPU timeslice cutoff: 731.65 usecs.
task migration cache decay timeout: 1 msecs.
Booting processor 1/1 eip 2000
Initializing CPU#1
Calibrating delay loop... 3194.88 BogoMIPS (lpj=1597440)
CPU: After generic identify, caps: 0383fbff c1cbfbff 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383fbff c1cbfbff 00000000 00000000 00000000 00000000 00000000
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
CPU: After all inits, caps: 0383fbff c1cbfbff 00000000 00000020 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: AMD Athlon(TM) MP 1900+ stepping 02
Total of 2 processors activated (6332.41 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=0
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
CPU0 attaching sched-domain:
domain 0: span 3
groups: 1 2
CPU1 attaching sched-domain:
domain 0: span 3
groups: 2 1
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf1f20, last bus=3
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
mtrr: your CPUs had inconsistent fixed MTRR settings
mtrr: probably your BIOS does not setup all CPUs.
mtrr: corrected configuration.
Linux Plug and Play Support v0.97 (c) Adam Belay
PnPBIOS: Scanning system for PnP BIOS support...
PnPBIOS: Found PnP BIOS installation structure at 0xc00fc5e0
PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0xc610, dseg 0xf0000
PnPBIOS: 14 nodes reported by PnP BIOS; 14 recorded by driver
SCSI subsystem initialized
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Using IRQ router AMD768 [1022/7443] at 0000:00:07.3
PCI->APIC IRQ transform: 0000:00:08.0[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:02:00.0[A] -> IRQ 17
PCI->APIC IRQ transform: 0000:02:01.0[A] -> IRQ 18
PCI->APIC IRQ transform: 0000:02:02.0[A] -> IRQ 19
PCI->APIC IRQ transform: 0000:02:03.0[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:03:00.0[D] -> IRQ 19
PCI->APIC IRQ transform: 0000:03:04.0[A] -> IRQ 17
PCI->APIC IRQ transform: 0000:03:05.0[A] -> IRQ 18
PCI->APIC IRQ transform: 0000:03:06.0[A] -> IRQ 17
pnp: 00:0f: ioport range 0xe400-0xe47f has been reserved
pnp: 00:0f: ioport range 0xe4e0-0xe4ff has been reserved
Simple Boot Flag at 0x3a set to 0x1
highmem bounce pool size: 64 pages
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
devfs: 2004-01-31 Richard Gooch ([email protected])
devfs: boot_options: 0x0
Initializing Cryptographic API
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Real Time Clock Driver v1.12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 48 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD7441: IDE controller at PCI slot 0000:00:07.1
AMD7441: chipset revision 4
AMD7441: not 100% native mode: will probe irqs later
AMD7441: 0000:00:07.1 (rev 04) UDMA100 controller
ide0: BM-DMA at 0xb800-0xb807, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xb808-0xb80f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: Maxtor 6Y080L0, ATA DISK drive
hdb: DVD+RW RW5240, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: Maxtor 6Y080L0, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
SiI680: IDE controller at PCI slot 0000:03:05.0
SiI680: chipset revision 2
SiI680: BASE CLOCK == 133
SiI680: 100% native mode on irq 18
ide2: MMIO-DMA , BIOS settings: hde:pio, hdf:pio
ide3: MMIO-DMA , BIOS settings: hdg:pio, hdh:pio
Probing IDE interface ide2...
hdf: Maxtor 6Y160P0, ATA DISK drive
ide2 at 0xf880e080-0xf880e087,0xf880e08a on irq 18
Probing IDE interface ide3...
hdh: Maxtor 6Y160P0, ATA DISK drive
ide3 at 0xf880e0c0-0xf880e0c7,0xf880e0ca on irq 18
SiI680: IDE controller at PCI slot 0000:03:06.0
SiI680: chipset revision 2
SiI680: BASE CLOCK == 133
SiI680: 100% native mode on irq 17
ide4: MMIO-DMA , BIOS settings: hdi:pio, hdj:pio
ide5: MMIO-DMA , BIOS settings: hdk:pio, hdl:pio
Probing IDE interface ide4...
hdj: Maxtor 6Y160P0, ATA DISK drive
ide4 at 0xf8810080-0xf8810087,0xf881008a on irq 17
Probing IDE interface ide5...
hdl: Maxtor 6Y160P0, ATA DISK drive
ide5 at 0xf88100c0-0xf88100c7,0xf88100ca on irq 17
hda: max request size: 128KiB
hda: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
/dev/ide/host0/bus0/target0/lun0: p1 p2 p3 p4 < p5 p6 p7 p8 >
hdc: max request size: 128KiB
hdc: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hdc: cache flushes supported
/dev/ide/host0/bus1/target0/lun0: p1 p2 p3 p4 < p5 p6 p7 p8 >
hdf: max request size: 64KiB
hdf: 320173056 sectors (163928 MB) w/7936KiB Cache, CHS=19929/255/63, UDMA(133)
hdf: cache flushes supported
/dev/ide/host2/bus0/target1/lun0: p1
hdh: max request size: 64KiB
hdh: 320173056 sectors (163928 MB) w/7936KiB Cache, CHS=19929/255/63, UDMA(133)
hdh: cache flushes supported
/dev/ide/host2/bus1/target1/lun0: p1
hdj: max request size: 64KiB
hdj: 320173056 sectors (163928 MB) w/7936KiB Cache, CHS=19929/255/63, UDMA(133)
hdj: cache flushes supported
/dev/ide/host4/bus0/target1/lun0: p1
hdl: max request size: 64KiB
hdl: 320173056 sectors (163928 MB) w/7936KiB Cache, CHS=19929/255/63, UDMA(133)
hdl: cache flushes supported
/dev/ide/host4/bus1/target1/lun0: p1
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: automatically using best checksumming function: pIII_sse
pIII_sse : 4300.000 MB/sec
raid5: using function: pIII_sse (4300.000 MB/sec)
md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 8192 buckets, 128Kbytes
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 7, 786432 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
NET: Registered protocol family 8
NET: Registered protocol family 20
Starting balanced_irq
devfs_mk_dev: could not append to parent for md/0
md: Autodetecting RAID arrays.
md: autorun ...
md: considering hdl1 ...
md: adding hdl1 ...
md: adding hdj1 ...
md: adding hdh1 ...
md: adding hdf1 ...
md: hdc8 has different UUID to hdl1
md: hdc7 has different UUID to hdl1
md: hdc6 has different UUID to hdl1
md: hdc5 has different UUID to hdl1
md: hdc3 has different UUID to hdl1
md: hdc2 has different UUID to hdl1
md: hdc1 has different UUID to hdl1
md: hda8 has different UUID to hdl1
md: hda7 has different UUID to hdl1
md: hda6 has different UUID to hdl1
md: hda5 has different UUID to hdl1
md: hda3 has different UUID to hdl1
md: hda2 has different UUID to hdl1
md: hda1 has different UUID to hdl1
devfs_mk_dev: could not append to parent for md/9
md: created md9
md: bind<hdf1>
md: bind<hdh1>
md: bind<hdj1>
md: bind<hdl1>
md: running: <hdl1><hdj1><hdh1><hdf1>
raid5: device hdl1 operational as raid disk 3
raid5: device hdj1 operational as raid disk 2
raid5: device hdh1 operational as raid disk 1
raid5: device hdf1 operational as raid disk 0
raid5: allocated 4203kB for md9
raid5: raid level 5 set md9 active with 4 out of 4 devices, algorithm 2
RAID5 conf printout:
--- rd:4 wd:4 fd:0
disk 0, o:1, dev:hdf1
disk 1, o:1, dev:hdh1
disk 2, o:1, dev:hdj1
disk 3, o:1, dev:hdl1
md: considering hdc8 ...
md: adding hdc8 ...
md: hdc7 has different UUID to hdc8
md: hdc6 has different UUID to hdc8
md: hdc5 has different UUID to hdc8
md: hdc3 has different UUID to hdc8
md: hdc2 has different UUID to hdc8
md: hdc1 has different UUID to hdc8
md: adding hda8 ...
md: hda7 has different UUID to hdc8
md: hda6 has different UUID to hdc8
md: hda5 has different UUID to hdc8
md: hda3 has different UUID to hdc8
md: hda2 has different UUID to hdc8
md: hda1 has different UUID to hdc8
devfs_mk_dev: could not append to parent for md/5
md: created md5
md: bind<hda8>
md: bind<hdc8>
md: running: <hdc8><hda8>
raid1: raid set md5 active with 2 out of 2 mirrors
md: considering hdc7 ...
md: adding hdc7 ...
md: hdc6 has different UUID to hdc7
md: hdc5 has different UUID to hdc7
md: hdc3 has different UUID to hdc7
md: hdc2 has different UUID to hdc7
md: hdc1 has different UUID to hdc7
md: adding hda7 ...
md: hda6 has different UUID to hdc7
md: hda5 has different UUID to hdc7
md: hda3 has different UUID to hdc7
md: hda2 has different UUID to hdc7
md: hda1 has different UUID to hdc7
devfs_mk_dev: could not append to parent for md/4
md: created md4
md: bind<hda7>
md: bind<hdc7>
md: running: <hdc7><hda7>
raid1: raid set md4 active with 2 out of 2 mirrors
md: considering hdc6 ...
md: adding hdc6 ...
md: hdc5 has different UUID to hdc6
md: hdc3 has different UUID to hdc6
md: hdc2 has different UUID to hdc6
md: hdc1 has different UUID to hdc6
md: adding hda6 ...
md: hda5 has different UUID to hdc6
md: hda3 has different UUID to hdc6
md: hda2 has different UUID to hdc6
md: hda1 has different UUID to hdc6
devfs_mk_dev: could not append to parent for md/3
md: created md3
md: bind<hda6>
md: bind<hdc6>
md: running: <hdc6><hda6>
raid1: raid set md3 active with 2 out of 2 mirrors
md: considering hdc5 ...
md: adding hdc5 ...
md: hdc3 has different UUID to hdc5
md: hdc2 has different UUID to hdc5
md: hdc1 has different UUID to hdc5
md: adding hda5 ...
md: hda3 has different UUID to hdc5
md: hda2 has different UUID to hdc5
md: hda1 has different UUID to hdc5
devfs_mk_dev: could not append to parent for md/2
md: created md2
md: bind<hda5>
md: bind<hdc5>
md: running: <hdc5><hda5>
raid1: raid set md2 active with 2 out of 2 mirrors
md: considering hdc3 ...
md: adding hdc3 ...
md: hdc2 has different UUID to hdc3
md: hdc1 has different UUID to hdc3
md: adding hda3 ...
md: hda2 has different UUID to hdc3
md: hda1 has different UUID to hdc3
md: created md0
md: bind<hda3>
md: bind<hdc3>
md: running: <hdc3><hda3>
raid1: raid set md0 active with 2 out of 2 mirrors
md: considering hdc2 ...
md: adding hdc2 ...
md: hdc1 has different UUID to hdc2
md: adding hda2 ...
md: hda1 has different UUID to hdc2
devfs_mk_dev: could not append to parent for md/8
md: created md8
md: bind<hda2>
md: bind<hdc2>
md: running: <hdc2><hda2>
raid1: raid set md8 active with 2 out of 2 mirrors
md: considering hdc1 ...
md: adding hdc1 ...
md: adding hda1 ...
devfs_mk_dev: could not append to parent for md/1
md: created md1
md: bind<hda1>
md: bind<hdc1>
md: running: <hdc1><hda1>
raid1: raid set md1 active with 2 out of 2 mirrors
md: ... autorun DONE.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 216k freed
kjournald starting. Commit interval 5 seconds
NET: Registered protocol family 1
Adding 4882680k swap on /dev/md8. Priority:-1 extents:1
EXT3 FS on md0, internal journal
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: [email protected]
kjournald starting. Commit interval 5 seconds
EXT3 FS on md1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on md2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on md3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on md4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Linux agpgart interface v0.100 (c) Dave Jones
agpgart: Detected AMD 760MP chipset
agpgart: Maximum main memory to use for agp memory: 1919M
agpgart: AGP aperture is 32M @ 0xfc000000
cpci_hotplug: CompactPCI Hot Plug Core version: 0.2
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
shpchp: HPC vendor_id 1022 device_id 700d ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: shpc_init : shpc_cap_offset == 0
shpchp: HPC vendor_id 1022 device_id 7448 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
hw_random: AMD768 system management I/O registers at 0xE400.
hw_random hardware driver 1.0.0 loaded
r8169 Gigabit Ethernet driver 2.2LK loaded
eth0: Identified chip type is 'RTL8169'.
eth0: RTL8169 at 0xf8ad4000, 00:08:a1:3c:11:a6, IRQ 16
natsemi dp8381x driver, version 1.07+LK1.0.17, Sep 27, 2002
originally by Donald Becker <[email protected]>
http://www.scyld.com/network/natsemi.html
2.4.x kernel port by Jeff Garzik, Tjeerd Mulder
natsemi eth1: NatSemi DP8381[56] at 0xf8800000 (0000:02:00.0), 00:00:24:c3:4c:48, IRQ 17, port TP.
natsemi eth2: NatSemi DP8381[56] at 0xf8000000 (0000:02:01.0), 00:00:24:c3:4c:49, IRQ 18, port TP.
natsemi eth3: NatSemi DP8381[56] at 0xf7800000 (0000:02:02.0), 00:00:24:c3:4c:4a, IRQ 19, port TP.
natsemi eth4: NatSemi DP8381[56] at 0xf7000000 (0000:02:03.0), 00:00:24:c3:4c:4b, IRQ 16, port TP.
usbcore: registered new driver usbfs
usbcore: registered new driver hub
ohci_hcd: 2004 Nov 08 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ohci_hcd 0000:03:00.0: Advanced Micro Devices [AMD] AMD-768 [Opus] USB
ohci_hcd 0000:03:00.0: irq 19, pci mem 0xf6800000
ohci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 1
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 4 ports detected
cmpci: version $Revision: 6.82 $ time 17:37:10 May 20 2005
cmpci:
cmpci: found CM8738 adapter at io 0x8800 irq 17
cmpci: chip version = 055
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378 (0x778), irq 7, dma 3 [PCSPP,TRISTATE,COMPAT,EPP,ECP,DMA]
input: PC Speaker
hdb: ATAPI 40X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
Bridge firewalling registered
device eth0 entered promiscuous mode
eth0: Promiscuous mode enabled.
r8169: eth0: link up
eth0: Promiscuous mode enabled.
eth0: Promiscuous mode enabled.
eth0: Promiscuous mode enabled.
eth0: Promiscuous mode enabled.
br0: port 1(eth0) entering learning state
device eth1 entered promiscuous mode
eth1: DSPCFG accepted after 0 usec.
eth1: link up.
eth1: Promiscuous mode enabled.
eth1: Promiscuous mode enabled.
eth1: Promiscuous mode enabled.
eth1: Promiscuous mode enabled.
eth1: Promiscuous mode enabled.
br1: port 1(eth1) entering learning state
device eth2 entered promiscuous mode
eth2: DSPCFG accepted after 0 usec.
eth2: link up.
eth2: Setting full-duplex based on negotiated link capability.
eth2: Promiscuous mode enabled.
eth2: Promiscuous mode enabled.
eth2: Promiscuous mode enabled.
eth2: Promiscuous mode enabled.
eth2: Promiscuous mode enabled.
br2: port 1(eth2) entering learning state


Attachments:
config.txt (56.55 kB)
dmesg_boot.txt (19.00 kB)

2005-05-27 01:39:04

by jpearson

Subject: Re: Fake ext3 corruption on raid5 in 2.6 .9 smp

Hi,

I saw this exact same error (EXT3-fs error (device dm-x): ext3_readdir:
bad entry in directory #nnnnnnn: rec_len % 4 != 0 - offset=0, inode=xxxxxxxx,
rec_len=...)

from time to time on my non-SMP RAID system with 512 MB RAM, with ext3 on LVM on top of RAID5.

Never caused actual corruption - run FSCK, no errors, remount rw
successfully until next time; error rarely in the same place, but always
in a directory and rec_len % 4 != 0. Looks like an 'in-kernel' thing,
because (e.g.) running find on the volume after remounting rw produced
no issues, so presumably the on-disk directory wasn't *really* the
issue.
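
For readers unfamiliar with the message, the following is a minimal sketch of the kind of sanity check that produces it, modelled loosely on the kernel's ext3 directory-entry validation. The function names, the 4096-byte block size, the inode limit and the exact order of the checks are assumptions made for illustration; they are not taken from this thread or from any particular kernel version. The entry values are the ones from the error report quoted in this thread.

```python
import struct

# Fixed part of an ext3 directory entry:
# inode (4 bytes) + rec_len (2) + name_len (1) + file_type (1)
ENTRY_HEADER = 8


def dir_rec_len(name_len: int) -> int:
    """Smallest record length that fits a name, rounded up to a multiple of 4."""
    return (name_len + ENTRY_HEADER + 3) & ~3


def check_dir_entry(raw: bytes, offset: int, block_size: int, inodes_count: int):
    """Return None if the entry at `offset` looks sane, else an error string."""
    inode, rec_len, name_len, _ftype = struct.unpack_from("<IHBB", raw, offset)
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    if inode > inodes_count:
        return "inode out of bounds"
    return None


# Hypothetical 4096-byte block whose first entry carries the reported values.
block = struct.pack("<IHBB", 1605429031, 14026, 177, 0).ljust(4096, b"\0")
print(check_dir_entry(block, 0, 4096, inodes_count=2**32 - 1))
```

Under these assumptions the reported rec_len of 14026 trips the alignment check first, although it would also be larger than a 4096-byte block. That would be consistent with the buffer holding something other than directory data at the moment it was read.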

Filesystems between about 8 and 50 Gb, and not what I'd characterise as a
heavy load.

This was with about 2.6.4 - 2.6.7. I'm running 2.6.11 now and haven't
seen it in some time; so it was either fixed by 2.6.11, or mounting ro
by default has just reduced my exposure.


John Pearson.


On Thu, May 26, 2005 at 11:34:24AM +0200, Olivier Guerrier wrote:
> Hello,
>
> I've encountered the following strangeness with a newly installed box.
> This box was not previously running Linux, but has been carefully tested
> with memtest86, cpuburn, and some small scripts of mine, so hardware
> failure can probably be ruled out to begin with.
>
> The box is an Asus A7M266-D with dual Athlon MP (true MP), 2 GB RAM, and
> the RAID 5 set is built on 4 Maxtor 160 GB drives connected to 2 additional
> SiI0680 ATA133 IDE controllers.
>
> I use dm AES encryption over software RAID5, and under heavy load
> I get these error messages, and the partition is remounted ro:
>
> 8<--
> EXT3-fs error (device dm-6): ext3_readdir: bad entry in directory
> #20515395: rec_len % 4 != 0 - offset=0, inode=1605429031, rec_len=14026,
> name_len=177
> Aborting journal on device dm-6.
> ext3_abort called.
> EXT3-fs error (device dm-6): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
> EXT3-fs error (device dm-6) in start_transaction: Journal has aborted
> (repeated 3 times)
> EXT3-fs error (device dm-6) in ext3_ordered_writepage: IO failure
> EXT3-fs error (device dm-6) in start_transaction: Journal has aborted
> (repeated 53 times)
> __journal_remove_journal_head: freeing b_committed_data (repeated 5 times)
> -->8
>
> It is easily reproducible.
>
> fsck shows no errors, there are no hardware-related messages in dmesg, the
> RAID set is not marked as bad, and no reconstruction is needed. I just
> remount the partition rw, and for now try to keep the box from coming
> under pressure.
>
> Googling around I found these threads, but they do not reach a
> conclusion:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=107913912311421&w=2
> and
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=+295657
>
> The kernel is patched with grsec. I have disabled all of it, but the same
> behaviour occurs. Maybe a vanilla kernel will change something, but the
> links above show the same error on non-grsec kernels.
>
> Attached are my .config and boot-time dmesg.
>
> -------------- next part --------------
> An embedded and charset-unspecified text was scrubbed...
> Name: config.txt
> Url: http://lists.us.dell.com/pipermail/linux-kernel-daily-digest/attachments/20050526/15e8620a/config.txt
> -------------- next part --------------
> An embedded and charset-unspecified text was scrubbed...
> Name: dmesg_boot.txt
> Url: http://lists.us.dell.com/pipermail/linux-kernel-daily-digest/attachments/20050526/15e8620a/dmesg_boot.txt
>
> ------------------------------

--
Voice: +61 8 8202 9040
Email: [email protected]

Oasis Systems Pty Ltd
288 Glen Osmond Road
Fullarton, South Australia 5063

Ph: + 61 8 82029000
Fax: +61 8 82029001


2005-05-27 02:24:51

by Olivier Guerrier

Subject: Re: Fake ext3 corruption on raid5 in 2.6.11.9 smp

First, I just realized the typo in my previous message's subject:
Should read 2.6.*11.*9 smp, instead of 2.6_.9. Sorry :|

jpearson wrote:
> Hi,
>
> I saw this exact same error (EXT3-fs error (device dm-x): ext3_readdir:
> bad entry in directory #nnnnnnn: rec_len % 4 != 0 - offset=0, inode=xxxxxxxx,
> rec_len=...
>
> from time to time in my non-SMP RAID system with 512Mb RAM, with ext3 on LVM on top of RAID5.
>
> Never caused actual corruption - run FSCK, no errors, remount rw
> successfully until next time; error rarely in the same place, but always
> in a directory and rec_len % 4 != 0. Looks like an 'in-kernel' thing,
> because (e.g.) running find on the volume after remounting rw produced
> no issues, so presumably the on-disk directory wasn't *really* the
> issue.

I confirm this here too: random place, always a dir, always 'rec_len % 4
!= 0', no fs issue or data loss (so far...)
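
To check the "random place, always a directory" observation over a longer period, one option is to tally these reports from a saved kernel log. The sketch below assumes the message appears on a single line in the log (it is only wrapped in this mail), and the names BAD_ENTRY and summarize are made up for the example.

```python
import re
from collections import Counter

# Matches the single-line form of the ext3_readdir report quoted in this thread.
BAD_ENTRY = re.compile(
    r"EXT3-fs error \(device (?P<dev>[^)]+)\): ext3_readdir: "
    r"bad entry in directory #(?P<dir>\d+): (?P<reason>.+?) - "
    r"offset=(?P<offset>\d+), inode=(?P<inode>\d+), "
    r"rec_len=(?P<rec_len>\d+), name_len=(?P<name_len>\d+)"
)


def summarize(log_text: str) -> Counter:
    """Count bad-entry reports per (device, directory inode, reason)."""
    hits = Counter()
    for m in BAD_ENTRY.finditer(log_text):
        hits[(m["dev"], int(m["dir"]), m["reason"])] += 1
    return hits


# The report from this thread, joined onto one line as it would be logged.
sample = (
    "EXT3-fs error (device dm-6): ext3_readdir: bad entry in directory "
    "#20515395: rec_len % 4 != 0 - offset=0, inode=1605429031, "
    "rec_len=14026, name_len=177\n"
)
for key, count in summarize(sample).items():
    print(key, count)
```

If the same directory inode keeps turning up, that would point towards something on disk; if the keys are all different, it would fit the in-memory explanation better.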

> Filesystems between about 8 and 50 Gb, and not what I'd characterise as a
> heavy load.

By heavy load, I mean a system load between 10 and 15 for 3 hours
(before the error). The processes running were several instances of
mkisofs (reading from and writing to the faulty partition).
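
A rough way to approximate that kind of load without mkisofs is to have several threads create and delete files while others repeatedly walk the same tree. This is only a sketch under stated assumptions: the function names and default counts are invented for the example, it is far lighter than several mkisofs runs, and nothing here is known to trigger the error. For a real test, the temporary directory would be replaced by a scratch directory on the suspect ext3 filesystem.

```python
import os
import tempfile
import threading


def writer(root: str, worker: int, files: int) -> None:
    """Create small files and delete some, so directory blocks keep changing."""
    d = os.path.join(root, f"w{worker}")
    os.makedirs(d, exist_ok=True)
    for i in range(files):
        path = os.path.join(d, f"file{i:06d}")
        with open(path, "wb") as f:
            f.write(os.urandom(4096))
        if i % 3 == 0:
            os.remove(path)


def reader(root: str, passes: int, errors: list) -> None:
    """Walk the tree repeatedly; collect any OSError reported while listing."""
    for _ in range(passes):
        for _dirpath, _dirs, _files in os.walk(root, onerror=errors.append):
            pass


def stress(root: str, workers: int = 4, files: int = 200, passes: int = 20) -> list:
    """Run writers and readers concurrently and return the collected errors."""
    errors: list = []
    threads = [
        threading.Thread(target=writer, args=(root, w, files))
        for w in range(workers)
    ]
    threads += [
        threading.Thread(target=reader, args=(root, passes, errors))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    # Placeholder location; point this at the filesystem under test instead.
    with tempfile.TemporaryDirectory() as tmp:
        print("errors:", stress(tmp))
```

On a healthy filesystem the returned list should stay empty; watching dmesg while it runs remains the real indicator.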

> This was with about 2.6.4 - 2.6.7. I'm running 2.6.11 now and haven't
> seen it in some time; so it was either fixed by 2.6.11, or mounting ro
> by default has just reduced my exposure.

As my kernel is 2.6.11.9, it is not fixed so far.

I will reformat when possible. This time I will use LVM over RAID5, so I
can use XFS for my useful data and keep a medium-sized ext3 partition to
run tests if needed (I just need to know what to test).

Thanks