Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932418AbXHHIQW (ORCPT ); Wed, 8 Aug 2007 04:16:22 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S1752172AbXHHIQH (ORCPT ); Wed, 8 Aug 2007 04:16:07 -0400
Received: from smtp27.orange.fr ([80.12.242.95]:24424 "EHLO smtp27.orange.fr"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752140AbXHHIQA convert rfc822-to-8bit (ORCPT );
        Wed, 8 Aug 2007 04:16:00 -0400
X-ME-UUID: 20070808081558137.2177A1C00093@mwinf2707.orange.fr
From: paul
Reply-To: paul.pinault@disk91.com
To: Jeff Garzik
Subject: Re: Data corruption
Date: Wed, 8 Aug 2007 10:15:55 +0200
User-Agent: KMail/1.9.5
Cc: linux-kernel@vger.kernel.org
References: <200708072007.27343.paul.pinault@disk91.com> <46B8ECA7.3000300@garzik.org>
In-Reply-To: <46B8ECA7.3000300@garzik.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8BIT
Content-Disposition: inline
Message-Id: <200708081015.56262.paul.pinault@disk91.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 31459
Lines: 668

On Wednesday 8 August 2007 00:05, Jeff Garzik wrote:
> paul wrote:
> > For the past 2-3 months I have had random data corruption on my Linux
> > server. After checking the disks independently (I'm using raid1 on 2 SATA
> > disks; the problem is the same w/o raid) and the memory, the hardware seems
> > to be ruled out as the cause...
> >
> > Here is my problem:
> > => head --bytes=300m /dev/urandom > test
> > => for i in `seq 0 9` ; do cp test test$i ; done
> > => md5sum test*
> > I got :
> > 014666c728c9e3b8299579fae499864a test
> > 014666c728c9e3b8299579fae499864a test0
> > 333fd93d093ac612cd8d5f65628f734e test1
> > 1ab6ee68c6a7d9ff5a05f9d63f0f6df6 test2
> > 96e96483e3175a59c9c05b6720514e1e test3
> > 014666c728c9e3b8299579fae499864a test4
> > b24dbccc9f4831f8825ab4a55a3be4aa test5
> > 8493efc9c14e4b5c162ac23696fbc16a test6
> > 6a5f4301f66d0379049d79d0e14e2a87 test7
> > 2c81cfa1c3a03aba134574922ee5d75c test8
> > 2ea15c8392bfd0123472a80125bb3abe test9
> >
> > ^^^ that sounds really bad for my data :(
> >
> > ===================================================================
> > I did some tests :
> > * badblocks on the two disks with ro and rw tests => reports no error
> > * memtest for 6 hours => reports no error
> >
> > * I reproduced the error
> > - under xen client host (first time issue)
> > - under xen hypervisor
> > - under basic kernel with raid mirroring + ext3 and reiserfs
> > - under basic kernel w/o raid but ext3 and reiserfs
> >
> > My configuration
> > * Asus P5B-VM
> > * 4 GB [tried with and w/o the memory remapping option]
> > * Intel Core 2 Duo [normal speed and underclocked (233 bus speed)]
> > * Hd SATA WD 80GB
>
> Corruption with which controller? pata_jmicron? ata_piix? ahci?
>
> Can you reproduce with 2.6.23-rc2? If not, please report the bug to
> OpenSuSE, since we only support unmodified vanilla kernels here.
>
> Jeff

Should be ahci => standard sata port, not jmicron extensions....

I can try to reproduce with 2.6.23-rc2 ...
now with 2GB I'm able to compile something correctly ;)

result of the test with 2.6.23 with 2GB memory : OK
result with 2.6.23-rc2 @ 4GB : KO =>
2eb9b0f7c7d773170dca5cb304b71d7d test
c85546b6ed3b10a0354af101a61ee27f test0
b4879a2af789a731889c578af8adce75 test1
a606258dcc42542034026bf8e0918a15 test2
68d53650a7852cd8e93faa86cded4aae test3
^^ system crashed after....

----------------------------------------------------------------
uname -a
Linux xen-prod 2.6.23-rc2-default #1 SMP Wed Aug 8 09:08:54 CEST 2007 x86_64 x86_64 x86_64 GNU/Linux
-----------------------------------------------------------------
# ./ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux xen-prod 2.6.23-rc2-default #1 SMP Wed Aug 8 09:08:54 CEST 2007 x86_64 x86_64 x86_64 GNU/Linux

Gnu C                  4.1.2
Gnu make               3.81
binutils               2.17.50.0.5
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.39
jfsutils               1.1.11
reiserfsprogs          3.6.19
xfsprogs               2.8.11
PPP                    2.4.4
Linux C Library        2.5
Dynamic linker (ldd)   2.5
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.12
Sh-utils               6.4
udev                   103
wireless-tools         29
Modules Loaded         ipt_LOG xt_limit xt_pkttype snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave acpi_cpufreq freq_table button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT iptable_mangle iptable_filter ip6table_mangle ip_tables ip6table_filter ip6_tables x_tables ipv6 ext3 jbd mbcache loop raid1 dm_mod sr_mod cdrom pata_jmicron generic ide_core ohci1394 ieee1394 snd_hda_intel snd_pcm snd_timer snd soundcore uhci_hcd ehci_hcd r8169 i2c_i801 i2c_core intel_agp snd_page_alloc usbcore parport_pc lp parport reiserfs edd fan sg ata_piix ahci libata thermal processor sd_mod scsi_mod

---------------------------------------------------------------------------
# lspci -vvv
00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
        Subsystem: ASUSTeK Computer Inc. Unknown device 81ea
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B-
        Capabilities: [88] Subsystem: Intel Corporation Unknown device 277d
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
                Address: fee0300c Data: 4159
        Capabilities: [a0] Express Root Port (Slot+) IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s <64ns, L1 <1us
                Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                Link: Supported Speed 2.5Gb/s, Width x16, ASPM L0s, Port 2
                Link: Latency L0s <256ns, L1 <4us
                Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
                Link: Speed 2.5Gb/s, Width x0
                Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug- Surpise-
                Slot: Number 0, PowerLimit 0.000000
                Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
                Slot: AttnInd Off, PwrInd On, Power-
                Root: Correctable- Non-Fatal- Fatal- PME-

00:02.0 VGA compatible controller: Intel Corporation 82G965 Integrated Graphics Controller (rev 02) (prog-if 00 [VGA])
        Subsystem: ASUSTeK Computer Inc. Unknown device 81ea
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B-
        Capabilities: [40] Express Root Port (Slot+) IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s unlimited, L1 unlimited
                Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 1
                Link: Latency L0s <1us, L1 <4us
                Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
                Link: Speed 2.5Gb/s, Width x0
                Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
                Slot: Number 0, PowerLimit 0.000000
                Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
                Slot: AttnInd Unknown, PwrInd Unknown, Power-
                Root: Correctable- Non-Fatal- Fatal- PME-
        Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
                Address: fee0300c Data: 4161
        Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Unknown device 81ec
        Capabilities: [a0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
        Capabilities: [40] Express Root Port (Slot+) IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s unlimited, L1 unlimited
                Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 5
                Link: Latency L0s <256ns, L1 <4us
                Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
                Link: Speed 2.5Gb/s, Width x1
                Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
                Slot: Number 0, PowerLimit 0.000000
                Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
                Slot: AttnInd Unknown, PwrInd Unknown, Power-
                Root: Correctable- Non-Fatal- Fatal- PME-
        Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
                Address: fee0300c Data: 4169
        Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Unknown device 81ec
        Capabilities: [a0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
        Capabilities: [40] Express Root Port (Slot+) IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s unlimited, L1 unlimited
                Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 6
                Link: Latency L0s <256ns, L1 <4us
                Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
                Link: Speed 2.5Gb/s, Width x1
                Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
                Slot: Number 0, PowerLimit 0.000000
                Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
                Slot: AttnInd Unknown, PwrInd Unknown, Power-
                Root: Correctable- Non-Fatal- Fatal- PME-
        Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
                Address: fee0300c Data: 4171
        Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Unknown device 81ec
        Capabilities: [a0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI #1 (rev 02) (prog-if 00 [UHCI])
        Subsystem: ASUSTeK Computer Inc. Unknown device 81ec
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B-
        Capabilities: [50] Subsystem: ASUSTeK Computer Inc. Unknown device 81ec

00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface Controller (rev 02)
        Subsystem: ASUSTeK Computer Inc. Unknown device 81ec
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR-