Hi all
While stress-testing a SATA drive and controller, I get the errors below
after a while. How can I find out whether they are due to a controller
failure, a bad driver, or a drive failure?
Thanks for any help.
System info below.
roy
-----
Running an unpatched 2.6.18.1 kernel.
This is a Silicon Image (SiI) controller:
01:03.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
        Subsystem: Silicon Image, Inc. SiI 3114 SATARaid Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at dc00 [size=8]
        Region 1: I/O ports at d480 [size=4]
        Region 2: I/O ports at d400 [size=8]
        Region 3: I/O ports at d080 [size=4]
        Region 4: I/O ports at d000 [size=16]
        Region 5: Memory at ff8efc00 (32-bit, non-prefetchable) [size=1K]
        Expansion ROM at c6a00000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-
The drive is a Seagate 400 GB model (I don't know how to find out exactly
which one), and libata is alone on IRQ 10.
/var/log/kern.log output:
Oct 19 18:32:04 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140273
Oct 19 18:32:04 ganske kernel: Aborting journal on device sda1.
Oct 19 18:32:04 ganske kernel: EXT3-fs error (device sda1) in
ext3_ordered_writepage: IO failure
Oct 19 18:32:04 ganske kernel: EXT3-fs error (device sda1) in
ext3_ordered_writepage: IO failure
Oct 19 18:32:04 ganske kernel: ext3_abort called.
Oct 19 18:32:04 ganske kernel: EXT3-fs error (device sda1):
ext3_journal_start_sb: Detected aborted journal
Oct 19 18:32:04 ganske kernel: Remounting filesystem read-only
Oct 19 18:32:04 ganske kernel: EXT3-fs error (device sda1) in
ext3_ordered_writepage: IO failure
Oct 19 18:32:04 ganske last message repeated 2 times
Oct 19 18:32:08 ganske kernel: 0843
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140844
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140845
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140847
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140848
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140851
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140855
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140857
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140860
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140863
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140864
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140867
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140868
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140870
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140871
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140873
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140875
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140879
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140883
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140884
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140886
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140887
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140889
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140891
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140892
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140893
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140895
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140896
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140898
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140899
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140902
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140903
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks_sb: bit already cleared for block 35140904
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1) in
ext3_free_blocks_sb: Journal has aborted
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1) in
ext3_reserve_inode_write: Journal has aborted
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1) in
ext3_truncate: Journal has aborted
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1) in
ext3_reserve_inode_write: Journal has aborted
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1) in
ext3_orphan_del: Journal has aborted
Oct 19 18:32:08 ganske kernel: EXT3-fs error (device sda1) in
ext3_reserve_inode_write: Journal has aborted
Oct 19 18:32:08 ganske kernel: __journal_remove_journal_head: freeing
b_committed_data
Oct 19 18:32:08 ganske last message repeated 219 times
ganske:~#
--
Roy Sigurd Karlsbakk
[email protected]
-------------------------------
MICROSOFT: Acronym for "Most Intelligent Customers Realise Our
Software Only Fools Teenagers"
Roy Sigurd Karlsbakk wrote:
> Hi all
>
> Stresstesting a SATA drive+controller, I get the error below after a
> while. How can I find if this error is due to a controller failure, a
> bad driver, or a drive failure?
Are there any libata/SCSI error messages in your log?
--
tejun
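One way to check for such messages is to filter the kernel log for lines that come from the storage stack rather than from ext3. The following is only a minimal sketch: the regular expression is an assumed list of prefixes that libata, the SCSI layer and the block layer commonly print, and the function names are made up for illustration. Adjust the pattern to whatever your kernel actually logs.

```python
import re

# Assumed message prefixes (not taken from the thread): "ataN", "scsi...",
# "sd ..."/"sda:", "end_request" and "Buffer I/O error". The pattern is
# anchored right after "kernel: " so that ext3 lines which merely mention
# "sda1" in the middle of the message are not matched.
STORAGE_PATTERN = re.compile(
    r"kernel: (ata\d+|scsi|sd[a-z]*[ :]|end_request|Buffer I/O error)",
    re.IGNORECASE,
)


def storage_errors(lines):
    """Return the lines that look like libata/SCSI/block-layer messages."""
    return [line.rstrip("\n") for line in lines if STORAGE_PATTERN.search(line)]


def scan_log(path="/var/log/kern.log"):
    """Read a syslog-style file and return its storage-related lines."""
    with open(path, errors="replace") as log:
        return storage_errors(log)
```

Calling `scan_log()` on the machine in question (reading the log usually needs root) returns an empty list when only filesystem-level messages were logged, which is the situation described in this thread.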
> Roy Sigurd Karlsbakk wrote:
>
>> Hi all
>> Stresstesting a SATA drive+controller, I get the error below after
>> a while. How can I find if this error is due to a controller
>> failure, a bad driver, or a drive failure?
>
> Are there any libata/SCSI error messages in your log?
>
Nope. Just the ones from ext3. I first tried with a kernel from
Debian etch, and then switched to 2.6.18.1. Same errors on both.
roy
Can anyone help me work out how to debug this further, please?
On 20 Oct 2006, at 12:06, Roy Sigurd Karlsbakk wrote:
> Hi all
>
> Stresstesting a SATA drive+controller, I get the error below after
> a while. How can I find if this error is due to a controller
> failure, a bad driver, or a drive failure?
--
Roy Sigurd Karlsbakk
>> Roy Sigurd Karlsbakk wrote:
>>
>>> Hi all
>>> Stresstesting a SATA drive+controller, I get the error below
>>> after a while. How can I find if this error is due to a
>>> controller failure, a bad driver, or a drive failure?
>>
>> Are there any libata/SCSI error messages in your log?
>>
>
> Nope. Just the ones from ext3. I first tried with a kernel from
> Debian etch, and then switched to 2.6.18.1. Same errors on both.
Hi all
Sorry to keep pressing this, but is there a way I can debug this
further? It's a Seagate drive connected to a sata_sil controller. I
only get ext3 errors, and it fails after a while whatever I do.
Thanks
roy
The only idea I have is to unmount the drive (or remount it read-only) and
repeatedly md5sum the block device, checking whether it ever fails to read
the data correctly and whether any errors show up in your syslog. If there
are no error messages in your syslog and md5sum completes without error but
does not produce the same hash each time, then something is definitely
very fubar with the hardware or deep in the kernel.
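As a rough illustration of that check, here is a minimal Python sketch that reads a device several times and compares the digests. The function names and the 1 MiB chunk size are illustrative choices, not something from this thread. Reading a real device such as /dev/sda normally needs root, and the filesystem should be unmounted or mounted read-only so the contents cannot change between passes.

```python
import hashlib


def hash_device(path, chunk_size=1024 * 1024):
    """Read everything at `path` in chunks and return its MD5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:  # end of device or file
                break
            digest.update(chunk)
    return digest.hexdigest()


def repeated_hashes(path, passes):
    """Hash `path` `passes` times and return the digests in order."""
    return [hash_device(path) for _ in range(passes)]


def all_identical(digests):
    """True if every pass produced the same digest."""
    return len(set(digests)) <= 1
```

For example, `all_identical(repeated_hashes("/dev/sda", 3))` should be true on healthy hardware. On a device smaller than RAM, later passes may be served from the page cache rather than from the disk, so the comparison says most on a large drive.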
Roy Sigurd Karlsbakk wrote:
>
> Hi all
>
> Sorry for stressing this, but is there a way I can debug this further?
> it's a seagate drive connected to a sata_sil controller. I only get ext3
> errors, and it fails after a while whatever I do
>
> thanks
>
> roy
>> Sorry for stressing this, but is there a way I can debug this
>> further? it's a seagate drive connected to a sata_sil controller.
>> I only get ext3 errors, and it fails after a while whatever I do
>
> Only idea I have is to unmount the drive ( or remount r/o ) and
> repeatedly md5sum the block device and see if it ever fails to
> correctly read the data, and if you get any errors in your syslog.
> If you get no error messages in your syslog and md5sum completes
> without error but does not get the same hash each time, then there
> is definitely something very fubar with the hardware or deep in the
> kernel.
md5sum has now been running in a loop for some 22 hours and has completed
11 sums of the drive (md5summing 400 GB takes a little while). The
md5sum is identical for each run, and the syslog shows no errors.
Then, after starting the hard disk stress test, I get this error
again after about an hour of testing:
Nov 3 11:33:17 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks: Freeing blocks not in datazone - block =
1349004846, count = 1
Nov 3 11:33:20 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks: Freeing blocks not in datazone - block =
1449605700, count = 1
Nov 3 11:33:23 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks: Freeing blocks not in datazone - block = 629024587,
count = 1
Nov 3 11:33:24 ganske kernel: EXT3-fs error (device sda1):
ext3_free_blocks: Freeing blocks not in datazone - block =
1059741014, count = 1
...
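For scale, "not in datazone" means ext3 was asked to free a block number outside the filesystem's range of valid data blocks. The sketch below compares the four block numbers logged above with the largest block count a drive of this size could hold. It assumes the "400 gig" drive is 400 decimal GB and tries the 1, 2 and 4 KiB block sizes ext3 commonly uses; the actual block size of sda1 is not stated in the thread.

```python
# Assumption: 400 decimal gigabytes for the whole drive.
DRIVE_BYTES = 400 * 10**9

# Block numbers copied from the kernel messages above.
REPORTED_BLOCKS = [1349004846, 1449605700, 629024587, 1059741014]


def max_blocks(device_bytes, block_size):
    """Upper bound on the number of filesystem blocks the device can hold."""
    return device_bytes // block_size


for block_size in (1024, 2048, 4096):
    limit = max_blocks(DRIVE_BYTES, block_size)
    out_of_range = [b for b in REPORTED_BLOCKS if b >= limit]
    print(
        f"{block_size}-byte blocks: at most {limit} blocks, "
        f"{len(out_of_range)} of {len(REPORTED_BLOCKS)} reported numbers out of range"
    )
```

Under these assumptions every reported number is larger than the drive could hold even with 1 KiB blocks, so the values look like corrupted block pointers rather than real locations on the disk.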
So the error only occurs with filesystem usage, not with direct
block-device access.
Any ideas?
roy