Date: Tue, 8 Jul 2008 12:33:11 +0200 (CEST)
From: Gerhard Wiesinger <lists@wiesinger.com>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
       linux-ide@vger.kernel.org
Subject: Re: Lots of con-current I/O = resets SATA link? (2.6.25.10)
In-Reply-To: <alpine.DEB.1.10.0807080433550.17980@p34.internal.lan>
Message-ID: <alpine.LFD.1.10.0807081232350.32659@bbs.intern>
References: <alpine.DEB.1.10.0807051252200.12562@p34.internal.lan> <alpine.LFD.1.10.0807071649340.1160@bbs.intern> <alpine.LFD.1.10.0807071705490.2997@bbs.intern> <alpine.DEB.1.10.0807071149530.32508@p34.internal.lan> <alpine.LFD.1.10.0807080811560.5439@bbs.intern>
 <alpine.DEB.1.10.0807080433550.17980@p34.internal.lan>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 8 Jul 2008, Justin Piszcz wrote:

>
>
> On Tue, 8 Jul 2008, Gerhard Wiesinger wrote:
>
>> On Mon, 7 Jul 2008, Justin Piszcz wrote:
>> 
>>> Hi Gerhard,
>>> 
>>> It /could/ be the port itself if you have already changed the cable and disk.
>>> 
>> 
>> Yes, but that is very unlikely: I have written terabytes of data through 
>> that port without any problems. Anyway, this is already my third 
>> replacement Samsung disk ...
>> 
>> 
>>> Have you tried loading the disk with dd to see whether you can reproduce 
>>> the problem? You are getting the same error I usually get. I recommend 
>>> turning OFF NCQ first and seeing whether the problem goes away.
>>> 
>>> # Define DISKS.
>>> cd /sys/block
>>> DISKS=$(/bin/ls -1d sd[a-z])
>>> 
>>> # Disable NCQ on all disks.
>>> echo "Disabling NCQ on all disks..."
>>> for i in $DISKS
>>> do
>>>  echo "Disabling NCQ on $i"
>>>  echo 1 > /sys/block/"$i"/device/queue_depth
>>> done
>>> 
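Justin's dd suggestion can be made concrete with a sketch like the following, which assumes bash and GNU dd. It reads from several devices at the same time, roughly the "lots of concurrent I/O" pattern from the subject line, and counts how many dd processes failed. `concurrent_read` is a hypothetical helper name, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read COUNT_MB megabytes from every given device
# in parallel, print how many dd runs failed, and return nonzero if any did.
concurrent_read() {
    local count_mb=$1; shift
    local pids=() dev pid failed=0
    for dev in "$@"; do
        # One background reader per device; dd's own messages are discarded.
        dd if="$dev" of=/dev/null bs=1M count="$count_mb" 2>/dev/null &
        pids+=("$!")
    done
    # Collect every reader's exit status.
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
    return $(( failed > 0 ))
}
```

For example, as root, `concurrent_read 8192 /dev/sd[a-f]` reads 8 GiB from each of the six disks in parallel; the kernel log should be watched for SATA link resets while it runs.
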
>> 
>> I disabled NCQ on all disks and tried to rebuild the RAID, but the 
>> rebuild still failed with the same error message.
>> 
>> I also tried the nolapic kernel parameter without success.
>> 
>> Device    ID# ATTRIBUTE_NAME        FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>> /dev/sda:   5 Reallocated_Sector_Ct 0x0033 100   100   010    Pre-fail Always  -           0
>> /dev/sdb:   5 Reallocated_Sector_Ct 0x0033 100   100   010    Pre-fail Always  -           0
>> /dev/sdc:   5 Reallocated_Sector_Ct 0x0033 091   091   010    Pre-fail Always  -           413
>> /dev/sdd:   5 Reallocated_Sector_Ct 0x0033 100   100   010    Pre-fail Always  -           0
>> /dev/sde:   5 Reallocated_Sector_Ct 0x0033 100   100   010    Pre-fail Always  -           0
>> /dev/sdf:   5 Reallocated_Sector_Ct 0x0033 100   100   010    Pre-fail Always  -           0
>> 
>> The only anomaly is that Reallocated_Sector_Ct is still nonzero (413) on 
>> /dev/sdc. Keep in mind that this is already the third new Samsung disk on 
>> /dev/sdc, and the previous disks reached up to 3000 reallocated sectors in 
>> less than a day!
>> 
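Rather than reading the attribute table by eye, the raw reallocated-sector count can be extracted per disk. A minimal sketch, assuming bash, smartmontools, and the usual 10-column `smartctl -A` layout shown above (RAW_VALUE in the last column); `realloc_count` and `report_realloc` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the RAW_VALUE (10th column) of the
# Reallocated_Sector_Ct row from `smartctl -A` output read on stdin.
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Hypothetical helper: report the count for each named disk
# (needs root and smartmontools; pass names like sda sdb ...).
report_realloc() {
    local disk
    for disk in "$@"; do
        printf '/dev/%s: %s\n' "$disk" "$(smartctl -A "/dev/$disk" | realloc_count)"
    done
}
```

Running `report_realloc sda sdb sdc sdd sde sdf` before and after a rebuild attempt shows whether the count on /dev/sdc keeps growing, which would suggest the disk rather than the port.
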
>> Should I replace the disk a fourth time?
>> 
>> A Google search turns up many threads about this timeout problem. 
>> Could this be a software issue?
>> 
>> Any ideas?
>
> Please run:
>
> smartctl -t short /dev/sdc
> sleep 300
> smartctl -t long /dev/sdc
>
> Wait 2-3 hours or more, then run:
>
> smartctl -a /dev/sdc
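
The fixed `sleep 300` and the 2-3 hour wait can be replaced by polling the drive. A sketch, assuming bash and smartmontools: it reads the "Self-test execution status" value that `smartctl -c` prints and treats an upper nibble of 15 as "test in progress", which is how the ATA specification encodes that status byte. `selftest_status`, `selftest_running` and `run_selftest` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the numeric "Self-test execution status"
# byte from `smartctl -c` (or -a) output read on stdin.
selftest_status() {
    sed -n 's/^Self-test execution status:[[:space:]]*([[:space:]]*\([0-9]*\)).*/\1/p'
}

# Succeed while a self-test is running: the upper nibble is 15 (0xF)
# for "in progress"; the lower nibble is the remaining work in tens of percent.
selftest_running() {
    local status
    status=$(selftest_status)
    [ -n "$status" ] && (( (status >> 4) == 15 ))
}

# Hypothetical helper: start a test of the given type (short/long),
# poll once a minute until it finishes, then print the self-test log.
run_selftest() {
    local dev=$1 type=$2
    smartctl -t "$type" "$dev" >/dev/null
    sleep 60
    while smartctl -c "$dev" | selftest_running; do
        sleep 60
    done
    smartctl -l selftest "$dev"
}
```

Usage, as root: `run_selftest /dev/sdc short; run_selftest /dev/sdc long`, followed by `smartctl -a /dev/sdc` as suggested above.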


I'm changing the disk one more time ...

Ciao,
Gerhard

--
http://www.wiesinger.com/
