2008-11-13 21:22:27

by L A Walsh

Subject: FYI: BUG in SATA Promise 300 TX4 (2.6.24 - 2.6.27-3) w/Linux

FYI -- ever since I switched to using SATA, I've not had a stable kernel.
Sys uptime went from near infinite (excluding planned takedowns) to
consistently less than a week. I'd been using the Promise 300 TX4
(PDC40718, rev 02) with 1-2 Seagate drives.

Finally, an explicit problem with that controller under Linux (it timed out
a drive returning from suspend during 'SMART' operations) got a suggestion
from the community (Tnx, Tejun Heo): try a _cheaper_ but better-featured
Silicon Image controller (SiI 3124 SATA).

Not only did it NOT have the SMART problem (that would hang the drive or
machine), but my random hangs seem to have gone away.

My main server has been up nearly 21 days now on 2.6.27-3 SMP
(vanilla-i386).

I'd had the problems across a range of kernels, going back to 2.6.24 or so
when I first tried adding SATA to the system.

So Tnx again to Tejun --

and NOTE: the card or driver (or both) for the Promise 300 TX4 isn't
stable for production use -- it has a repeatable problem of timing out
some drives before they can spin up from standby (just the drive -- not the
computer). The error logically removes the drive from the system until
the next boot (unplugging and replugging the SATA cable on the drive
would hang the machine within 5 seconds of replugging the cable). Not
an instant hang, as might indicate a HW upset from plugging in the cable,
but a delay of a couple of seconds after plug-in before the keyboard would
lock up -- pointing toward the software trying to re-add+initialize the
drive.

Needless to say, I'm only using the SiI controller now, and things are
stable.


2008-11-16 06:04:48

by Tejun Heo

Subject: Re: FYI: BUG in SATA Promise 300 TX4 (2.6.24 - 2.6.27-3) w/Linux

(cc'ing Mikael Pettersson)
Hello, Linda.

Linda Walsh wrote:
> FYI -- ever since I switched to using SATA, I've not had a stable kernel.
> Sys uptime went from near infinite (excluding planned takedowns) to
> consistently less than a week. I'd been using the Promise 300 TX4
> (PDC40718, rev 02) with 1-2 Seagate drives.
>
> Finally, an explicit problem with that controller under Linux (it timed out
> a drive returning from suspend during 'SMART' operations) got a suggestion
> from the community (Tnx, Tejun Heo): try a _cheaper_ but better-featured
> Silicon Image controller (SiI 3124 SATA).

Yeah, I'm quite fond of the controller. Except for the bandwidth
limit due to the limited number of postable requests, which shows up
only when multiple drives are attached to a single port via PMP, I
can't think of anything bad about it.

> Not only did it NOT have the SMART problem (that would hang the drive or
> machine), but my random hangs seem to have gone away.
>
> My main server has been up nearly 21 days now on 2.6.27-3 SMP
> (vanilla-i386).
>
> I'd had the problems across a range of kernels, going back to 2.6.24 or so
> when I first tried adding SATA to the system.
>
> So Tnx again to Tejun --
>
> and NOTE: the card or driver (or both) for the Promise 300 TX4 isn't
> stable for production use -- it has a repeatable problem of timing out
> some drives before they can spin up from standby (just the drive -- not the
> computer). The error logically removes the drive from the system until
> the next boot (unplugging and replugging the SATA cable on the drive
> would hang the machine within 5 seconds of replugging the cable). Not
> an instant hang, as might indicate a HW upset from plugging in the cable,
> but a delay of a couple of seconds after plug-in before the keyboard would
> lock up -- pointing toward the software trying to re-add+initialize the
> drive.

Some Promise controllers seem to suffer transmission problems when
combined with certain drives, which often show up as timeouts. The
hardreset of sata_promise wasn't as robust as it should have been, and
in some cases it wasn't able to recover a link after an error condition,
causing the system to lose the drive after such events. The hardreset
problem was fixed recently by Mikael Pettersson. Can you please try
2.6.28-rc5 and see whether sata_promise still loses drives after
failures?

Mikael, I think the hardreset fix is worth including in -stable.
It should be safe for -stable too, right?

Thanks.

--
tejun

2008-11-16 11:16:17

by Mikael Pettersson

Subject: Re: FYI: BUG in SATA Promise 300 TX4 (2.6.24 - 2.6.27-3) w/Linux

Tejun Heo writes:
> > and NOTE: the card or driver (or both) for the Promise 300 TX4 isn't
> > stable for production use -- it has a repeatable problem of timing out
> > some drives before they can spin up from standby (just the drive -- not the
> > computer). The error logically removes the drive from the system until
> > the next boot (unplugging and replugging the SATA cable on the drive
> > would hang the machine within 5 seconds of replugging the cable). Not
> > an instant hang, as might indicate a HW upset from plugging in the cable,
> > but a delay of a couple of seconds after plug-in before the keyboard would
> > lock up -- pointing toward the software trying to re-add+initialize the
> > drive.
>
> Some Promise controllers seem to suffer transmission problems when
> combined with certain drives, which often show up as timeouts. The
> hardreset of sata_promise wasn't as robust as it should have been, and
> in some cases it wasn't able to recover a link after an error condition,
> causing the system to lose the drive after such events. The hardreset
> problem was fixed recently by Mikael Pettersson. Can you please try
> 2.6.28-rc5 and see whether sata_promise still loses drives after
> failures?
>
> Mikael, I think the hardreset fix is worth including in -stable.
> It should be safe for -stable too, right?

The hardreset fix was included in 2.6.27.5. I wanted it in 2.6.26-stable
too, but that branch seems to have been closed now.
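
For readers who want to check a given kernel release string against the versions named in this thread, here is a minimal sketch. It assumes the fix is present in 2.6.27.5 and later 2.6.27.y releases (as stated above) and in every 2.6.28 kernel including release candidates; the thread itself only names 2.6.28-rc5, so earlier release candidates are an assumption. The names `parse_release` and `has_hardreset_fix` are hypothetical helpers, not part of any kernel tooling.

```python
import platform
import re

# First -stable release carrying the sata_promise hardreset fix, per this thread.
FIXED_STABLE = (2, 6, 27, 5)
# Assumption: all 2.6.28 kernels carry the fix (the thread names only 2.6.28-rc5).
FIXED_MAINLINE = (2, 6, 28)


def parse_release(release):
    """Return the leading dotted version of a release string as a tuple of ints.

    Suffixes such as "-rc5" or "-3" are ignored.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.group(1).split("."))


def has_hardreset_fix(release):
    """Guess whether a kernel release includes the hardreset fix."""
    version = parse_release(release)
    if version[:3] == (2, 6, 27):
        # Within the 2.6.27.y series, only .5 and later have the fix.
        return version >= FIXED_STABLE
    # Older series (e.g. 2.6.26.y) never received it; newer mainline did.
    return version >= FIXED_MAINLINE


if __name__ == "__main__":
    running = platform.release()
    print(running, "has fix:", has_hardreset_fix(running))
```

By this check, 2.6.27.4 and the 2.6.26.y series are reported as lacking the fix, which matches what is said in this thread.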

2008-11-16 14:21:44

by Tejun Heo

Subject: Re: FYI: BUG in SATA Promise 300 TX4 (2.6.24 - 2.6.27-3) w/Linux

2008-11-16 (Sun), 12:08 +0100, Mikael Pettersson wrote:
> > Mikael, I think the hardreset fix is worth including in -stable.
> > It should be safe for -stable too, right?
>
> The hardreset fix was included in 2.6.27.5. I wanted it in 2.6.26-stable
> too, but that branch seems to have been closed now.

Ah, right, I was looking at 2.6.26.6 and thinking it was 2.6.27.6. :-P

Thanks.

--
tejun

2008-11-16 18:08:41

by Brad Campbell

Subject: Re: FYI: BUG in SATA Promise 300 TX4 (2.6.24 - 2.6.27-3) w/Linux

Mikael Pettersson wrote:
> Tejun Heo writes:
> > > and NOTE: the card or driver (or both) for the Promise 300 TX4 isn't
> > > stable for production use -- it has a repeatable problem of timing out
> > > some drives before they can spin up from standby (just the drive -- not the
> > > computer). The error logically removes the drive from the system until
> > > the next boot (unplugging and replugging the SATA cable on the drive
> > > would hang the machine within 5 seconds of replugging the cable). Not
> > > an instant hang, as might indicate a HW upset from plugging in the cable,
> > > but a delay of a couple of seconds after plug-in before the keyboard would
> > > lock up -- pointing toward the software trying to re-add+initialize the
> > > drive.
> >
> > Some Promise controllers seem to suffer transmission problems when
> > combined with certain drives, which often show up as timeouts. The
> > hardreset of sata_promise wasn't as robust as it should have been, and
> > in some cases it wasn't able to recover a link after an error condition,
> > causing the system to lose the drive after such events. The hardreset
> > problem was fixed recently by Mikael Pettersson. Can you please try
> > 2.6.28-rc5 and see whether sata_promise still loses drives after
> > failures?
> >
> > Mikael, I think the hardreset fix is worth including in -stable.
> > It should be safe for -stable too, right?
>
> The hardreset fix was included in 2.6.27.5. I wanted it in 2.6.26-stable
> too, but that branch seems to have been closed now.

Is that likely to do anything for the old SATA150-TX4?
I have 2 of them in a machine and I've been dropping drives under write load recently, but that was
on a 2.6.27.4 kernel.

Reboot required to pick up the drives again (unless the kernel panics and it reboots itself - which
it's been doing also).

Brad
--
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.

2008-11-17 02:02:25

by Tejun Heo

Subject: Re: FYI: BUG in SATA Promise 300 TX4 (2.6.24 - 2.6.27-3) w/Linux

Brad Campbell wrote:
>> The hardreset fix was included in 2.6.27.5. I wanted it in 2.6.26-stable
>> too, but that branch seems to have been closed now.
>
> Is that likely to do anything for the old SATA150-TX4?
> I have 2 of them in a machine and I've been dropping drives under write
> load recently, but that was on a 2.6.27.4 kernel.
>
> Reboot required to pick up the drives again (unless the kernel panics
> and it reboots itself - which it's been doing also).

Does unloading and reloading sata_promise fix the problem? If your root
is on the Promise controller, you'll need to try this from a USB stick or
live CD.

--
tejun
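
The unload/reload test suggested above could be scripted roughly as follows. This is only a sketch: it assumes sata_promise is built as a loadable module, that no mounted filesystem lives on a drive behind the controller, and that the script runs as root. The helper names `reload_commands` and `reload_module` are hypothetical; the only external tool used is the standard `modprobe` command.

```python
import subprocess

# Driver named in this thread; assumed to be built as a module, not built in.
MODULE = "sata_promise"


def reload_commands(module=MODULE):
    """Return the two commands that unload and then reload a kernel module."""
    return [
        ["modprobe", "-r", module],  # unload (fails if the module is in use)
        ["modprobe", module],        # load again, re-probing the controller
    ]


def reload_module(module=MODULE, dry_run=True):
    """Print the commands, or run them when dry_run is False (needs root)."""
    for command in reload_commands(module):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing command.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing is unloaded by accident.
    reload_module()
```

Calling `reload_module(dry_run=False)` performs the actual reload; afterwards, whether the dropped drives reappear can be checked in the kernel log.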