2007-12-30 05:15:31

by L A Walsh

Subject: SATA buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

I needed to get a new hard disk for one of my systems and thought that
it was about time to start going with SATA.

I picked up a Promise 4-Port Sata300-TX4 to go with a 750G
Seagate SATA -- I'd had good luck with a Promise ATA100 (P)ATA
and lower capacity Seagates and thought it would be a good combo.

Unfortunately, the *buffered* read performance is *horrible*!

I timed the new disk against a 400GB PATA and old 80MB/s SCSI-based
18.3G hard disk. While the raw speed numbers are faster as expected,
the linux-buffered read numbers are not good.


sda=18.3G on 80MB/s SCSI
sdb=the new 750GB on a 3Gb SATA w/NCQ.
hdf=400GB PATA on an ATA100 Promise card

I used "dd" for my tests, reading 2GB on a quiescent machine
that has 1GB of main memory. Output was to dev null. Input
was from the device (not a partition or file), (/dev/sda, /dev/sdb
and /dev/hdf). BS=1M, Count=2k. For the direct tests, I used
the "iflag=direct" param. No RAID or "volumes" are involved.

In each case, I took best run time out of 3 runs.
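Roughly, the invocations looked like this (the buffered runs were the
same, just without iflag=direct):

    # direct (O_DIRECT) read of ~2GB in 1MB blocks
    time dd if=/dev/sdb of=/dev/null bs=1M count=2048 iflag=direct
    # kernel-buffered read of the same ~2GB
    time dd if=/dev/sdb of=/dev/null bs=1M count=2048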

Direct read speeds (and cpu usage):
 dev    speed       cpu/real (s)    %cpu
 sda    60MB/s      0.51/35.84       1.44
 sdb    80MB/s      0.50/26.72       1.87
 hdf    69.4MB/s    0.51/30.92       1.68


Buffered reads show the "bad news":
 dev    speed       cpu/real (s)    %cpu
 sda    59.9MB/s    20.80/35.86     58.03
 sdb    18.7MB/s    16.07/114.73    14.01   <- SATA extra badness
 hdf    69.8MB/s    17.37/30.76     56.48

I assume this isn't expected behavior.

Why would buffered reads be so much slower for SATA? Shouldn't
it be the same buffering system used by sda and hdf? I can't
see how it would be the hardware or the driver since both
give "best" read performance with the new SATA disk being
15-20% faster.

But the buffered reads... are 60% *slower*. I want to ask if this
is even possible, even though the evidence seems to indicate it is.
What I really mean to ask is: are the SATA buffered read paths
*so* different from SCSI and PATA that they could cause this?
Isn't the block layer mostly "independent" above the device
layer? In case it isn't evident, I'm using the newer libata SATA
driver (not the old-style IDE driver), while the PATA disks are
using the old ATA (IDE) interface.

I wanted to use the newer PATA support in the SATA library (libata),
but got frustrated "real fast" by the lack of disk-parameter support
there (hdparm is mostly broken, and the SCSI utils aren't really
intended for ATA (or SATA?) disks sitting behind the SCSI interface).

Is there some 'gotcha' I'm missing? Google didn't seem to
throw any answers at me that 'stood out'.

Also, as a side issue -- have buffered reads always taken
that much CPU vs. direct (the machine has 2x 1GHz P-IIIs)?
Maybe it has and I just haven't noticed it -- but my main
problem right now is with the horrible buffered SATA
performance.

Since SATA drives use ATA-7 (or at least the Seagate disk I
acquired seems to), shouldn't most of the hdparm commands
be functional on the SATA hardware as much as they would
be on PATA? Or, said a different way, is there
an "sdparm" that is to SATA what hdparm is to PATA?

The Promise controllers involved (PATA and SATA) are:
00:0d.0 Mass storage controller: Promise Technology, Inc. PDC20268
(Ultra100 TX2) (rev 02)
and
02:09.0 Mass storage controller: Promise Technology, Inc. PDC40718 (SATA
300 TX4) (rev 02)

I'd ask about a newer driver, but the hardware seems pretty
fast if I go around the Linux kernel. Ideas? What could
slow down the linux-buffer layer when the driver is faster?
Perversely, could it be the faster driver speed just tipping
over some internal "flooding" limit which degrades buffered
performance?

Very Confused & TIA,
Linda


2007-12-30 18:17:05

by Robert Hancock

Subject: Re: SATA buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

Linda Walsh wrote:
> I needed to get a new hard disk for one of my systems and thought that
> it was about time to start going with SATA.
>
> I picked up a Promise 4-Port Sata300-TX4 to go with a 750G
> Seagate SATA -- I'd had good luck with a Promise ATA100 (P)ATA
> and lower capacity Seagates and thought it would be a good combo.
>
> Unfortunately, the *buffered* read performance is *horrible*!
>
> I timed the new disk against a 400GB PATA and old 80MB/s SCSI-based
> 18.3G hard disk. While the raw speed numbers are faster as expected,
> the linux-buffered read numbers are not good.
>
>
> sda=18.3G on 80MB/s SCSI
> sdb=the new 750GB on a 3Gb SATA w/NCQ.
> hdf=400GB PATA on an ATA100 Promise card
>
> I used "dd" for my tests, reading 2GB on a quiescent machine
> that has 1GB of main memory. Output was to dev null. Input
> was from the device (not a partition or file), (/dev/sda, /dev/sdb
> and /dev/hdf). BS=1M, Count=2k. For the direct tests, I used
> the "iflag=direct" param. No RAID or "volumes" are involved.
>
> In each case, I took best run time out of 3 runs.
>
> Direct read speeds (and cpu usage):
> dev speed cpu/real %
> sda 60MB/s 0.51/35.84 1.44
> sdb 80MB/s 0.50/26.72 1.87
> hdf 69.4MB/s 0.51/30.92 1.68
>
>
> Buffered reads show the "bad news":
> dev speed cpu/real %
> sda 59.9MB/s 20.80/35.86 58.03
> sdb 18.7MB/s 16.07/114.73 14.01 <-SATA extra badness
> hdf 69.8MB/s 17.37/30.76 56.48
>
> I assume this isn't expected behavior.
>
> Why would buffered reads be so much slower for SATA? Shouldn't
> it be the same buffering system used by sda and hdf? I can't
> see how it would be the hardware or the driver since both
> give "best" read performance with the new SATA disk being
> 15-20% faster.
>
> But the buffered reads...are 60% *slower*. I want to ask if this
> is even possible, even though the evidence seems to indicate it is.
> But what I mean to ask is: "are the SATA buffered read paths
> *so* different from SCSI and PATA that they could cause this?
> Isn't the block layer mostly "independent" above the device
> layer? If it isn't evident, I'm using the newer SATA drivers (not
> the old ones included with the pata library and the pata disks
> are using the old ATA interface.

Have you tried using a different block size to see how that affects the
results? There might be some funny interaction there.
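Something like the following sweep would show it quickly (the counts here
are just picked to keep each run near 2GB; adjust the device name, of
course):

    # rough sketch: time a ~2GB read at several block sizes
    for bc in "4k 524288" "16k 131072" "64k 32768" "128k 16384" "1M 2048"; do
        set -- $bc
        echo "bs=$1"
        dd if=/dev/sdb of=/dev/null bs=$1 count=$2 2>&1 | tail -1
    done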

>
> I wanted to use the newer pata support in the SATA lib, but
> got frustrated "real fast" by the lack of disk-parameter support
> in the new pata library (hdparm is mostly broken; and the SCSI
> utils aren't really intended for ATA(or SATA?) disks using the
> SCSI interface.

It's somewhat intentional that some of the hdparm commands (like for
setting transfer modes, enabling/disabling DMA, etc.) don't work with
libata. Most of them aren't necessary at all, as correct DMA settings,
etc. should always be set automatically (if not, please report it as a bug).

>
> Is there some 'gotcha' I'm missing? Google didn't seem to
> throw any answers at me that 'stood out'.
>
> Also, as a side issue -- have the buffered commands always
> taken that much cpu vs. direct (machine has 2x1GHz-P-III's).
> Maybe it has and I just haven't noticed it -- but my main
> problem right now is with the horrible buffered SATA
> performance.
>
> Since SATA's use ATA-7 (or at least the Seagate disk I
> acquired seems to), shouldn't most of the hdparm commands
> be functional on the SATA hardware as much as they would
> be on PATA? Or...maybe said a different way, is there
> an "sdparm" that is to SATA what hdparm is to PATA?

It's the same libata code, so the same applies to some of the hdparm
commands not being implemented, as above.

>
> The Promise controllers involved (PATA and SATA) are:
> 00:0d.0 Mass storage controller: Promise Technology, Inc. PDC20268
> (Ultra100 TX2) (rev 02)
> and
> 02:09.0 Mass storage controller: Promise Technology, Inc. PDC40718 (SATA
> 300 TX4) (rev 02)
>
> I'd ask about a newer driver, but the hardware seems pretty
> fast if I go around the Linux kernel. Ideas? What could
> slow down the linux-buffer layer when the driver is faster?
> Perversely, could it be the faster driver speed just tipping
> over some internal "flooding" limit which degrades buffered
> performance?
> Very Confused & TIA,
> Linda

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2008-01-01 00:19:38

by L A Walsh

Subject: Re: SATA kernel-buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

Robert Hancock wrote:
>> Have you tried using a different block size to see how that effects
>> the results? There might be some funny interaction there.
----
There is some interaction with the large block size (but only on the SATA
disk). Counts were adjusted to keep the read near 2G (~2x physical memory).
From 1k-16k block sizes, I got into the low-mid 40MB/s on buffered SATA
(compared to 50-60MB/s on ATA & SCSI). Starting at 32k-64k, the read
rate began falling, and at 128k block-reads-at-a-time or larger it drops
below 20MB/s (again, only on buffered SATA). It's hard to imagine what would
slow down buffered SATA reads but not ATA and SCSI reads of the same
size. I'm using the 'cfq' scheduler with everything running at default
priorities, but again, why only SATA slowness? It seems that at the driver
level, using direct reads, the SATA disk has the highest read rate (near
80MB/s).

It would certainly be perverse to have faster driver & device performance
equate to lower buffered I/O.
>
>> I wanted to use the newer pata support in the SATA lib, but
>> got frustrated "real fast" by the lack of disk-parameter support
>> in the new pata library (hdparm is mostly broken; and the SCSI
>> utils aren't really intended for ATA(or SATA?) disks using the
>> SCSI interface.
>
> It's somewhat intentional that some of the hdparm commands (like for
> settting transfer modes, enable/disable DMA, etc.) don't work with
> libata. Most of them aren't necessary at all as correct DMA settings,
> etc. should always be set automatically (if not, please report as a bug).
---
The only way I could tell before was using hdparm to read the parameters.
Since that doesn't work, it's hard to tell if they are set correctly, but
given the high performance at the device-driver level, I'm guessing the
params are set correctly.

>
>> Since SATA's use ATA-7 (or at least the Seagate disk I
>> acquired seems to), shouldn't most of the hdparm commands
>> be functional on the SATA hardware as much as they would
>> be on PATA? Or...maybe said a different way, is there
>> an "sdparm" that is to SATA what hdparm is to PATA?
>
> It's the same libata code, so the same applies to some of the hdparm
> commands not being implemented, as above.
---
Hmm... might be nice as an "RFE" to at least have the 'read-status'
commands work to see what the params are set to.

More importantly, how does one set the acoustic and power-saving
parameters? Some of my disks are used as 'backup' devices for my
other computers. With the ATA disks, they were kept "spun down" when not
being used (normally only used in the early AM hours).

Another new "problem" (not as important) -- even though SATA disks are
called with "sdX", my ATA disks that *were* at hda-hdc are now at hde-hdg.
Devices hda-hdd are not populated in my dev directory on bootup. Of course
this throws off boot-scripts that set diskparams by "hd<name>" and not
by label (using hdparm). Seems like the SATA disks are suffering a partial
identity problem -- seeming to reserve hda-hdd, but using the "sd" disk
names.
Is that a known problem? If not, I'll add it to my queue for bug-filing...

thanks,
Linda


2008-01-01 00:32:56

by Robert Hancock

Subject: Re: SATA kernel-buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

Linda Walsh wrote:
> Robert Hancock wrote:
>>> Have you tried using a different block size to see how that effects
>>> the results? There might be some funny interaction there.
> ----
> There is some interaction with the large block size (but only on the
> SATA
> disk). Counts were adjusted to keep the read near 2G (~2x physical
> memory).
> From 1k-16k block sizes, I got into the low-mid 40MB/s on buffered SATA
> (compared to 50-60MB/s on ATA & SCSI). Starting at 32k-64k, the read
> rate began falling and at 128k block-reads-at-a-time or larger, it drops
> below
> 20MB/s (again, only on buffered SATA). It's hard to imagine what would
> slow down buffered SATA reads but not ATA and SCSI reads of the same
> size. I'm using the 'cfq' scheduler with everything running at default
> priorities, but again, why only SATA slowness? It seems that at the driver
> level, using direct reads, the SATA disk has the highest read rate (near
> 80MB/s).
> It would certainly be perverse to have faster driver & device
> performance
> equate to lower buffered I/O.

Not too sure on that one. I suspect one might have to trace the actual
requests being received at the driver level somehow with buffered reads
in order to diagnose what's going on there..

>>
>>> I wanted to use the newer pata support in the SATA lib, but
>>> got frustrated "real fast" by the lack of disk-parameter support
>>> in the new pata library (hdparm is mostly broken; and the SCSI
>>> utils aren't really intended for ATA(or SATA?) disks using the
>>> SCSI interface.
>>
>> It's somewhat intentional that some of the hdparm commands (like for
>> settting transfer modes, enable/disable DMA, etc.) don't work with
>> libata. Most of them aren't necessary at all as correct DMA settings,
>> etc. should always be set automatically (if not, please report as a bug).
> ---
> The only way I could tell before was using hdparm to read the
> parameters.
> Since that doesn't work, it's hard to tell if they are set correctly,
> but given
> the high performance at the device driver level, I'm guessing the params
> are set correctly.

The settings in use should be reported in dmesg.
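Something along the lines of:

    # the negotiated mode shows up as e.g. "ata1.00: configured for UDMA/133"
    dmesg | grep -i 'configured for'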

>
>>
>>> Since SATA's use ATA-7 (or at least the Seagate disk I
>>> acquired seems to), shouldn't most of the hdparm commands
>>> be functional on the SATA hardware as much as they would
>>> be on PATA? Or...maybe said a different way, is there
>>> an "sdparm" that is to SATA what hdparm is to PATA?
>>
>> It's the same libata code, so the same applies to some of the hdparm
>> commands not being implemented, as above.
> ---
> Hmm... might be nice as an "RFE" to at least have the 'read-status'
> commands work to see what the params are set to.
> More importantly, how does one set parameters for acoustic and power
> saving parameters? Some of my disks are used as 'backup' devices for my
> other computers. With the ATA disks, they were kept "spun down" when not
> being used (only used, 'normally', in early AM hours).

I believe those hdparm commands for power-save and AAM are supposed to
work (they just issue an ATA command to the disk). The ones that aren't
implemented are the ones that actually commanded the IDE layer, like DMA
on/off.

>
> Another new "problem" (not as important) -- even though SATA disks are
> called with "sdX", my ATA disks that *were* at hda-hdc are now at hde-hdg.
> Devices hda-hdd are not populated in my dev directory on bootup. Of course
> this throws off boot-scripts that set diskparams by "hd<name>" and not
> by label (using hdparm). Seems like the SATA disks are suffering a partial
> identity problem -- seeming to reserve hda-hdd, but using the "sd" disk
> names.
> Is that a known problem? If not, I'll add it to my queue for bug-filing...

Could be a udev problem, as it's what does the device naming..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2008-01-01 02:14:48

by Alan

Subject: Re: SATA kernel-buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

> rate began falling and at 128k block-reads-at-a-time or larger, it drops
> below
> 20MB/s (again, only on buffered SATA). It's hard to imagine what would
> slow down buffered SATA reads but not ATA and SCSI reads of the same
> size. I'm using the 'cfq' scheduler with everything running at default

Try disabling NCQ - see if you've got a drive with the 'NCQ = no
readahead' flaw.

> priorities, but again, why only SATA slowness? It seems that at the driver
> level, using direct reads, the SATA disk has the highest read rate (near
> 80MB/s).

Beats me - something is wrong that your setup triggers - could be
firmware funnies or Linux ones.

> The only way I could tell before was using hdparm to read the
> parameters.
> Since that doesn't work, it's hard to tell if they are set correctly,
> but given

hdparm supports identify to read modes on drives with libata. The one
thing you cannot do is force modes right now.
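e.g. roughly:

    # IDENTIFY data via libata; the mode currently in use is marked with '*'
    hdparm -I /dev/sdb | grep -i 'pio\|dma'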

> More importantly, how does one set parameters for acoustic and power
> saving parameters? Some of my disks are used as 'backup' devices for my

hdparm or blktool

> other computers. With the ATA disks, they were kept "spun down" when not
> being used (only used, 'normally', in early AM hours).

Well for backup devices you can use the fact SATA is hot/warm plug.

> Another new "problem" (not as important) -- even though SATA disks are
> called with "sdX", my ATA disks that *were* at hda-hdc are now at hde-hdg.

NOTABUG - your BIOS has decided to move them from the legacy addresses so
they move from hda-d to e-g.

2008-01-01 16:06:21

by Mark Lord

Subject: Re: SATA kernel-buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)


>>>> I wanted to use the newer pata support in the SATA lib, but
>>>> got frustrated "real fast" by the lack of disk-parameter support
>>>> in the new pata library (hdparm is mostly broken; and the SCSI
>>>> utils aren't really intended for ATA(or SATA?) disks using the
>>>> SCSI interface.
...

Most hdparm flags work perfectly fine with libata,
unless perhaps you're using Fedora, which for some odd
reason was using a 2+ year old copy of hdparm until
very very recently.

As others noted, the only things not working are things that
libata itself chooses not to allow from userspace because libata
has better low-level drivers that can set those things automatically
in a more reliable fashion than we ever could with drivers/ide:
DMA, 32-bit I/O, PIO/DMA xfer rates, hotplug stuff.

The rest, including acoustic and power-saving parameters,
work just fine with libata.
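For example (the values here are only illustrations, not recommendations):

    hdparm -M 128 /dev/sdb   # acoustic management: 128 = quiet ... 254 = fast
    hdparm -B 127 /dev/sdb   # APM: 1-127 permit spin-down, 255 disables APM
    hdparm -S 241 /dev/sdb   # standby (spin-down) timeout: 241 = 30 minutes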

Cheers

2008-01-02 20:12:28

by L A Walsh

Subject: Re: SATA kernel-buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

Alan Cox wrote:
>> rate began falling; at 128k block-reads-at-a-time or larger, it drops below
>> 20MB/s (only on buffered SATA).
>
> Try disabling NCQ - see if you've got a drive with the 'NCQ = no
> readahead' flaw.
---
I'm not aware, offhand, of how to disable NCQ. I hadn't had any
NCQ- or SATA-capable disks until a few weeks ago.

>> The only way I could tell before was using hdparm to read the
>> parameters. Since that doesn't work, it's hard to tell if they
>> are set correctly...
>
> hdparm supports identify to read modes on drives with libata. The one
> thing you cannot do is force modes right now.
>
>> More importantly, how does one set parameters for acoustic and power
>> saving parameters? Some of my disks are used as 'backup' devices for my
>
> hdparm or blktool
----

I have hdparm v7.7.
It shows information in some areas, but the areas where it does not
work jump out and lead me to doubt whether the areas that don't give
explicit "ERROR" messages are presenting valid info.

Problem areas (using hdparm; disk = Seagate Barracuda, 16MB cache, model
ST3750640AS):
1) The drive's current 'multicount' setting isn't readable or settable.
Param "-i" shows "?16?" (with question marks around the 16) and "-I" simply
shows "?" for the current setting. Attempting to <read|set> it:
"HDIO_<GET|SET>_MULTCOUNT failed: Inappropriate ioctl for device"
2) Drive Advanced Power Management setting ("-B") (write-only):
"HDIO_DRIVE_CMD failed: Input/output error"
3) Drive acoustic setting ("-M"): read = " acoustic = not supported",
write = " HDIO_DRIVE_CMD:ACOUSTIC failed: Input/output error"
Note: the detailed drive info from "-I" says:
"Recommended acoustic management value: 254, current value: 0"
(i.e. there seems to be no way to set the recommended value)
4) 32-bit IO setting ("-c") (don't know if this is important given the
disk's raw-read speed; it may be meaningless for SATA):
"IO_support = 0 (default 16-bit)"

FWIW -- the spindown/standby timeout ("-S") does seem to work.

>> other computers. With the ATA disks, they were kept "spun down" when not
>> being used (only used, 'normally', in early AM hours).
>
> Well for backup devices you can use the fact SATA is hot/warm plug.
---
I don't follow. It is an internal drive. Are there software "logically
unplug" commands that automatically re-"plug-in" the drive on access
and spin it back up, like the spindown/standby timeout does? Or were
you referring to SATA's general hot/warm-plug ability (if my hardware
setup, drive slots, etc., permitted removability)?
>
>> Another new "problem" (not as important) -- even though SATA disks are
>> called with "sdX", my ATA disks that *were* at hda-hdc are now at hde-hdg.
>
> NOTABUG - your BIOS has decided to move them from the legacy addresses so
> they move from hda-d to e-g.
Sorry for my unclear usage -- by "problem" I meant that it was (is) an
"unexpected behavior". I'm sure the kernel is following the BIOS's
directions; I'm just not sure why a (supposedly) SATA-only card would
cause my BIOS to reserve 4 "[P]ATA drives" after installing the
card. It may be symptomatic of some "cost-cutting" measure by the
card manufacturer. I just don't know why it's happening right now.

*However* -- it is "annoying" -- if the kernel reserves hda-hdd at the
request of the BIOS, it _might_ be useful if "udev" also populated
/dev/ with devices for hda-hdd. I.e., "something" on the Linux-kernel
software side of things knows that hda-hdd aren't really there, as
the devices are not created in the udev-managed filesystem upon boot.

It may not be a kernel bug, inasmuch as the kernel is working as intended,
but it doesn't seem to be a "valuable feature". My reasoning:
"hd" drive letters are "unstable" because plugging/unplugging HD
controllers and/or drives can change the HD lettering. Consequently,
it is considered "best practice" to mount disks by label instead of
by drive letter under Linux. If it is acknowledged that the drive
letters are not stable, then why not have udev assign "hd" letters
only to drives that actually exist?
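(By label-based mounting I just mean ordinary fstab entries of this form;
the label and mount point below are made up:)

    # /etc/fstab -- mount by label so hda->hde style renumbering doesn't matter
    LABEL=backup   /backup   ext3   noatime   0 2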

Conversely, if udev had 'reserved' (created) hda-hdd devices because
the BIOS said they were 'reserved', then I can see it might have some
usefulness. This may be a 'udev'-specific concern or configuration
issue as well. I ran into it as I was going to try to use LILO's
"drive=" and "bios=" params to move the disks back to start with 'hda',
but lilo refused, saying 'hda' didn't exist (which it doesn't, as
indicated in the /dev-mounted udev 'filesystem'). It's not
impossible to work around or fix; it just seemed odd to move the working
drives up to hde-g when they could have been mapped to hda-c with
no apparent conflicts.

I know, it's a subtlety, but one not inconsistent with (wincing at
the admission of even knowing this, let alone the comparison) WinXP's
feature set.

If a disk was mounted and associated with a specific letter, and
later another controller or disk is added that would cause 're-lettering'
under Linux, it won't necessarily cause re-lettering under WinXP (as
it used to under Win98). This 'threw me' the first time it happened,
as I expected Windows to 're-letter' my drives and it didn't. It seems
to associate drive UUIDs with the last letter they were mounted at
(or tries to, barring conflicts).

2008-01-03 00:27:03

by Robert Hancock

Subject: Re: SATA kernel-buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

Linda Walsh wrote:
> Alan Cox wrote:
>>> rate began falling; at 128k block-reads-at-a-time or larger, it drops
>>> below
>>> 20MB/s (only on buffered SATA).
>>
>> Try disabling NCQ - see if you've got a drive with the 'NCQ = no
>> readahead' flaw.
> ---
> I'm not aware, off hand, how to disable NCQ. I haven't had any
> NCQ- or SATA- capable disks before a few weeks ago.

See here:

http://linux-ata.org/faq.html#ncq
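The short version from that FAQ: drop the queue depth to 1 for the device,
e.g.:

    # a queue depth of 1 effectively disables NCQ on sdb
    echo 1 > /sys/block/sdb/device/queue_depth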

>
>>> The only way I could tell before was using hdparm to read the
>>> parameters. Since that doesn't work, it's hard to tell if they
>>> are set correctly...
>>
>> hdparm supports identify to read modes on drives with libata. The one
>> thing you cannot do is force modes right now.
>>
>>> More importantly, how does one set parameters for acoustic and power
>>> saving parameters? Some of my disks are used as 'backup' devices for my
>>
>> hdparm or blktool
> ----
>
> I have hdparm-v7.7. There are some areas where it shows information,
> but areas where it
> does not work jump out and lead me to suspect whether or not areas
> that don't give explicit "ERROR" messages are presenting valid info.
>
> Problem areas (using hdparm, disk=Seagate Barracuda 16MB cache, model=
> ST3750640AS):
> 1) The drives current 'multicount' setting isn't readable or settable.
> param "-i" shows "?16?" (with question marks around 16) and "-I" simply
> shows "?" for the current setting. Attempting to <read|set> it:
> "HDIO_<GET|SET>_MULTCOUNT failed: Inappropriate ioctl for device"

I don't think you can get or set the multi count currently; it just uses
the best supported value.

> 2) Drive Advanced Power Management setting("-B") (write-only):
> "HDIO_DRIVE_CMD failed: Input/output error"
> 3) Drive Acoustic ("-M"), read = " acoustic = not supported",
> write = " HDIO_DRIVE_CMD:ACOUSTIC failed: Input/output error"
> Note: drive detailed info from "-I" says:
> "Recommended acoustic management value: 254, current value: 0"
> (i.e. - there seems to be no way to set recommended value)

Not sure about these ones.. Does anything show up in dmesg when you do this?

> 4) 32-bit IO setting ("-c") (don't know if this important given the disk's
> raw-read speed, it may be meaningless for SATA)
> "IO_support = 0 (default 16-bit)"*
> *

This setting is not meaningful for anything using DMA.

> FWIW -- the spindown/standby timeout ("S") does seem to work.
>
>>> other computers. With the ATA disks, they were kept "spun down" when
>>> not
>>> being used (only used, 'normally', in early AM hours).
>>
>> Well for backup devices you can use the fact SATA is hot/warm plug.
> ---
> I don't follow. It is an internal drive. Are their software "logically
> unplug" commands that automatically re-"plug-in" the drive on access
> and spin it back up like the spindown/standby timeout does? Or were
> you referring to SATA's general hot/warm plug ability (if my hardware
> setup, drive-slots, etc, permitted removability)?

I think they were referring to physically hotplugging the drive. This is
more practical if you have a removable drive caddy, or if the drive is
hooked up through eSATA.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2008-01-03 04:25:47

by L A Walsh

Subject: Re: SATA kernel-buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

Robert Hancock wrote:
> Linda Walsh wrote:
>> Alan Cox wrote:
>>>> rate began falling; at 128k block-reads-at-a-time or larger, it
>>>> drops below
>>>> 20MB/s (only on buffered SATA).
>>> Try disabling NCQ - see if you've got a drive with the 'NCQ = no
>>> readahead' flaw.
> http://linux-ata.org/faq.html#ncq
---
When the drive initializes, dmesg says it has NCQ (depth 0/32).
Reading queue_depth under /sys shows a queue depth of "1".

But more importantly -- I notice a chronic error message associated
with this drive that may be causing some or all of the problem:
---
Jan 2 20:06:10 Ishtar kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Jan 2 20:06:10 Ishtar kernel: ata1.00: port_status 0x20080000
Jan 2 20:06:10 Ishtar kernel: ata1.00: cmd c8/00:10:30:06:03/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
Jan 2 20:06:10 Ishtar kernel:          res 50/00:00:3f:06:03/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
Jan 2 20:06:13 Ishtar kernel: ata1: limiting SATA link speed to 1.5 Gbps
Jan 2 20:06:13 Ishtar kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Jan 2 20:06:13 Ishtar kernel: ata1.00: port_status 0x20080000
Jan 2 20:06:13 Ishtar kernel: ata1.00: cmd c8/00:10:00:8b:04/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
Jan 2 20:06:13 Ishtar kernel:          res 50/00:00:0f:8b:04/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
Jan 2 20:06:14 Ishtar kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x3
Jan 2 20:06:14 Ishtar kernel: ata1: hotplug_status 0x80
Jan 2 20:06:15 Ishtar kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x3
Jan 2 20:06:15 Ishtar kernel: ata1: hotplug_status 0x80
---
What da heck? Note, this is with NCQ queuing set to "1". The only
reference I could find for this error referred to "older drives", but
this is a 2007-model-year drive with ATA-7 and UDMA-6.

> I don't think you can get or get the multi count currently, it just
> uses the best supported value.
ok
>
>> 2) Drive Advanced Power Management setting("-B") (write-only):
>> "HDIO_DRIVE_CMD failed: Input/output error"
>> 3) Drive Acoustic ("-M"), read = " acoustic = not supported",
>> write = " HDIO_DRIVE_CMD:ACOUSTIC failed: Input/output error"
>
> Not sure about these ones.. Does anything show up in dmesg when you do
> this?
---
Yes:
(for "-B", power-management)
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: port_status 0x20200000
ata1.00: cmd ef/05:fe:00:00:00/00:00:00:00:00/40 tag 0 cdb 0x0 data 0
res 51/04:fe:00:00:00/00:00:00:00:00/40 Emask 0x1 (device error)
ata1.00: configured for UDMA/133
ata1: EH complete
sd 1:0:0:0: [sdb] 1465149168 512-byte hardware sectors (750156 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
----
(for "-M" acoustic management):
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: port_status 0x20200000
ata1.00: cmd ef/42:fe:00:00:00/00:00:00:00:00/40 tag 0 cdb 0x0 data 0
res 51/04:fe:00:00:00/00:00:00:00:00/40 Emask 0x1 (device error)
ata1.00: configured for UDMA/133
ata1: EH complete
sd 1:0:0:0: [sdb] 1465149168 512-byte hardware sectors (750156 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

2008-01-03 08:45:40

by Mikael Pettersson

Subject: Re: SATA kernel-buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

Linda Walsh writes:
> Robert Hancock wrote:
> > Linda Walsh wrote:
> >> Alan Cox wrote:
> >>>> rate began falling; at 128k block-reads-at-a-time or larger, it
> >>>> drops below
> >>>> 20MB/s (only on buffered SATA).
> >>> Try disabling NCQ - see if you've got a drive with the 'NCQ = no
> >>> readahead' flaw.
> > http://linux-ata.org/faq.html#ncq
> ---
> When drive initializes, dmesg says it has NCQ (depth 0/32)
> Reading the queue_depth under /sys, shows a queuedepth of "1".
>
> But more importantly -- I notice a chronic error message associate
> with this drive that may be causing some or all of the problem:
> ---
> Jan 2 20:06:10 Ishtar kernel: ata1.00: exception Emask 0x0 SAct 0x0
> SErr 0x0 action 0x2
> Jan 2 20:06:10 Ishtar kernel: ata1.00: port_status 0x20080000
> Jan 2 20:06:10 Ishtar kernel: ata1.00: cmd
> c8/00:10:30:06:03/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
> Jan 2 20:06:10 Ishtar kernel: res
> 50/00:00:3f:06:03/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> Jan 2 20:06:13 Ishtar kernel: ata1: limiting SATA link speed to 1.5 Gbps
> Jan 2 20:06:13 Ishtar kernel: ata1.00: exception Emask 0x0 SAct 0x0
> SErr 0x0 action 0x6
> Jan 2 20:06:13 Ishtar kernel: ata1.00: port_status 0x20080000
> Jan 2 20:06:13 Ishtar kernel: ata1.00: cmd
> c8/00:10:00:8b:04/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
> Jan 2 20:06:13 Ishtar kernel: res
> 50/00:00:0f:8b:04/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> Jan 2 20:06:14 Ishtar kernel: ata1: exception Emask 0x10 SAct 0x0 SErr
> 0x0 action 0x3
> Jan 2 20:06:14 Ishtar kernel: ata1: hotplug_status 0x80
> Jan 2 20:06:15 Ishtar kernel: ata1: exception Emask 0x10 SAct 0x0 SErr
> 0x0 action 0x3
> Jan 2 20:06:15 Ishtar kernel: ata1: hotplug_status 0x80
> ---
> What da heck?

Looks like the Promise ASIC SG bug. Apply
<http://user.it.uu.se/~mikpe/linux/patches/sata_promise/patch-sata_promise-1-asic-sg-bug-fix-v3-2.6.23>
and let us know if things improve.

/Mikael

2008-01-03 20:22:44

by Chuck Ebbert

[permalink] [raw]
Subject: Re: SATA buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

On 12/30/2007 12:06 AM, Linda Walsh wrote:
> I needed to get a new hard disk for one of my systems and thought that
> it was about time to start going with SATA.
>
> I picked up a Promise 4-Port Sata300-TX4 to go with a 750G
> Seagate SATA -- I'd had good luck with a Promise ATA100 (P)ATA
> and lower capacity Seagates and thought it would be a good combo.
>
> Unfortunately, the *buffered* read performance is *horrible*!
>
> I timed the new disk against a 400GB PATA and old 80MB/s SCSI-based
> 18.3G hard disk. While the raw speed numbers are faster as expected,
> the linux-buffered read numbers are not good.
>
>
> sda=18.3G on 80MB/s SCSI
> sdb=the new 750GB on a 3Gb SATA w/NCQ.
> hdf=400GB PATA on an ATA100 Promise card
>
> I used "dd" for my tests, reading 2GB on a quiescent machine
> that has 1GB of main memory. Output was to dev null. Input
> was from the device (not a partition or file), (/dev/sda, /dev/sdb
> and /dev/hdf). BS=1M, Count=2k. For the direct tests, I used
> the "iflag=direct" param. No RAID or "volumes" are involved.
>
> In each case, I took best run time out of 3 runs.
>
> Direct read speeds (and cpu usage):
> dev speed cpu/real %
> sda 60MB/s 0.51/35.84 1.44
> sdb 80MB/s 0.50/26.72 1.87
> hdf 69.4MB/s 0.51/30.92 1.68
>
>
> Buffered reads show the "bad news":
> dev speed cpu/real %
> sda 59.9MB/s 20.80/35.86 58.03
> sdb 18.7MB/s 16.07/114.73 14.01 <-SATA extra badness
> hdf 69.8MB/s 17.37/30.76 56.48
>
> I assume this isn't expected behavior.
>

Try the PATA driver for the parallel ATA drive to see if it
has the same behavior.

Reboot before each test (or use drop_caches.)
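i.e. something like:

    sync
    echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes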

hdparm should mostly work for reading drive settings but not for
writing them...

2008-01-04 02:38:11

by L A Walsh

Subject: Re:Believed resolved: SATA kern-buffRd read slow: based on promise driver bug

Mikael Pettersson wrote:
> Linda Walsh writes:
> > Robert Hancock wrote:
> > > Linda Walsh wrote:
> > >>>> read rate began falling; at 128k block-reads-at-a-time or larger, it
> > >>>> drops below 20MB/s (only on buffered SATA).
> >
> > But more importantly -- I notice a chronic error message associate
> > with this drive that may be causing some or all of the problem:
> > ---
> > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
> > ata1.00: port_status 0x20080000
> > ata1.00: cmd c8/00:10:30:06:03/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
> > res 50/00:00:3f:06:03/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> > ata1: limiting SATA link speed to 1.5 Gbps
>
>
> Looks like the Promise ASIC SG bug. Apply
> <http://user.it.uu.se/~mikpe/linux/patches/sata_promise/patch-sata_promise-1-asic-sg-bug-fix-v3-2.6.23>
> and let us know if things improve.
>
> /Mikael
>
---
Yep! Hope that's making it into a patch soon or, at least 2.6.24.
Kernel buffered

I seem to remember reading about some problems with Promise SATA & ACPI.
Does this address that, or is that a separate issue? (Am using no-acpi for
now, but would like to try ACPI again if it may be fixed. Last time I tried
it with this card, "sdb" went "offline": once it unmounted itself and
refused to be remounted (no error...just nothing), and another time it
stayed mounted but gave an I/O error, so I have been using no-acpi since.)
An ACPI error during bootup said:
ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread EFFC2000 could
not acquire Mutex [3] [20070126]

Is the above bug mentioned/discussed in the linux-ide archives? That,
and I'd like to find out why TCQ/NCQ doesn't work with the Seagate drives --
my guess, since they show a queue depth of 0/32, is that they are blacklisted
as drives that don't follow the normal protocol or implement their
own proprietary extensions? Sigh. Really a lame move (if that's the case)
for Seagate, considering the usage they could likely get in server
configs. Maybe they want to push their SCSI/SAS drives?

BTW, can SATA have DPO or FUA, or are those limited to SCSI?
Would it be a desirable future addition to remove the
"doesn't support DPO or FUA" error message on SATA drives if those are
specific to SCSI?




2008-01-04 02:50:20

by Robert Hancock

Subject: Re: Believed resolved: SATA kern-buffRd read slow: based on promise driver bug

Linda Walsh wrote:
> I seem to remember reading about some problems with Promise SATA & ACPI.
> Does this address that or is that a separate issue? (Am using no-acpi for
> now, but would like to try acpi again if it may be fixed (last time I tried
> it with this card, "sdb" went "offline" (once it unmounted itself and
> refused to be remounted (no error...just nothing), and another it stayed
> mounted, but gave an I/O Error...so have been using no-acpi since).
> An ACPI error in bootup said:
> ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread EFFC2000 could
> not acquire Mutex [3] [20070126]

Have you tried 2.6.24-rc6? If the problem still occurs there, you should
post the full bootup log.

>
> Is the above bug mentioned/discussed in the linux-ide archives? That
> and I'd like to find out why TCQ/NCQ doesn't work with the Seagate
> drives --
> my guess, since they say queuedepth of 0/32, is that they are blacklisted
> as being drives that don't follow normal protocol or implement their
> own proprietary extensions? Sigh. Really a lame move (if that's the case)
> for Seagate, considering they usage they could likely get in server
> configs. Maybe they want to push their SCSI/SAS drives?

Queue depth 0/32 means the drive supports a queue depth of 32 but the
controller/driver don't support NCQ.

> BTW, can SATA have DPO or FUA or are those limited to SCSI?
> Would it be a desirable future addition to remove the
> "doesn't support DPO or FUA" error message" on SATA drives if they are
> specific to SCSI?

ATA disks can have FUA support, but the support is disabled in libata by
default. (There's a fua parameter on the libata module to enable it, I believe.)
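If I recall correctly it's the 'fua' module parameter, so something like
the following, assuming the parameter exists in your kernel build:

    # built-in libata: add to the kernel command line
    #     libata.fua=1
    # modular libata: in /etc/modprobe.conf
    #     options libata fua=1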

2008-01-04 11:31:31

by Mikael Pettersson

Subject: Re: Re:Believed resolved: SATA kern-buffRd read slow: based on promise driver bug

Linda Walsh writes:
> Mikael Pettersson wrote:
> > Linda Walsh writes:
> > > Robert Hancock wrote:
> > > > Linda Walsh wrote:
> > > >>>> read rate began falling; at 128k block-reads-at-a-time or larger, it
> > > >>>> drops below 20MB/s (only on buffered SATA).
> > >
> > > But more importantly -- I notice a chronic error message associate
> > > with this drive that may be causing some or all of the problem:
> > > ---
> > > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
> > > ata1.00: port_status 0x20080000
> > > ata1.00: cmd c8/00:10:30:06:03/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
> > > res 50/00:00:3f:06:03/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> > > ata1: limiting SATA link speed to 1.5 Gbps
> >
> >
> > Looks like the Promise ASIC SG bug. Apply
> > <http://user.it.uu.se/~mikpe/linux/patches/sata_promise/patch-sata_promise-1-asic-sg-bug-fix-v3-2.6.23>
> > and let us know if things improve.
> >
> > /Mikael
> >
> ---
> Yep! Hope that's making it into a patch soon or, at least 2.6.24.
> Kernel buffered

Good to hear that it solved this problem.
The patch is in 2.6.24-rc2 and newer kernels, and will be sent
to -stable for the 2.6.23 and 2.6.22 series.

> I seem to remember reading about some problems with Promise SATA & ACPI.
> Does this address that or is that a separate issue? (Am using no-acpi for

sata_promise does nothing ACPI-related. It doesn't need to.
(Drives may be a different story.)

> Is the above bug mentioned/discussed in the linux-ide archives?

Yes.

> That
> and I'd like to find out why TCQ/NCQ doesn't work with the Seagate drives --

The driver doesn't yet support NCQ.

2008-01-06 20:21:24

by L A Walsh

Subject: Re: Believed resolved: SATA kern-buffRd read slow: based on promise driver bug

Mikael Pettersson wrote:
> Linda Walsh writes:
> > Mikael Pettersson wrote:
> > > Linda Walsh writes:
> > > > > Linda Walsh wrote:
> > > > >>>> read rate began falling; (.25 - .3);
> > > > more importantly; a chronic error message associated
> > > > with drive may be causing some or all of the problem(s):
> > > > ---
> > > > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
> > > > ata1.00: port_status 0x20080000
> > > > ata1.00: cmd c8/00:10:30:06:03/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
> > > > res 50/00:00:3f:06:03/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> > > > ata1: limiting SATA link speed to 1.5 Gbps
> > >
> > > Looks like the Promise ASIC SG bug. Apply
> > > <http://user.it.uu.se/~mikpe/linux/patches/sata_promise/patch-sata_promise-1-asic-sg-bug-fix-v3-2.6.23>
> > > and let us know if things improve.
> > > /Mikael
> > ---
> > Yep! Hope that's making it into a patch soon or, at least 2.6.24.
> > Kernel buffered
> Good to hear that it solved this problem.
> The patch is in 2.6.24-rc2 and newer kernels, and will be sent
> to -stable for the 2.6.23 and 2.6.22 series.
>
---
Will 'likely' wait till -stable since I use the machine as a 'server'
for just about any/everything that needs "serving" or "proxy" services.
> > That and I'd like to find out why TCQ/NCQ doesn't work with the Seagate drives --
>
> The driver doesn't yet support NCQ.
----
Is the 'main' diff between NCQ/TCQ that TCQ can re-arrange 'write'
priority under driver control, whereas NCQ is mostly a FIFO queue?

On a journaled file system, isn't "write-order" control required
for integrity? That would seem to imply TCQ could be used, but
NCQ doesn't seem to offer much benefit, since the higher-level
kernel drivers usually have a "larger picture" of the sectors that need
to be written. The only advantage I can see for NCQ drives might
be that the kernel may not know the drive's internal physical
structure nor where the disk is in its current revolution. That could
allow the drive to re-order writes based on the exact current state
of the drive, which the kernel might not have access to, but it seems
this would be a minor benefit -- and, depending on firmware,
possibly higher overhead in command processing?

Am trying to differentiate the NCQ/TCQ and SAS vs. SCSI benefits.
It seems both (SAS & SATA) support some type of port-multiplier/
multiplexer option to allow more disks per port.

However, (please correct?) SATA uses a hub type architecture while
SAS uses a switch architecture. My experience with network hubs vs.
switches is that network hubs can be much slower if there is
communication contention. Is the word 'hub' being used in the
"shared-communication media sense", or is someone using the term
'hub' as a [sic] replacement for a 'switch'?



2008-01-09 02:30:38

by Tejun Heo

Subject: Re: Believed resolved: SATA kern-buffRd read slow: based on promise driver bug

Linda Walsh wrote:
> Is 'main' diff between NCQ/TCQ that TCQ can re-arrange 'write'
> priority under driver control, whereas NCQ is mostly a FIFO queue?

No, NCQ can reorder, although I recently heard that Windows issues
overlapping NCQ commands and expects them to be processed in order (what
were they thinking?).

The biggest difference between TCQ and NCQ is that TCQ is for SCSI while
NCQ is for ATA. Functional differences include a larger number of available
tags and ordered tags for TCQ. The former doesn't matter for a single
disk. The latter may make some difference, but on a single disk not by much.

> Am trying to differentiate NCQ/TCQ and SAS v. SCSI benefits.
> It seems both support (SAS & SATA) some type of port-multiplier/
> multiplexor/ option to allow more disks/port.
>
> However, (please correct?) SATA uses a hub type architecture while
> SAS uses a switch architecture. My experience with network hubs vs.
> switches is that network hubs can be much slower if there is
> communication contention. Is the word 'hub' being used in the
> "shared-communication media sense", or is someone using the term
> 'hub' as a [sic] replacement for a 'switch'?

A port multiplier is a switch too. It doesn't broadcast anything and
definitely has forwarding buffers inside. An analogy which makes more
sense is expander : router and port multiplier : switch. Unless you
wanna nest them, they aren't that different.

--
tejun