2024-04-12 13:49:22

by Gustav Ekelund

[permalink] [raw]
Subject: [PATCH] ata: Add sdev attribute to lower link speed in runtime

Expose a new sysfs attribute to userspace that gives root the ability to
lower the link speed in a scsi_device at runtime. The handle enables
programs to, based on external circumstances that may be unbeknownst to
the kernel, determine if a link should slow down to perhaps achieve a
stabler signal. External circumstances could include the mission time
of the connected hardware or observations to temperature trends.

Writing 1 to /sys/block/*/device/down_link_spd signals the kernel to
first lower the link speed one step with sata_down_spd_limit and then
finish off with sata_link_hardreset.

Signed-off-by: Gustav Ekelund <[email protected]>
---
Documentation/ABI/testing/sysfs-block-device | 13 +++++++++
drivers/ata/libahci.c | 1 +
drivers/ata/libata-sata.c | 29 ++++++++++++++++++++
include/linux/libata.h | 1 +
4 files changed, 44 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block-device b/Documentation/ABI/testing/sysfs-block-device
index 2d543cfa4079..dba537fdf787 100644
--- a/Documentation/ABI/testing/sysfs-block-device
+++ b/Documentation/ABI/testing/sysfs-block-device
@@ -117,3 +117,16 @@ Description:
Writing "1" to this file enables the use of command duration
limits for read and write commands in the kernel and turns on
the feature on the device. Writing "0" disables the feature.
+
+
+What: /sys/block/*/device/down_link_spd
+Date: Mar, 2024
+KernelVersion: v6.10
+Contact: [email protected]
+Description:
+ (WO) Write 1 to the file to lower the SATA link speed one step
+ and then hard reset. Other values are rejected with -EINVAL.
+
+ Consider using libata.force for setting the link speed at boot.
+ Resort to down_link_spd in systems that deem it useful to lower
+ the link speed at runtime for particular links.
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 1a63200ea437..2622e965d98c 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -138,6 +138,7 @@ static struct attribute *ahci_sdev_attrs[] = {
&dev_attr_unload_heads.attr,
&dev_attr_ncq_prio_supported.attr,
&dev_attr_ncq_prio_enable.attr,
+ &dev_attr_down_link_spd.attr,
NULL
};

diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index 0fb1934875f2..5e1257b15f95 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -1538,3 +1538,32 @@ void ata_eh_analyze_ncq_error(struct ata_link *link)
ehc->i.err_mask &= ~AC_ERR_DEV;
}
EXPORT_SYMBOL_GPL(ata_eh_analyze_ncq_error);
+
+static ssize_t down_link_spd_store(struct device *device,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct scsi_device *sdev;
+ struct ata_port *ap;
+ long input;
+ int rc;
+
+ rc = kstrtol(buf, 10, &input);
+ if (rc)
+ return rc;
+ if ((input < 0) || (input > 1))
+ return -EINVAL;
+
+ sdev = to_scsi_device(device);
+ ap = ata_shost_to_port(sdev->host);
+
+ rc = sata_down_spd_limit(&ap->link, 0);
+ if (rc)
+ return rc;
+ rc = sata_link_hardreset(&ap->link, sata_deb_timing_normal,
+ ata_deadline(jiffies, ATA_TMOUT_INTERNAL_QUICK), NULL, NULL);
+
+ return rc ? rc : len;
+}
+DEVICE_ATTR_WO(down_link_spd);
+EXPORT_SYMBOL_GPL(dev_attr_down_link_spd);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 26d68115afb8..51d23a60355b 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -510,6 +510,7 @@ extern struct device_attribute dev_attr_ncq_prio_enable;
extern struct device_attribute dev_attr_em_message_type;
extern struct device_attribute dev_attr_em_message;
extern struct device_attribute dev_attr_sw_activity;
+extern struct device_attribute dev_attr_down_link_spd;
#endif

enum sw_activity {
--
2.39.2



2024-04-12 17:00:10

by Niklas Cassel

[permalink] [raw]
Subject: Re: [PATCH] ata: Add sdev attribute to lower link speed in runtime

Hello Gustav,

On Fri, Apr 12, 2024 at 03:48:38PM +0200, Gustav Ekelund wrote:
> Expose a new sysfs attribute to userspace that gives root the ability to
> lower the link speed in a scsi_device at runtime. The handle enables
> programs to, based on external circumstances that may be unbeknownst to
> the kernel, determine if a link should slow down to perhaps achieve a
> stabler signal. External circumstances could include the mission time
> of the connected hardware or observations to temperature trends.
>
> Writing 1 to /sys/block/*/device/down_link_spd signals the kernel to
> first lower the link speed one step with sata_down_spd_limit and then
> finish off with sata_link_hardreset.
>
> Signed-off-by: Gustav Ekelund <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-block-device | 13 +++++++++
> drivers/ata/libahci.c | 1 +
> drivers/ata/libata-sata.c | 29 ++++++++++++++++++++
> include/linux/libata.h | 1 +
> 4 files changed, 44 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-block-device b/Documentation/ABI/testing/sysfs-block-device
> index 2d543cfa4079..dba537fdf787 100644
> --- a/Documentation/ABI/testing/sysfs-block-device
> +++ b/Documentation/ABI/testing/sysfs-block-device
> @@ -117,3 +117,16 @@ Description:
> Writing "1" to this file enables the use of command duration
> limits for read and write commands in the kernel and turns on
> the feature on the device. Writing "0" disables the feature.
> +
> +
> +What: /sys/block/*/device/down_link_spd
> +Date: Mar, 2024
> +KernelVersion: v6.10
> +Contact: [email protected]
> +Description:
> + (WO) Write 1 to the file to lower the SATA link speed one step
> + and then hard reset. Other values are rejected with -EINVAL.
> +
> + Consider using libata.force for setting the link speed at boot.
> + Resort to down_link_spd in systems that deem it useful to lower
> + the link speed at runtime for particular links.
> diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
> index 1a63200ea437..2622e965d98c 100644
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -138,6 +138,7 @@ static struct attribute *ahci_sdev_attrs[] = {
> &dev_attr_unload_heads.attr,
> &dev_attr_ncq_prio_supported.attr,
> &dev_attr_ncq_prio_enable.attr,
> + &dev_attr_down_link_spd.attr,
> NULL
> };
>
> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> index 0fb1934875f2..5e1257b15f95 100644
> --- a/drivers/ata/libata-sata.c
> +++ b/drivers/ata/libata-sata.c
> @@ -1538,3 +1538,32 @@ void ata_eh_analyze_ncq_error(struct ata_link *link)
> ehc->i.err_mask &= ~AC_ERR_DEV;
> }
> EXPORT_SYMBOL_GPL(ata_eh_analyze_ncq_error);
> +
> +static ssize_t down_link_spd_store(struct device *device,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct scsi_device *sdev;

In general, please compare your code with ata_ncq_prio_enable_store().

Please assign sdev directly:
struct scsi_device *sdev = to_scsi_device(device);


> + struct ata_port *ap;
> + long input;
> + int rc;
> +
> + rc = kstrtol(buf, 10, &input);
> + if (rc)
> + return rc;
> + if ((input < 0) || (input > 1))
> + return -EINVAL;
> +
> + sdev = to_scsi_device(device);
> + ap = ata_shost_to_port(sdev->host);

Please also check ata_scsi_find_dev() like ata_ncq_prio_enable_store() does.


> +
> + rc = sata_down_spd_limit(&ap->link, 0);

Damien,
the kdoc for sata_down_spd_limit() is just
"LOCKING: Inherited from caller"
Do we perhaps need to hold the ap->lock here?


> + if (rc)
> + return rc;
> + rc = sata_link_hardreset(&ap->link, sata_deb_timing_normal,
> + ata_deadline(jiffies, ATA_TMOUT_INTERNAL_QUICK), NULL, NULL);

You shouldn't call sata_link_hardreset() directly.

Instead you should do:

spin_lock_irqsave(ap->lock, flags);

ehi->action |= ATA_EH_RESET;
ehi->flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;

ata_port_schedule_eh(ap);

spin_unlock_irqrestore(ap->lock, flags);

See:
https://github.com/torvalds/linux/blob/v6.9-rc3/drivers/ata/libata-core.c#L5873-L5883
and
https://github.com/torvalds/linux/blob/v6.9-rc3/drivers/ata/libata-core.c#L5265-L5267

> +
> + return rc ? rc : len;

Wouldn't it be more helpful if you could set a target speed instead?
How else are you supposed to be able to go back to the faster speed.

Perhaps you could look how libata.force forces a speed, and reuse
that struct member, or possibly add a new struct member if needed,
to store the forced target speed.


And possible a _show() to read the current target speed.
Writing "0" to target speed could possibly clear the forced speed.


> +}
> +DEVICE_ATTR_WO(down_link_spd);
> +EXPORT_SYMBOL_GPL(dev_attr_down_link_spd);
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 26d68115afb8..51d23a60355b 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -510,6 +510,7 @@ extern struct device_attribute dev_attr_ncq_prio_enable;
> extern struct device_attribute dev_attr_em_message_type;
> extern struct device_attribute dev_attr_em_message;
> extern struct device_attribute dev_attr_sw_activity;
> +extern struct device_attribute dev_attr_down_link_spd;
> #endif
>
> enum sw_activity {
> --
> 2.39.2
>

2024-04-13 00:30:32

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH] ata: Add sdev attribute to lower link speed in runtime

On 4/12/24 22:48, Gustav Ekelund wrote:
> Expose a new sysfs attribute to userspace that gives root the ability to
> lower the link speed in a scsi_device at runtime. The handle enables
> programs to, based on external circumstances that may be unbeknownst to
> the kernel, determine if a link should slow down to perhaps achieve a
> stabler signal. External circumstances could include the mission time
> of the connected hardware or observations to temperature trends.

may, perhaps, could... This does not sound very deterministic. Do you have an
actual practical use case where this patch is useful and solve a real problem ?

Strictly speaking, if you are seeing link stability issues due to temperature or
other environmental factors (humidity, altitude), then either you are operating
your hardware (board and/or HDD) outside of their environmental specifications,
or you have some serious hardware issues (which can be a simple as a bad SATA
cable or an inappropriate power supply). In both cases, I do not think that this
patch will be of any help.

Furthermore, libata already lowers a link speed automatically at runtime if it
sees too many NCQ errors. Isn't that enough ? And we also have the horkage flags
to force a maximum link speed for a device/adapter, which can also be specified
as a libata module argument (libata.force).

> Writing 1 to /sys/block/*/device/down_link_spd signals the kernel to
> first lower the link speed one step with sata_down_spd_limit and then
> finish off with sata_link_hardreset.

We already have "/sys/class/ata_link/*/hw_sata_spd_limit", which is read-only
for now. So if you can really justify this manual link speed tuning for an
actual use case (not a hypothetical one), then the way to go would be to make
that attribute RW and implement its store() method to lower the link speed at
runtime.

And by the way, looking at what that attribute says, I always get:
<unknown>

So it looks like there is an issue with it that went unnoticed (because no one
is using it...). This needs some fixing.

--
Damien Le Moal
Western Digital Research


2024-04-15 14:50:06

by Gustav Ekelund

[permalink] [raw]
Subject: Re: [PATCH] ata: Add sdev attribute to lower link speed in runtime

On 4/13/24 02:29, Damien Le Moal wrote:
> On 4/12/24 22:48, Gustav Ekelund wrote:
>> Expose a new sysfs attribute to userspace that gives root the ability to
>> lower the link speed in a scsi_device at runtime. The handle enables
>> programs to, based on external circumstances that may be unbeknownst to
>> the kernel, determine if a link should slow down to perhaps achieve a
>> stabler signal. External circumstances could include the mission time
>> of the connected hardware or observations to temperature trends.
>
> may, perhaps, could... This does not sound very deterministic. Do you have an
> actual practical use case where this patch is useful and solve a real problem ?
>
> Strictly speaking, if you are seeing link stability issues due to temperature or
> other environmental factors (humidity, altitude), then either you are operating
> your hardware (board and/or HDD) outside of their environmental specifications,
> or you have some serious hardware issues (which can be a simple as a bad SATA
> cable or an inappropriate power supply). In both cases, I do not think that this
> patch will be of any help.
>
> Furthermore, libata already lowers a link speed automatically at runtime if it
> sees too many NCQ errors. Isn't that enough ? And we also have the horkage flags
> to force a maximum link speed for a device/adapter, which can also be specified
> as a libata module argument (libata.force).
>
>> Writing 1 to /sys/block/*/device/down_link_spd signals the kernel to
>> first lower the link speed one step with sata_down_spd_limit and then
>> finish off with sata_link_hardreset.
>
> We already have "/sys/class/ata_link/*/hw_sata_spd_limit", which is read-only
> for now. So if you can really justify this manual link speed tuning for an
> actual use case (not a hypothetical one), then the way to go would be to make
> that attribute RW and implement its store() method to lower the link speed at
> runtime.
>
> And by the way, looking at what that attribute says, I always get:
> <unknown>
>
> So it looks like there is an issue with it that went unnoticed (because no one
> is using it...). This needs some fixing.
>
Hello Damien and Niklas,

Thank you for the feedback.

I have a hotplug system, where the links behave differently depending
on the disk model connected. For some models the kernel emits a lot of
bus errors, but mostly not enough errors for it to automatically lower
the link speed, except during high workloads. I have not observed any
data-loss regarding the errors, but the excessive logging becomes a problem.

So I want to adapt the link, depending on the connected model, in a
running system because I know that some particular models in this case
will operate better in SATA2 in this system.

Can I use the libata.force module to make changes to a particular link
in runtime?

Best regards
Gustav

2024-04-16 22:59:40

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH] ata: Add sdev attribute to lower link speed in runtime

On 2024/04/16 0:49, Gustav Ekelund wrote:
> On 4/13/24 02:29, Damien Le Moal wrote:
>> On 4/12/24 22:48, Gustav Ekelund wrote:
>>> Expose a new sysfs attribute to userspace that gives root the ability to
>>> lower the link speed in a scsi_device at runtime. The handle enables
>>> programs to, based on external circumstances that may be unbeknownst to
>>> the kernel, determine if a link should slow down to perhaps achieve a
>>> stabler signal. External circumstances could include the mission time
>>> of the connected hardware or observations to temperature trends.
>>
>> may, perhaps, could... This does not sound very deterministic. Do you have an
>> actual practical use case where this patch is useful and solve a real problem ?
>>
>> Strictly speaking, if you are seeing link stability issues due to temperature or
>> other environmental factors (humidity, altitude), then either you are operating
>> your hardware (board and/or HDD) outside of their environmental specifications,
>> or you have some serious hardware issues (which can be a simple as a bad SATA
>> cable or an inappropriate power supply). In both cases, I do not think that this
>> patch will be of any help.
>>
>> Furthermore, libata already lowers a link speed automatically at runtime if it
>> sees too many NCQ errors. Isn't that enough ? And we also have the horkage flags
>> to force a maximum link speed for a device/adapter, which can also be specified
>> as a libata module argument (libata.force).
>>
>>> Writing 1 to /sys/block/*/device/down_link_spd signals the kernel to
>>> first lower the link speed one step with sata_down_spd_limit and then
>>> finish off with sata_link_hardreset.
>>
>> We already have "/sys/class/ata_link/*/hw_sata_spd_limit", which is read-only
>> for now. So if you can really justify this manual link speed tuning for an
>> actual use case (not a hypothetical one), then the way to go would be to make
>> that attribute RW and implement its store() method to lower the link speed at
>> runtime.
>>
>> And by the way, looking at what that attribute says, I always get:
>> <unknown>
>>
>> So it looks like there is an issue with it that went unnoticed (because no one
>> is using it...). This needs some fixing.
>>
> Hello Damien and Niklas,
>
> Thank you for the feedback.
>
> I have a hotplug system, where the links behave differently depending
> on the disk model connected. For some models the kernel emits a lot of
> bus errors, but mostly not enough errors for it to automatically lower
> the link speed, except during high workloads. I have not observed any
> data-loss regarding the errors, but the excessive logging becomes a problem.

Hot-plugging should not be an issue in itself. When hot-plugged, the port
scanning process should detect the maximum link speed supported by your device
and use that speed for probing the device itself (IDENTIFY etc). If you see bus
errors, then you are either having hardware issues (e.g. a bad cable or power
supply) or some issues with your AHCI controller that may need patching.

Can you send examples of the errors you are seeing ? That needs to be
investigated first before going the (drastic) route of allowing to manually
lower link speed at run-time.

>
> So I want to adapt the link, depending on the connected model, in a
> running system because I know that some particular models in this case
> will operate better in SATA2 in this system.
>
> Can I use the libata.force module to make changes to a particular link
> in runtime?

Nope, libata.force is a module parameter so you can specify it as a kernel boot
parameter, or if you compile libata as a module when loading (modprobe) libata.
At run time, you need to rmmod+modprobe again libata, and so the ahci driver as
well (because of dependencies).

As I mentioned, if a run-time knob really is necessary (it should not be), using
the ata_link hw_sata_spd_limit would be a better approach. But again, that
really should not be necessary at all.

>
> Best regards
> Gustav
>

--
Damien Le Moal
Western Digital Research


2024-04-17 09:21:02

by Gustav Ekelund

[permalink] [raw]
Subject: Re: [PATCH] ata: Add sdev attribute to lower link speed in runtime

On 4/17/24 00:59, Damien Le Moal wrote:
> On 2024/04/16 0:49, Gustav Ekelund wrote:
>> On 4/13/24 02:29, Damien Le Moal wrote:
>>> On 4/12/24 22:48, Gustav Ekelund wrote:
>>>> Expose a new sysfs attribute to userspace that gives root the ability to
>>>> lower the link speed in a scsi_device at runtime. The handle enables
>>>> programs to, based on external circumstances that may be unbeknownst to
>>>> the kernel, determine if a link should slow down to perhaps achieve a
>>>> stabler signal. External circumstances could include the mission time
>>>> of the connected hardware or observations to temperature trends.
>>>
>>> may, perhaps, could... This does not sound very deterministic. Do you have an
>>> actual practical use case where this patch is useful and solve a real problem ?
>>>
>>> Strictly speaking, if you are seeing link stability issues due to temperature or
>>> other environmental factors (humidity, altitude), then either you are operating
>>> your hardware (board and/or HDD) outside of their environmental specifications,
>>> or you have some serious hardware issues (which can be a simple as a bad SATA
>>> cable or an inappropriate power supply). In both cases, I do not think that this
>>> patch will be of any help.
>>>
>>> Furthermore, libata already lowers a link speed automatically at runtime if it
>>> sees too many NCQ errors. Isn't that enough ? And we also have the horkage flags
>>> to force a maximum link speed for a device/adapter, which can also be specified
>>> as a libata module argument (libata.force).
>>>
>>>> Writing 1 to /sys/block/*/device/down_link_spd signals the kernel to
>>>> first lower the link speed one step with sata_down_spd_limit and then
>>>> finish off with sata_link_hardreset.
>>>
>>> We already have "/sys/class/ata_link/*/hw_sata_spd_limit", which is read-only
>>> for now. So if you can really justify this manual link speed tuning for an
>>> actual use case (not a hypothetical one), then the way to go would be to make
>>> that attribute RW and implement its store() method to lower the link speed at
>>> runtime.
>>>
>>> And by the way, looking at what that attribute says, I always get:
>>> <unknown>
>>>
>>> So it looks like there is an issue with it that went unnoticed (because no one
>>> is using it...). This needs some fixing.
>>>
>> Hello Damien and Niklas,
>>
>> Thank you for the feedback.
>>
>> I have a hotplug system, where the links behave differently depending
>> on the disk model connected. For some models the kernel emits a lot of
>> bus errors, but mostly not enough errors for it to automatically lower
>> the link speed, except during high workloads. I have not observed any
>> data-loss regarding the errors, but the excessive logging becomes a problem.
>
> Hot-plugging should not be an issue in itself. When hot-plugged, the port
> scanning process should detect the maximum link speed supported by your device
> and use that speed for probing the device itself (IDENTIFY etc). If you see bus
> errors, then you are either having hardware issues (e.g. a bad cable or power
> supply) or some issues with your AHCI controller that may need patching.
>
> Can you send examples of the errors you are seeing ? That needs to be
> investigated first before going the (drastic) route of allowing to manually
> lower link speed at run-time.
>
>>
>> So I want to adapt the link, depending on the connected model, in a
>> running system because I know that some particular models in this case
>> will operate better in SATA2 in this system.
>>
>> Can I use the libata.force module to make changes to a particular link
>> in runtime?
>
> Nope, libata.force is a module parameter so you can specify it as a kernel boot
> parameter, or if you compile libata as a module when loading (modprobe) libata.
> At run time, you need to rmmod+modprobe again libata, and so the ahci driver as
> well (because of dependencies).
>
> As I mentioned, if a run-time knob really is necessary (it should not be), using
> the ata_link hw_sata_spd_limit would be a better approach. But again, that
> really should not be necessary at all.
>
>>
>> Best regards
>> Gustav
>>
>
Hi Damien,

Unfortunately doing this selectively per link from user space seems to
be my only way forward for this particular hardware issue. I understand
if you deem this to be too specific to fit into the upstream kernel.

I will investigate if running libata as a module might be a way around
this peculiar problem. If I need to refine my first approach, I will
attempt to modify the hw_sata_spd_limit to be rw.

Thank you Damien and Niklas.

Best regards
Gustav

2024-04-17 09:56:34

by Niklas Cassel

[permalink] [raw]
Subject: Re: [PATCH] ata: Add sdev attribute to lower link speed in runtime

On Wed, Apr 17, 2024 at 08:59:27AM +1000, Damien Le Moal wrote:
>
> Can you send examples of the errors you are seeing ? That needs to be
> investigated first before going the (drastic) route of allowing to manually
> lower link speed at run-time.

Gustav, is it possible for you to share the error messages that you are
seeing? Preferably a whole kernel boot.

Since you are talking hot plug, there is a bunch of libata hot-plug related
in v6.9.x (which turns off LPM if your external port is hotplug capable).

So it would be interesting to see if you still get these errors on v6.9-rc4
(we will see if you have LPM enabled), and if so, what errors you are seeing.

You could also try booting with: libata.force=nolpm on the kernel command line.
(This will explicitly set lpm-policy to MAX_POWER, which is different from
lpm-policy=0 (which is the default) - which means keep firmware settings.)


Kind regards,
Niklas

>
> >
> > So I want to adapt the link, depending on the connected model, in a
> > running system because I know that some particular models in this case
> > will operate better in SATA2 in this system.
> >
> > Can I use the libata.force module to make changes to a particular link
> > in runtime?
>
> Nope, libata.force is a module parameter so you can specify it as a kernel boot
> parameter, or if you compile libata as a module when loading (modprobe) libata.
> At run time, you need to rmmod+modprobe again libata, and so the ahci driver as
> well (because of dependencies).
>
> As I mentioned, if a run-time knob really is necessary (it should not be), using
> the ata_link hw_sata_spd_limit would be a better approach. But again, that
> really should not be necessary at all.
>
> >
> > Best regards
> > Gustav
> >
>
> --
> Damien Le Moal
> Western Digital Research
>

2024-04-17 10:14:42

by Niklas Cassel

[permalink] [raw]
Subject: Re: [PATCH] ata: Add sdev attribute to lower link speed in runtime

On Mon, Apr 15, 2024 at 04:49:46PM +0200, Gustav Ekelund wrote:
> On 4/13/24 02:29, Damien Le Moal wrote:
> > On 4/12/24 22:48, Gustav Ekelund wrote:
> >> Expose a new sysfs attribute to userspace that gives root the ability to
> >> lower the link speed in a scsi_device at runtime. The handle enables
> >> programs to, based on external circumstances that may be unbeknownst to
> >> the kernel, determine if a link should slow down to perhaps achieve a
> >> stabler signal. External circumstances could include the mission time
> >> of the connected hardware or observations to temperature trends.
> >
> > may, perhaps, could... This does not sound very deterministic. Do you have an
> > actual practical use case where this patch is useful and solve a real problem ?
> >
> > Strictly speaking, if you are seeing link stability issues due to temperature or
> > other environmental factors (humidity, altitude), then either you are operating
> > your hardware (board and/or HDD) outside of their environmental specifications,
> > or you have some serious hardware issues (which can be a simple as a bad SATA
> > cable or an inappropriate power supply). In both cases, I do not think that this
> > patch will be of any help.
> >
> > Furthermore, libata already lowers a link speed automatically at runtime if it
> > sees too many NCQ errors. Isn't that enough ? And we also have the horkage flags
> > to force a maximum link speed for a device/adapter, which can also be specified
> > as a libata module argument (libata.force).
> >
> >> Writing 1 to /sys/block/*/device/down_link_spd signals the kernel to
> >> first lower the link speed one step with sata_down_spd_limit and then
> >> finish off with sata_link_hardreset.
> >
> > We already have "/sys/class/ata_link/*/hw_sata_spd_limit", which is read-only
> > for now. So if you can really justify this manual link speed tuning for an
> > actual use case (not a hypothetical one), then the way to go would be to make
> > that attribute RW and implement its store() method to lower the link speed at
> > runtime.
> >
> > And by the way, looking at what that attribute says, I always get:
> > <unknown>
> >
> > So it looks like there is an issue with it that went unnoticed (because no one
> > is using it...). This needs some fixing.
> >
> Hello Damien and Niklas,
>
> Thank you for the feedback.
>
> I have a hotplug system, where the links behave differently depending
> on the disk model connected. For some models the kernel emits a lot of
> bus errors, but mostly not enough errors for it to automatically lower
> the link speed, except during high workloads. I have not observed any
> data-loss regarding the errors, but the excessive logging becomes a problem.

It might be interesting to compare the output of:
$ hdparm -I

for a drive that you can hot plug insert without errors, against a drive
that gives you errors on hot plug insertion, to see if this can give you
a hint of why they behave differently.

(e.g. certain features, e.g. DevSleep, is only enabled if there is support
in the HBA, the port, and the drive.)


Kind regards,
Niklas

2024-04-18 12:49:51

by Gustav Ekelund

[permalink] [raw]
Subject: Re: [PATCH] ata: Add sdev attribute to lower link speed in runtime

On 4/17/24 12:14, Niklas Cassel wrote:
> On Mon, Apr 15, 2024 at 04:49:46PM +0200, Gustav Ekelund wrote:
>> On 4/13/24 02:29, Damien Le Moal wrote:
>>> On 4/12/24 22:48, Gustav Ekelund wrote:
>>>> Expose a new sysfs attribute to userspace that gives root the ability to
>>>> lower the link speed in a scsi_device at runtime. The handle enables
>>>> programs to, based on external circumstances that may be unbeknownst to
>>>> the kernel, determine if a link should slow down to perhaps achieve a
>>>> stabler signal. External circumstances could include the mission time
>>>> of the connected hardware or observations to temperature trends.
>>>
>>> may, perhaps, could... This does not sound very deterministic. Do you have an
>>> actual practical use case where this patch is useful and solve a real problem ?
>>>
>>> Strictly speaking, if you are seeing link stability issues due to temperature or
>>> other environmental factors (humidity, altitude), then either you are operating
>>> your hardware (board and/or HDD) outside of their environmental specifications,
>>> or you have some serious hardware issues (which can be a simple as a bad SATA
>>> cable or an inappropriate power supply). In both cases, I do not think that this
>>> patch will be of any help.
>>>
>>> Furthermore, libata already lowers a link speed automatically at runtime if it
>>> sees too many NCQ errors. Isn't that enough ? And we also have the horkage flags
>>> to force a maximum link speed for a device/adapter, which can also be specified
>>> as a libata module argument (libata.force).
>>>
>>>> Writing 1 to /sys/block/*/device/down_link_spd signals the kernel to
>>>> first lower the link speed one step with sata_down_spd_limit and then
>>>> finish off with sata_link_hardreset.
>>>
>>> We already have "/sys/class/ata_link/*/hw_sata_spd_limit", which is read-only
>>> for now. So if you can really justify this manual link speed tuning for an
>>> actual use case (not a hypothetical one), then the way to go would be to make
>>> that attribute RW and implement its store() method to lower the link speed at
>>> runtime.
>>>
>>> And by the way, looking at what that attribute says, I always get:
>>> <unknown>
>>>
>>> So it looks like there is an issue with it that went unnoticed (because no one
>>> is using it...). This needs some fixing.
>>>
>> Hello Damien and Niklas,
>>
>> Thank you for the feedback.
>>
>> I have a hotplug system, where the links behave differently depending
>> on the disk model connected. For some models the kernel emits a lot of
>> bus errors, but mostly not enough errors for it to automatically lower
>> the link speed, except during high workloads. I have not observed any
>> data-loss regarding the errors, but the excessive logging becomes a problem.
>
> It might be interesting to compare the output of:
> $ hdparm -I
>
> for a drive that you can hot plug insert without errors, against a drive
> that gives you errors on hot plug insertion, to see if this can give you
> a hint of why they behave differently.
>
> (e.g. certain features, e.g. DevSleep, is only enabled if there is support
> in the HBA, the port, and the drive.)
>
>
> Kind regards,
> Niklas
Hi Niklas,

I mostly tested on the 6.1 kernel, and it is a quite peculiar hardware
problem, so it isn't caused by anythin in the latest 6.9 rc.

Thank you for the advice of using hdparm, maybe I can diff preset of
features between the models. I will share any interesting findings I
come across.

Best regards
Gustav