2021-09-01 05:11:47

by Kate Hsuan

[permalink] [raw]
Subject: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.

Many users are reporting that the Samsung 860 and 870 SSD are having
various issues when combined with AMD SATA controllers and only
completely disabling NCQ helps to avoid these issues.

Always disabling NCQ for Samsung 860/870 SSDs regardless of the host
SATA adapter vendor will cause I/O performance degradation with well
behaved adapters. To limit the performance impact to AMD adapters,
introduce the ATA_HORKAGE_NO_NCQ_ON_AMD flag to force disable NCQ
only for these adapters.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201693
Signed-off-by: Kate Hsuan <[email protected]>
---
Changes in v4:
* A function ata_dev_check_adapter() is added to check the vendor ID of
the adapter.
* ATA_HORKAGE_NONCQ_ON_AMD was modified to ATA_HORKAGE_NO_NCQ_ON_AMD to
align with the naming convention.
* Commit messages were improved according to reviewer comments.

Changes in v3:
* ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL was modified to
ATA_HORKAGE_NONCQ_ON_AMD.
* Codes were fixed to completely disable NCQ on AMD controller.

---
drivers/ata/libata-core.c | 32 ++++++++++++++++++++++++++++++--
include/linux/libata.h | 1 +
2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index c861c93d1e84..49049cd713e4 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2186,6 +2186,25 @@ static void ata_dev_config_ncq_prio(struct ata_device *dev)
dev->flags &= ~ATA_DFLAG_NCQ_PRIO;
}

+static bool ata_dev_check_adapter(struct ata_device *dev,
+ unsigned short vendor_id)
+{
+ struct pci_dev *pcidev = NULL;
+ struct device *parent_dev = NULL;
+
+ for (parent_dev = dev->tdev.parent; parent_dev != NULL;
+ parent_dev = parent_dev->parent) {
+ if (dev_is_pci(parent_dev)) {
+ pcidev = to_pci_dev(parent_dev);
+ if (pcidev->vendor == vendor_id)
+ return true;
+ break;
+ }
+ }
+
+ return false;
+}
+
static int ata_dev_config_ncq(struct ata_device *dev,
char *desc, size_t desc_sz)
{
@@ -2204,6 +2223,13 @@ static int ata_dev_config_ncq(struct ata_device *dev,
snprintf(desc, desc_sz, "NCQ (not used)");
return 0;
}
+
+ if (dev->horkage & ATA_HORKAGE_NO_NCQ_ON_AMD &&
+ ata_dev_check_adapter(dev, PCI_VENDOR_ID_AMD)) {
+ snprintf(desc, desc_sz, "NCQ (not used)");
+ return 0;
+ }
+
if (ap->flags & ATA_FLAG_NCQ) {
hdepth = min(ap->scsi_host->can_queue, ATA_MAX_QUEUE);
dev->flags |= ATA_DFLAG_NCQ;
@@ -3971,9 +3997,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
{ "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
ATA_HORKAGE_ZERO_AFTER_TRIM, },
{ "Samsung SSD 860*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
- ATA_HORKAGE_ZERO_AFTER_TRIM, },
+ ATA_HORKAGE_ZERO_AFTER_TRIM |
+ ATA_HORKAGE_NO_NCQ_ON_AMD, },
{ "Samsung SSD 870*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
- ATA_HORKAGE_ZERO_AFTER_TRIM, },
+ ATA_HORKAGE_ZERO_AFTER_TRIM |
+ ATA_HORKAGE_NO_NCQ_ON_AMD, },
{ "FCCT*M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
ATA_HORKAGE_ZERO_AFTER_TRIM, },

diff --git a/include/linux/libata.h b/include/linux/libata.h
index 860e63f5667b..cdc248a15763 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -426,6 +426,7 @@ enum {
ATA_HORKAGE_NOTRIM = (1 << 24), /* don't use TRIM */
ATA_HORKAGE_MAX_SEC_1024 = (1 << 25), /* Limit max sects to 1024 */
ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */
+ ATA_HORKAGE_NO_NCQ_ON_AMD = (1 << 27), /* Disable NCQ on AMD chipset */

/* DMA mask for user DMA control: User visible values; DO NOT
renumber */
--
2.31.1


2021-09-01 06:24:58

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.

On 2021/09/01 13:52, Kate Hsuan wrote:
> Many users are reporting that the Samsung 860 and 870 SSD are having
> various issues when combined with AMD SATA controllers and only
> completely disabling NCQ helps to avoid these issues.
>
> Always disabling NCQ for Samsung 860/870 SSDs regardless of the host
> SATA adapter vendor will cause I/O performance degradation with well
> behaved adapters. To limit the performance impact to AMD adapters,
> introduce the ATA_HORKAGE_NO_NCQ_ON_AMD flag to force disable NCQ
> only for these adapters.
>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201693
> Signed-off-by: Kate Hsuan <[email protected]>
> ---
> Changes in v4:
> * A function ata_dev_check_adapter() is added to check the vendor ID of
> the adapter.
> * ATA_HORKAGE_NONCQ_ON_AMD was modified to ATA_HORKAGE_NO_NCQ_ON_AMD to
> align with the naming??convention.
> * Commit messages were improved according??to reviewer comments.
>
> Changes in v3:
> * ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL was modified to
> ATA_HORKAGE_NONCQ_ON_AMD.
> * Codes were fixed to completely disable NCQ on AMD controller.
>
> ---
> drivers/ata/libata-core.c | 32 ++++++++++++++++++++++++++++++--
> include/linux/libata.h | 1 +
> 2 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index c861c93d1e84..49049cd713e4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -2186,6 +2186,25 @@ static void ata_dev_config_ncq_prio(struct ata_device *dev)
> dev->flags &= ~ATA_DFLAG_NCQ_PRIO;
> }
>
> +static bool ata_dev_check_adapter(struct ata_device *dev,
> + unsigned short vendor_id)
> +{
> + struct pci_dev *pcidev = NULL;
> + struct device *parent_dev = NULL;
> +
> + for (parent_dev = dev->tdev.parent; parent_dev != NULL;
> + parent_dev = parent_dev->parent) {
> + if (dev_is_pci(parent_dev)) {
> + pcidev = to_pci_dev(parent_dev);
> + if (pcidev->vendor == vendor_id)
> + return true;
> + break;
> + }
> + }
> +
> + return false;
> +}
> +
> static int ata_dev_config_ncq(struct ata_device *dev,
> char *desc, size_t desc_sz)
> {
> @@ -2204,6 +2223,13 @@ static int ata_dev_config_ncq(struct ata_device *dev,
> snprintf(desc, desc_sz, "NCQ (not used)");
> return 0;
> }
> +
> + if (dev->horkage & ATA_HORKAGE_NO_NCQ_ON_AMD &&
> + ata_dev_check_adapter(dev, PCI_VENDOR_ID_AMD)) {
> + snprintf(desc, desc_sz, "NCQ (not used)");
> + return 0;
> + }
> +
> if (ap->flags & ATA_FLAG_NCQ) {
> hdepth = min(ap->scsi_host->can_queue, ATA_MAX_QUEUE);
> dev->flags |= ATA_DFLAG_NCQ;
> @@ -3971,9 +3997,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
> { "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> ATA_HORKAGE_ZERO_AFTER_TRIM, },
> { "Samsung SSD 860*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> - ATA_HORKAGE_ZERO_AFTER_TRIM, },
> + ATA_HORKAGE_ZERO_AFTER_TRIM |
> + ATA_HORKAGE_NO_NCQ_ON_AMD, },
> { "Samsung SSD 870*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> - ATA_HORKAGE_ZERO_AFTER_TRIM, },
> + ATA_HORKAGE_ZERO_AFTER_TRIM |
> + ATA_HORKAGE_NO_NCQ_ON_AMD, },
> { "FCCT*M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> ATA_HORKAGE_ZERO_AFTER_TRIM, },
>
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 860e63f5667b..cdc248a15763 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -426,6 +426,7 @@ enum {
> ATA_HORKAGE_NOTRIM = (1 << 24), /* don't use TRIM */
> ATA_HORKAGE_MAX_SEC_1024 = (1 << 25), /* Limit max sects to 1024 */
> ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */
> + ATA_HORKAGE_NO_NCQ_ON_AMD = (1 << 27), /* Disable NCQ on AMD chipset */
>
> /* DMA mask for user DMA control: User visible values; DO NOT
> renumber */
> --
> 2.31.1
>
>

Looks good to me.

Reviewed-by: Damien Le Moal <[email protected]>


--
Damien Le Moal
Western Digital Research

2021-09-01 07:40:49

by Tor Vic

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.

(Sorry for not doing a proper reply)

Hello,
Noob here.
I have a Samsung 860 Pro connected to a AMD X570 chipset mainboard and
it just works flawlessly on 5.13 and 5.14.
Are you sure that *every* 860/870 is concerned by this problem on
*every* AMD controller? Isn't this too restrictive?
Or am I simply missing something?

Thanks,
Tor

2021-09-01 08:57:55

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.

Hi Tor,

On 9/1/21 9:37 AM, [email protected] wrote:
> (Sorry for not doing a proper reply)
>
> Hello,
> Noob here.
> I have a Samsung 860 Pro connected to a AMD X570 chipset mainboard and
> it just works flawlessly on 5.13 and 5.14.
> Are you sure that *every* 860/870 is concerned by this problem on
> *every* AMD controller?

I am pretty sure that every 860 / 870 EVO is affected,
I am not sure if the PRO is also affected.

As for *every* AMD controller, chances are that more recent
AMD controllers are fine.

We have been trying to resolve various issues with this combo
for a long time now, see:

https://bugzilla.kernel.org/show_bug.cgi?id=201693
https://bugzilla.kernel.org/show_bug.cgi?id=203475

> Isn't this too restrictive?
> Or am I simply missing something?

The problem is that when users are hit by this they end up with
a non functional system and even fs / data corruption. Where
as OTOH disabling NCQ leads to a (significant) performance
degradation but affected systems will still work fine.

So I believe that it is best to err on the safe side here
and accept the performance degradation as a trade-of for
fixing the fs / data corruption.

With that said, I do believe that we should allow re-enabling
ncq on this combo through libata.force on the kernel cmdline
by adding this extra bit to the patch:

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6136,6 +6136,8 @@ static int __init ata_parse_force_one(char **cur,
{ "ncq", .horkage_off = ATA_HORKAGE_NONCQ },
{ "noncqtrim", .horkage_on = ATA_HORKAGE_NO_NCQ_TRIM },
{ "ncqtrim", .horkage_off = ATA_HORKAGE_NO_NCQ_TRIM },
+ { "noncqamd", .horkage_on = ATA_HORKAGE_NO_NCQ_ON_AMD },
+ { "ncqamd", .horkage_off = ATA_HORKAGE_NO_NCQ_ON_AMD },
{ "dump_id", .horkage_on = ATA_HORKAGE_DUMP_ID },
{ "pio0", .xfer_mask = 1 << (ATA_SHIFT_PIO + 0) },
{ "pio1", .xfer_mask = 1 << (ATA_SHIFT_PIO + 1) },

And I will also add a comment to both linked bugs to see if we can maybe
exclude the pro models from this quirk and if we can maybe narrow it down
to a subset of the AMD SATA controllers.

But that narrowing down is probably best done as a follow up fix, while just
going with this "err on the safe side" approach for now.

Regards,

Hans

2021-09-01 09:05:19

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.

Hi Kate,

On 9/1/21 6:52 AM, Kate Hsuan wrote:
> Many users are reporting that the Samsung 860 and 870 SSD are having
> various issues when combined with AMD SATA controllers and only
> completely disabling NCQ helps to avoid these issues.
>
> Always disabling NCQ for Samsung 860/870 SSDs regardless of the host
> SATA adapter vendor will cause I/O performance degradation with well
> behaved adapters. To limit the performance impact to AMD adapters,
> introduce the ATA_HORKAGE_NO_NCQ_ON_AMD flag to force disable NCQ
> only for these adapters.
>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201693
> Signed-off-by: Kate Hsuan <[email protected]>
> ---
> Changes in v4:
> * A function ata_dev_check_adapter() is added to check the vendor ID of
> the adapter.
> * ATA_HORKAGE_NONCQ_ON_AMD was modified to ATA_HORKAGE_NO_NCQ_ON_AMD to
> align with the naming convention.
> * Commit messages were improved according to reviewer comments.
>
> Changes in v3:
> * ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL was modified to
> ATA_HORKAGE_NONCQ_ON_AMD.
> * Codes were fixed to completely disable NCQ on AMD controller.
>
> ---
> drivers/ata/libata-core.c | 32 ++++++++++++++++++++++++++++++--
> include/linux/libata.h | 1 +
> 2 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index c861c93d1e84..49049cd713e4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -2186,6 +2186,25 @@ static void ata_dev_config_ncq_prio(struct ata_device *dev)
> dev->flags &= ~ATA_DFLAG_NCQ_PRIO;
> }
>
> +static bool ata_dev_check_adapter(struct ata_device *dev,
> + unsigned short vendor_id)
> +{
> + struct pci_dev *pcidev = NULL;
> + struct device *parent_dev = NULL;
> +
> + for (parent_dev = dev->tdev.parent; parent_dev != NULL;
> + parent_dev = parent_dev->parent) {
> + if (dev_is_pci(parent_dev)) {
> + pcidev = to_pci_dev(parent_dev);
> + if (pcidev->vendor == vendor_id)
> + return true;
> + break;
> + }
> + }
> +
> + return false;
> +}
> +
> static int ata_dev_config_ncq(struct ata_device *dev,
> char *desc, size_t desc_sz)
> {
> @@ -2204,6 +2223,13 @@ static int ata_dev_config_ncq(struct ata_device *dev,
> snprintf(desc, desc_sz, "NCQ (not used)");
> return 0;
> }
> +
> + if (dev->horkage & ATA_HORKAGE_NO_NCQ_ON_AMD &&
> + ata_dev_check_adapter(dev, PCI_VENDOR_ID_AMD)) {
> + snprintf(desc, desc_sz, "NCQ (not used)");
> + return 0;
> + }
> +
> if (ap->flags & ATA_FLAG_NCQ) {
> hdepth = min(ap->scsi_host->can_queue, ATA_MAX_QUEUE);
> dev->flags |= ATA_DFLAG_NCQ;
> @@ -3971,9 +3997,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
> { "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> ATA_HORKAGE_ZERO_AFTER_TRIM, },
> { "Samsung SSD 860*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |

Something went wrong when you applied my pre-cursor patch to your tree
as base for this patch, you have spaces in front of and behind the
"NULL,", where there should be tabs. So this does not apply cleanly
on top of my patch.

I'll forward my patch to you as an attached .eml file. You should
"git am <file>.eml" that file on top of the latest linux-block/for-next
and then rebase your patch on top of that.

> - ATA_HORKAGE_ZERO_AFTER_TRIM, },
> + ATA_HORKAGE_ZERO_AFTER_TRIM |
> + ATA_HORKAGE_NO_NCQ_ON_AMD, },
> { "Samsung SSD 870*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |

Idem for this line.

> - ATA_HORKAGE_ZERO_AFTER_TRIM, },
> + ATA_HORKAGE_ZERO_AFTER_TRIM |
> + ATA_HORKAGE_NO_NCQ_ON_AMD, },
> { "FCCT*M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> ATA_HORKAGE_ZERO_AFTER_TRIM, },
>
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 860e63f5667b..cdc248a15763 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -426,6 +426,7 @@ enum {
> ATA_HORKAGE_NOTRIM = (1 << 24), /* don't use TRIM */
> ATA_HORKAGE_MAX_SEC_1024 = (1 << 25), /* Limit max sects to 1024 */
> ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */
> + ATA_HORKAGE_NO_NCQ_ON_AMD = (1 << 27), /* Disable NCQ on AMD chipset */
>
> /* DMA mask for user DMA control: User visible values; DO NOT
> renumber */

As discussed elsewhere in this thread, you should allow setting/clearing
this flag from the libata.force kernel commandline option by adding the
following extra bit to the patch:

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index daa375c7e763..e2e900085f99 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6136,6 +6136,8 @@ static int __init ata_parse_force_one(char **cur,
{ "ncq", .horkage_off = ATA_HORKAGE_NONCQ },
{ "noncqtrim", .horkage_on = ATA_HORKAGE_NO_NCQ_TRIM },
{ "ncqtrim", .horkage_off = ATA_HORKAGE_NO_NCQ_TRIM },
+ { "noncqamd", .horkage_on = ATA_HORKAGE_NO_NCQ_ON_AMD },
+ { "ncqamd", .horkage_off = ATA_HORKAGE_NO_NCQ_ON_AMD },
{ "dump_id", .horkage_on = ATA_HORKAGE_DUMP_ID },
{ "pio0", .xfer_mask = 1 << (ATA_SHIFT_PIO + 0) },
{ "pio1", .xfer_mask = 1 << (ATA_SHIFT_PIO + 1) },

Regards,

Hans

2021-09-01 17:58:59

by Tor Vic

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.



On 01.09.21 08:13, Damien Le Moal wrote:
> On 2021/09/01 16:38, [email protected] wrote:
>> (Sorry for not doing a proper reply)
>>
>> Hello,
>> Noob here.
>> I have a Samsung 860 Pro connected to a AMD X570 chipset mainboard and
>> it just works flawlessly on 5.13 and 5.14.
>> Are you sure that *every* 860/870 is concerned by this problem on
>> *every* AMD controller? Isn't this too restrictive?
>> Or am I simply missing something?
>
> Is you drive being initialized with NCQ enabled ?
>
> cat /sys/block/sdX/device/queue_depth ?

Samsung 860 Pro:
$ cat /sys/block/sda/device/queue_depth


32

Samsung 860 Evo:
$ cat /sys/block/sdb/device/queue_depth


32

>
>>
>> Thanks,
>> Tor
>>
>
>

2021-09-01 18:48:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.

On Wed, Sep 01, 2021 at 12:52:20PM +0800, Kate Hsuan wrote:
> Many users are reporting that the Samsung 860 and 870 SSD are having
> various issues when combined with AMD SATA controllers and only
> completely disabling NCQ helps to avoid these issues.
>
> Always disabling NCQ for Samsung 860/870 SSDs regardless of the host
> SATA adapter vendor will cause I/O performance degradation with well
> behaved adapters. To limit the performance impact to AMD adapters,
> introduce the ATA_HORKAGE_NO_NCQ_ON_AMD flag to force disable NCQ
> only for these adapters.
>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201693
> Signed-off-by: Kate Hsuan <[email protected]>

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2021-09-01 18:49:52

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.

On 2021/09/01 16:38, [email protected] wrote:
> (Sorry for not doing a proper reply)
>
> Hello,
> Noob here.
> I have a Samsung 860 Pro connected to a AMD X570 chipset mainboard and
> it just works flawlessly on 5.13 and 5.14.
> Are you sure that *every* 860/870 is concerned by this problem on
> *every* AMD controller? Isn't this too restrictive?
> Or am I simply missing something?

Is you drive being initialized with NCQ enabled ?

cat /sys/block/sdX/device/queue_depth ?

>
> Thanks,
> Tor
>


--
Damien Le Moal
Western Digital Research

2021-09-01 18:51:29

by Kate Hsuan

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.

Hi Hans,

On Wed, Sep 1, 2021 at 5:01 PM Hans de Goede <[email protected]> wrote:
>
> Hi Kate,
>
> On 9/1/21 6:52 AM, Kate Hsuan wrote:
> > Many users are reporting that the Samsung 860 and 870 SSD are having
> > various issues when combined with AMD SATA controllers and only
> > completely disabling NCQ helps to avoid these issues.
> >
> > Always disabling NCQ for Samsung 860/870 SSDs regardless of the host
> > SATA adapter vendor will cause I/O performance degradation with well
> > behaved adapters. To limit the performance impact to AMD adapters,
> > introduce the ATA_HORKAGE_NO_NCQ_ON_AMD flag to force disable NCQ
> > only for these adapters.
> >
> > BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201693
> > Signed-off-by: Kate Hsuan <[email protected]>
> > ---
> > Changes in v4:
> > * A function ata_dev_check_adapter() is added to check the vendor ID of
> > the adapter.
> > * ATA_HORKAGE_NONCQ_ON_AMD was modified to ATA_HORKAGE_NO_NCQ_ON_AMD to
> > align with the naming convention.
> > * Commit messages were improved according to reviewer comments.

Thanks, I'll fix this.

> >
> > Changes in v3:
> > * ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL was modified to
> > ATA_HORKAGE_NONCQ_ON_AMD.
> > * Codes were fixed to completely disable NCQ on AMD controller.
> >
> > ---
> > drivers/ata/libata-core.c | 32 ++++++++++++++++++++++++++++++--
> > include/linux/libata.h | 1 +
> > 2 files changed, 31 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> > index c861c93d1e84..49049cd713e4 100644
> > --- a/drivers/ata/libata-core.c
> > +++ b/drivers/ata/libata-core.c
> > @@ -2186,6 +2186,25 @@ static void ata_dev_config_ncq_prio(struct ata_device *dev)
> > dev->flags &= ~ATA_DFLAG_NCQ_PRIO;
> > }
> >
> > +static bool ata_dev_check_adapter(struct ata_device *dev,
> > + unsigned short vendor_id)
> > +{
> > + struct pci_dev *pcidev = NULL;
> > + struct device *parent_dev = NULL;
> > +
> > + for (parent_dev = dev->tdev.parent; parent_dev != NULL;
> > + parent_dev = parent_dev->parent) {
> > + if (dev_is_pci(parent_dev)) {
> > + pcidev = to_pci_dev(parent_dev);
> > + if (pcidev->vendor == vendor_id)
> > + return true;
> > + break;
> > + }
> > + }
> > +
> > + return false;
> > +}
> > +
> > static int ata_dev_config_ncq(struct ata_device *dev,
> > char *desc, size_t desc_sz)
> > {
> > @@ -2204,6 +2223,13 @@ static int ata_dev_config_ncq(struct ata_device *dev,
> > snprintf(desc, desc_sz, "NCQ (not used)");
> > return 0;
> > }
> > +
> > + if (dev->horkage & ATA_HORKAGE_NO_NCQ_ON_AMD &&
> > + ata_dev_check_adapter(dev, PCI_VENDOR_ID_AMD)) {
> > + snprintf(desc, desc_sz, "NCQ (not used)");
> > + return 0;
> > + }
> > +
> > if (ap->flags & ATA_FLAG_NCQ) {
> > hdepth = min(ap->scsi_host->can_queue, ATA_MAX_QUEUE);
> > dev->flags |= ATA_DFLAG_NCQ;
> > @@ -3971,9 +3997,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
> > { "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> > ATA_HORKAGE_ZERO_AFTER_TRIM, },
> > { "Samsung SSD 860*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
>
> Something went wrong when you applied my pre-cursor patch to your tree
> as base for this patch, you have spaces in front of and behind the
> "NULL,", where there should be tabs. So this does not apply cleanly
> on top of my patch.
>
> I'll forward my patch to you as an attached .eml file. You should
> "git am <file>.eml" that file on top of the latest linux-block/for-next
> and then rebase your patch on top of that.
>
> > - ATA_HORKAGE_ZERO_AFTER_TRIM, },
> > + ATA_HORKAGE_ZERO_AFTER_TRIM |
> > + ATA_HORKAGE_NO_NCQ_ON_AMD, },
> > { "Samsung SSD 870*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
>
> Idem for this line.

Got it.

>
> > - ATA_HORKAGE_ZERO_AFTER_TRIM, },
> > + ATA_HORKAGE_ZERO_AFTER_TRIM |
> > + ATA_HORKAGE_NO_NCQ_ON_AMD, },
> > { "FCCT*M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> > ATA_HORKAGE_ZERO_AFTER_TRIM, },
> >
> > diff --git a/include/linux/libata.h b/include/linux/libata.h
> > index 860e63f5667b..cdc248a15763 100644
> > --- a/include/linux/libata.h
> > +++ b/include/linux/libata.h
> > @@ -426,6 +426,7 @@ enum {
> > ATA_HORKAGE_NOTRIM = (1 << 24), /* don't use TRIM */
> > ATA_HORKAGE_MAX_SEC_1024 = (1 << 25), /* Limit max sects to 1024 */
> > ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */
> > + ATA_HORKAGE_NO_NCQ_ON_AMD = (1 << 27), /* Disable NCQ on AMD chipset */
> >
> > /* DMA mask for user DMA control: User visible values; DO NOT
> > renumber */
>
> As discussed elsewhere in this thread, you should allow setting/clearing
> this flag from the libata.force kernel commandline option by adding the
> following extra bit to the patch:
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index daa375c7e763..e2e900085f99 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -6136,6 +6136,8 @@ static int __init ata_parse_force_one(char **cur,
> { "ncq", .horkage_off = ATA_HORKAGE_NONCQ },
> { "noncqtrim", .horkage_on = ATA_HORKAGE_NO_NCQ_TRIM },
> { "ncqtrim", .horkage_off = ATA_HORKAGE_NO_NCQ_TRIM },
> + { "noncqamd", .horkage_on = ATA_HORKAGE_NO_NCQ_ON_AMD },
> + { "ncqamd", .horkage_off = ATA_HORKAGE_NO_NCQ_ON_AMD },

I'll add them and propose v5 patch.

> { "dump_id", .horkage_on = ATA_HORKAGE_DUMP_ID },
> { "pio0", .xfer_mask = 1 << (ATA_SHIFT_PIO + 0) },
> { "pio1", .xfer_mask = 1 << (ATA_SHIFT_PIO + 1) },
>
> Regards,
>
> Hans
>

Thank you.

--
BR,
Kate

2021-09-01 18:51:53

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.

Hi,

On 9/1/21 10:55 AM, Hans de Goede wrote:
> Hi Tor,
>
> On 9/1/21 9:37 AM, [email protected] wrote:
>> (Sorry for not doing a proper reply)
>>
>> Hello,
>> Noob here.
>> I have a Samsung 860 Pro connected to a AMD X570 chipset mainboard and
>> it just works flawlessly on 5.13 and 5.14.
>> Are you sure that *every* 860/870 is concerned by this problem on
>> *every* AMD controller?
>
> I am pretty sure that every 860 / 870 EVO is affected,
> I am not sure if the PRO is also affected.

So while reading https://bugzilla.kernel.org/show_bug.cgi?id=201693
again to add a comment asking if anyone was seeing this on a
pro to I found existing comments of both queued-trims being
an issue on the 860 pro, as well as the 860 pro having issues
with some AMD sata controllers.

So it seems safe to say that the 860 pro has the same issues
as the 860 and 870 evo models. Chances are you don't have
discard support enabled causing you to not see the queued-trim
issues (which means you also won't see any difference from
disabling support for queued-trim commands).

So this just leaves your question of:

"concerned by this problem on *every* AMD controller?"

Where "this problem" is needing to completely disable NCQ
and I guess the answer is no, not every AMD controller
is affected. Still the plan is to err on the safe side for now,
allowing overriding this from the kernel cmdline with:

libata.force=ncqamd

I will add a comment to:

https://bugzilla.kernel.org/show_bug.cgi?id=201693

Asking for PCI-ids of the controllers where people are seeing
this and then maybe we can narrow down the "AMD" check in a
future follow up patch.

Regards,

Hans

2021-09-01 18:51:54

by Tor Vic

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.


> Hans de Goede <[email protected]> hat am 01.09.2021 11:38 geschrieben:
>
>
> Hi,
>
> On 9/1/21 10:55 AM, Hans de Goede wrote:
> > Hi Tor,
> >
> > On 9/1/21 9:37 AM, [email protected] wrote:
> >> (Sorry for not doing a proper reply)
> >>
> >> Hello,
> >> Noob here.
> >> I have a Samsung 860 Pro connected to a AMD X570 chipset mainboard and
> >> it just works flawlessly on 5.13 and 5.14.
> >> Are you sure that *every* 860/870 is concerned by this problem on
> >> *every* AMD controller?
> >
> > I am pretty sure that every 860 / 870 EVO is affected,
> > I am not sure if the PRO is also affected.
>
> So while reading https://bugzilla.kernel.org/show_bug.cgi?id=201693
> again to add a comment asking if anyone was seeing this on a
> pro to I found existing comments of both queued-trims being
> an issue on the 860 pro, as well as the 860 pro having issues
> with some AMD sata controllers.
>
> So it seems safe to say that the 860 pro has the same issues
> as the 860 and 870 evo models. Chances are you don't have
> discard support enabled causing you to not see the queued-trim
> issues (which means you also won't see any difference from
> disabling support for queued-trim commands).

Thanks for your answer, Hans.

If you mean the "discard" mount option, then yes, you're correct,
I don't use this because some (apparently) knowledgeable people
recommended against using it especially on LUKS partitions.
I don't know whether that's true though, it might be outdated.
I do however do manual TRIMs with "fstrim".

>
> So this just leaves your question of:
>
> "concerned by this problem on *every* AMD controller?"
>
> Where "this problem" is needing to completely disable NCQ
> and I guess the answer is no, not every AMD controller
> is affected. Still the plan is to err on the safe side for now,
> allowing overriding this from the kernel cmdline with:
>
> libata.force=ncqamd

I agree to do it in a safe way, that sounds like a good solution.

>
> I will add a comment to:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=201693
>
> Asking for PCI-ids of the controllers where people are seeing
> this and then maybe we can narrow down the "AMD" check in a
> future follow up patch.

I can send you PCI and device IDs later if it helps.

>
> Regards,
>
> Hans

2021-09-01 20:21:12

by Tor Vic

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.



On 01.09.21 09:38, Hans de Goede wrote:
> Hi,
>
> On 9/1/21 10:55 AM, Hans de Goede wrote:
>> Hi Tor,
>>
>> On 9/1/21 9:37 AM, [email protected] wrote:
>>> (Sorry for not doing a proper reply)
>>>
>>> Hello,
>>> Noob here.
>>> I have a Samsung 860 Pro connected to a AMD X570 chipset mainboard and
>>> it just works flawlessly on 5.13 and 5.14.
>>> Are you sure that *every* 860/870 is concerned by this problem on
>>> *every* AMD controller?
>>
>> I am pretty sure that every 860 / 870 EVO is affected,
>> I am not sure if the PRO is also affected.
>
> So while reading https://bugzilla.kernel.org/show_bug.cgi?id=201693
> again to add a comment asking if anyone was seeing this on a
> pro to I found existing comments of both queued-trims being
> an issue on the 860 pro, as well as the 860 pro having issues
> with some AMD sata controllers.
>
> So it seems safe to say that the 860 pro has the same issues
> as the 860 and 870 evo models. Chances are you don't have
> discard support enabled causing you to not see the queued-trim
> issues (which means you also won't see any difference from
> disabling support for queued-trim commands).
>
> So this just leaves your question of:
>
> "concerned by this problem on *every* AMD controller?"
>
> Where "this problem" is needing to completely disable NCQ
> and I guess the answer is no, not every AMD controller
> is affected. Still the plan is to err on the safe side for now,
> allowing overriding this from the kernel cmdline with:
>
> libata.force=ncqamd
>
> I will add a comment to:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=201693
>
> Asking for PCI-ids of the controllers where people are seeing
> this and then maybe we can narrow down the "AMD" check in a
> future follow up patch.

$ lspci -nn | grep -i sata


06:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
SATA Controller [AHCI mode] [1022:7901] (rev 51)
07:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
SATA Controller [AHCI mode] [1022:7901] (rev 51)

Both Samsung 860 Pro and 860 Evo are connected to these.

>
> Regards,
>
> Hans
>

2021-09-01 20:22:08

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.


Tor,

> Samsung 860 Pro:
> $ cat /sys/block/sda/device/queue_depth
>
> 32
>
> Samsung 860 Evo:
> $ cat /sys/block/sdb/device/queue_depth
>
> 32

Please also provide the output of:

# grep . /sys/class/ata_device/dev*/trim

--
Martin K. Petersen Oracle Linux Engineering

2021-09-02 07:48:03

by Tor Vic

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.


> Martin K. Petersen <[email protected]> hat am 01.09.2021 20:58 geschrieben:
>
>
> Tor,
>
> > Samsung 860 Pro:
> > $ cat /sys/block/sda/device/queue_depth
> >
> > 32
> >
> > Samsung 860 Evo:
> > $ cat /sys/block/sdb/device/queue_depth
> >
> > 32
>
> Please also provide the output of:
>
> # grep . /sys/class/ata_device/dev*/trim

/sys/class/ata_device/dev1.0/trim:unsupported
/sys/class/ata_device/dev2.0/trim:unsupported
/sys/class/ata_device/dev3.0/trim:queued
/sys/class/ata_device/dev4.0/trim:queued
/sys/class/ata_device/dev5.0/trim:unsupported

According to the symlinks in /dev/disk/by-path,
3.0 corresponds to /dev/sda (860 Pro) and
4.0 corresponds to /dev/sdb (860 Evo).

>
> --
> Martin K. Petersen Oracle Linux Engineering

2021-09-02 23:34:53

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.


Tor,

>> # grep . /sys/class/ata_device/dev*/trim
>
> /sys/class/ata_device/dev1.0/trim:unsupported
> /sys/class/ata_device/dev2.0/trim:unsupported
> /sys/class/ata_device/dev3.0/trim:queued
> /sys/class/ata_device/dev4.0/trim:queued
> /sys/class/ata_device/dev5.0/trim:unsupported
>
> According to the symlinks in /dev/disk/by-path,
> 3.0 corresponds to /dev/sda (860 Pro) and
> 4.0 corresponds to /dev/sdb (860 Evo).

Good data! Thanks!

--
Martin K. Petersen Oracle Linux Engineering