2021-08-27 05:36:18

by Kate Hsuan

[permalink] [raw]
Subject: [PATCH v2 0/1] libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs

Entire disabling NCQ for Samsugn 860/870 SSD will cause I/O performance
drop. In this case, a flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is
introduced to used to performe additional check for these SSDs. If it
finds it's parent ATA controller is AsMedia/AMD/Marvell, the NCQ will be
disabled. Otherwise, the NCQ is kept to enable.

Kate Hsuan (1):
libata: libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870
SSDs

drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++-----
include/linux/libata.h | 3 +++
2 files changed, 35 insertions(+), 5 deletions(-)

--
2.31.1


2021-08-27 05:36:21

by Kate Hsuan

[permalink] [raw]
Subject: [PATCH v2 1/1] libata: libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs

A flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is added to disable NCQ
on AMD/MAREL/ASMEDIA chipsets.

Samsung 860/870 SSD are disabled to use NCQ trim functions but it will
lead to performace drop. From the bugzilla, we could realize the issues
only appears on those chipset mentioned above. So this flag could be
used to only disable NCQ on specific chipsets.

Fixes: ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Kate Hsuan <[email protected]>
---
drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++-----
include/linux/libata.h | 3 +++
2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index cc459ce90018..50f635669dd4 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2119,6 +2119,8 @@ static inline u8 ata_dev_knobble(struct ata_device *dev)
static void ata_dev_config_ncq_send_recv(struct ata_device *dev)
{
struct ata_port *ap = dev->link->ap;
+ struct device *parent = NULL;
+ struct pci_dev *pcidev = NULL;
unsigned int err_mask;

if (!ata_log_supported(dev, ATA_LOG_NCQ_SEND_RECV)) {
@@ -2138,9 +2140,32 @@ static void ata_dev_config_ncq_send_recv(struct ata_device *dev)
memcpy(cmds, ap->sector_buf, ATA_LOG_NCQ_SEND_RECV_SIZE);

if (dev->horkage & ATA_HORKAGE_NO_NCQ_TRIM) {
- ata_dev_dbg(dev, "disabling queued TRIM support\n");
- cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
- ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
+ if (dev->horkage & ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL)
+ {
+ // get parent device for the controller vendor ID
+ for(parent = dev->tdev.parent; parent != NULL; parent = parent->parent)
+ {
+ if(dev_is_pci(parent))
+ {
+ pcidev = to_pci_dev(parent);
+ if (pcidev->vendor == PCI_VENDOR_ID_MARVELL ||
+ pcidev->vendor == PCI_VENDOR_ID_AMD ||
+ pcidev->vendor == PCI_VENDOR_ID_ASMEDIA )
+ {
+ ata_dev_dbg(dev, "Disable NCQ -> vendor ID %x product ID %x\n",
+ pcidev->vendor, pcidev->device);
+ cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
+ ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
+ }
+ break;
+ }
+ }
+ }else
+ {
+ ata_dev_dbg(dev, "disabling queued TRIM support\n");
+ cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
+ ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
+ }
}
}
}
@@ -3951,9 +3976,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
{ "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
ATA_HORKAGE_ZERO_AFTER_TRIM, },
{ "Samsung SSD 860*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
- ATA_HORKAGE_ZERO_AFTER_TRIM, },
+ ATA_HORKAGE_ZERO_AFTER_TRIM |
+ ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
{ "Samsung SSD 870*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
- ATA_HORKAGE_ZERO_AFTER_TRIM, },
+ ATA_HORKAGE_ZERO_AFTER_TRIM |
+ ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
{ "FCCT*M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
ATA_HORKAGE_ZERO_AFTER_TRIM, },

diff --git a/include/linux/libata.h b/include/linux/libata.h
index 3fcd24236793..ec17f1f3fbf6 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -422,6 +422,9 @@ enum {
ATA_HORKAGE_NOTRIM = (1 << 24), /* don't use TRIM */
ATA_HORKAGE_MAX_SEC_1024 = (1 << 25), /* Limit max sects to 1024 */
ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */
+ ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL = (1 << 27), /*Disable NCQ only on
+ ASMeida, AMD, and Marvell
+ Chipset*/

/* DMA mask for user DMA control: User visible values; DO NOT
renumber */
--
2.31.1

2021-08-27 06:06:23

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] libata: libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs

On 2021/08/27 14:34, Kate Hsuan wrote:
> A flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is added to disable NCQ
> on AMD/MAREL/ASMEDIA chipsets.
>
> Samsung 860/870 SSD are disabled to use NCQ trim functions but it will
> lead to performace drop. From the bugzilla, we could realize the issues
> only appears on those chipset mentioned above. So this flag could be
> used to only disable NCQ on specific chipsets.
>
> Fixes: ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860")
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475
> Signed-off-by: Hans de Goede <[email protected]>
> Reviewed-by: Martin K. Petersen <[email protected]>

Since this is a v2, you should not keep the review tag here.

> Signed-off-by: Kate Hsuan <[email protected]>
> ---

This is a v2 patch, so please add the changelog from v1 here.

But I think that v1 introduced ATA_HORKAGE_NO_NCQ_TRIM. Yet, here you introduce
a completely different flag on top of the patch that introduced
ATA_HORKAGE_NO_NCQ_TRIM. So this patch is not version 2 of the previous one. It
is a completely different patch. Unless I missed v1 on the list...


> drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++-----
> include/linux/libata.h | 3 +++
> 2 files changed, 35 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index cc459ce90018..50f635669dd4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -2119,6 +2119,8 @@ static inline u8 ata_dev_knobble(struct ata_device *dev)
> static void ata_dev_config_ncq_send_recv(struct ata_device *dev)
> {
> struct ata_port *ap = dev->link->ap;
> + struct device *parent = NULL;
> + struct pci_dev *pcidev = NULL;
> unsigned int err_mask;
>
> if (!ata_log_supported(dev, ATA_LOG_NCQ_SEND_RECV)) {
> @@ -2138,9 +2140,32 @@ static void ata_dev_config_ncq_send_recv(struct ata_device *dev)
> memcpy(cmds, ap->sector_buf, ATA_LOG_NCQ_SEND_RECV_SIZE);
>
> if (dev->horkage & ATA_HORKAGE_NO_NCQ_TRIM) {
> - ata_dev_dbg(dev, "disabling queued TRIM support\n");
> - cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> - ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> + if (dev->horkage & ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL)
> + {

Please follow the kernel coding style: do not break line before "{". This
comment applies to all your modifications below too.

> + // get parent device for the controller vendor ID
> + for(parent = dev->tdev.parent; parent != NULL; parent = parent->parent)
> + {
> + if(dev_is_pci(parent))
> + {
> + pcidev = to_pci_dev(parent);
> + if (pcidev->vendor == PCI_VENDOR_ID_MARVELL ||
> + pcidev->vendor == PCI_VENDOR_ID_AMD ||
> + pcidev->vendor == PCI_VENDOR_ID_ASMEDIA )

Please align the conditions.

> + {
> + ata_dev_dbg(dev, "Disable NCQ -> vendor ID %x product ID %x\n",
> + pcidev->vendor, pcidev->device);

Weird alignment here too. Move the arguments aligned with "dev" at the beginning
of the line.

> + cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> + ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> + }
> + break;
> + }
> + }
> + }else
> + {
> + ata_dev_dbg(dev, "disabling queued TRIM support\n");
> + cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> + ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> + }
> }
> }
> }
> @@ -3951,9 +3976,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
> { "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> ATA_HORKAGE_ZERO_AFTER_TRIM, },
> { "Samsung SSD 860*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> - ATA_HORKAGE_ZERO_AFTER_TRIM, },
> + ATA_HORKAGE_ZERO_AFTER_TRIM |
> + ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },

You named your flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL but you are
applying it to a Samsung device. This is rather confusing. I do not think you
need to have the vendor in the flag name, e.g. ATA_HORKAGE_NO_NCQ. Whatever
device in ata_device_blacklist has the flag will have NCQ disabled.

> { "Samsung SSD 870*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> - ATA_HORKAGE_ZERO_AFTER_TRIM, },
> + ATA_HORKAGE_ZERO_AFTER_TRIM |
> + ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },

If you disable NCQ entirely for this device, then what is the point of keeping
ATA_HORKAGE_NO_NCQ_TRIM ? TRIM commands will not be NCQ anymore, similarly to
all other commands.

> { "FCCT*M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> ATA_HORKAGE_ZERO_AFTER_TRIM, },
>
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 3fcd24236793..ec17f1f3fbf6 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -422,6 +422,9 @@ enum {
> ATA_HORKAGE_NOTRIM = (1 << 24), /* don't use TRIM */
> ATA_HORKAGE_MAX_SEC_1024 = (1 << 25), /* Limit max sects to 1024 */
> ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */
> + ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL = (1 << 27), /*Disable NCQ only on
> + ASMeida, AMD, and Marvell
> + Chipset*/

See above. You do not need to have the vendor name in the flag name.

>
> /* DMA mask for user DMA control: User visible values; DO NOT
> renumber */
>


--
Damien Le Moal
Western Digital Research

2021-08-27 06:57:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] libata: libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs

On Fri, Aug 27, 2021 at 01:33:44AM -0400, Kate Hsuan wrote:
> A flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is added to disable NCQ
> on AMD/MAREL/ASMEDIA chipsets.
>
> Samsung 860/870 SSD are disabled to use NCQ trim functions but it will
> lead to performace drop. From the bugzilla, we could realize the issues
> only appears on those chipset mentioned above. So this flag could be
> used to only disable NCQ on specific chipsets.
>
> Fixes: ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860")
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475
> Signed-off-by: Hans de Goede <[email protected]>
> Reviewed-by: Martin K. Petersen <[email protected]>
> Signed-off-by: Kate Hsuan <[email protected]>
> ---
> drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++-----
> include/linux/libata.h | 3 +++
> 2 files changed, 35 insertions(+), 5 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

2021-08-27 10:39:31

by Hans de Goede

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] libata: libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs

Hi Kate,

Some background info for other people following this thread, as discussed here:
https://bugzilla.kernel.org/show_bug.cgi?id=201693
https://bugzilla.kernel.org/show_bug.cgi?id=203475

There are a lot of users who are reporting disk issues (including data
corruption with Samsung 860 and 870 SSDs. Coming up with fixes for this
has taken longer then it should because I failed to realize for a long
time that there really are 2 separate issues here:

https://bugzilla.kernel.org/show_bug.cgi?id=203475#c34

"""
So after completely re-reading / analyzing both this bug as well as bug 201693 with a fresh pair of eyes (since the last time I did this was a long time ago) I agree. After careful reading / analysis it seems that there really are 2 different bugs here impacting both the 860 EVO and the 870 EVO:

1. Queued Trim commands are causing issues on Intel + ASmedia + Marvell controllers

2. Things are seriously broken on AMD controllers and only completely disabling NCQ altogether helps there.


I will submit a kernel patch (with a Fixes tag so that it gets backported to stable series) for 1. right away; and I've asked a colleague to start working on a new ATA horkage flag which disables NCQ on AMD SATA controllers only, so that we can add that flag (together with the ATA_HORKAGE_NO_NCQ_TRIM flag which my patch adds) to the 860 EVO and the 870 EVO to also resolve 2.
"""

I asked Kate to write this patch to address 2., note this patch is to be applied
on top of my " libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs"
patch.

Kate, thank you for your patch. There are several issues which need
to be addressed before this patch can be accepted, starting with the
commit message.

It seems that you used the commit message as my patch as a template, but
you forgot to change the Subject (the first line) for the next version please
change the subject to something correctly describing this patch.

I also see that you gave this patch a version of 2, but since this patch
does not replace my patch, in other words it is a different patch you
should have just made it v1. Anyways lets just make the next version v3
to avoid confusion.

The rest of the commit message should have 1 paragraph describing the reason
why the patch is necessary + a second paragraph describing what the patch
is doing to address this. Your cover-letter would be a good candidate for
the second paragraph, resulting in for example something like this as
body of the commit message:

"""
Many users are reporting that the Samsung 860 and 870 SSDs are having
various issues when combined with AMD SATA controllers and only completely
disabling NCQ helps to avoid these issues.

Entirely disabling NCQ for Samsung 860/870 SSD will cause I/O performance
drop. In this case, a flag ATA_HORKAGE_NONCQ_ON_AMD is introduced to used
to perform additional check for these SSDs. If it finds it's parent ATA
controller is AMD then NCQ will be disabled. Otherwise the NCQ is kept to
enable.
"""

On 8/27/21 7:33 AM, Kate Hsuan wrote:
> A flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is added to disable NCQ
> on AMD/MAREL/ASMEDIA chipsets.
>
> Samsung 860/870 SSD are disabled to use NCQ trim functions but it will
> lead to performace drop. From the bugzilla, we could realize the issues
> only appears on those chipset mentioned above. So this flag could be
> used to only disable NCQ on specific chipsets.
>
> Fixes: ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860")
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475
> Signed-off-by: Hans de Goede <[email protected]>
> Reviewed-by: Martin K. Petersen <[email protected]>
> Signed-off-by: Kate Hsuan <[email protected]>
> ---
> drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++-----
> include/linux/libata.h | 3 +++
> 2 files changed, 35 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index cc459ce90018..50f635669dd4 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -2119,6 +2119,8 @@ static inline u8 ata_dev_knobble(struct ata_device *dev)
> static void ata_dev_config_ncq_send_recv(struct ata_device *dev)
> {
> struct ata_port *ap = dev->link->ap;
> + struct device *parent = NULL;
> + struct pci_dev *pcidev = NULL;
> unsigned int err_mask;
>
> if (!ata_log_supported(dev, ATA_LOG_NCQ_SEND_RECV)) {
> @@ -2138,9 +2140,32 @@ static void ata_dev_config_ncq_send_recv(struct ata_device *dev)
> memcpy(cmds, ap->sector_buf, ATA_LOG_NCQ_SEND_RECV_SIZE);
>
> if (dev->horkage & ATA_HORKAGE_NO_NCQ_TRIM) {
> - ata_dev_dbg(dev, "disabling queued TRIM support\n");
> - cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> - ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> + if (dev->horkage & ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL)
> + {
> + // get parent device for the controller vendor ID
> + for(parent = dev->tdev.parent; parent != NULL; parent = parent->parent)
> + {
> + if(dev_is_pci(parent))
> + {
> + pcidev = to_pci_dev(parent);
> + if (pcidev->vendor == PCI_VENDOR_ID_MARVELL ||
> + pcidev->vendor == PCI_VENDOR_ID_AMD ||
> + pcidev->vendor == PCI_VENDOR_ID_ASMEDIA )
> + {
> + ata_dev_dbg(dev, "Disable NCQ -> vendor ID %x product ID %x\n",
> + pcidev->vendor, pcidev->device);
> + cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> + ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> + }
> + break;
> + }
> + }
> + }else
> + {
> + ata_dev_dbg(dev, "disabling queued TRIM support\n");
> + cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
> + ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
> + }

Please don't nest the handling of the new ATA_HORKAGE_NONCQ_ON_AMD flag with the handling of other flags.

Also you are just disabling queued-trims now, which my patch already does, instead the new check should
completely disable NCQ, this means moving the check to ata_dev_config_ncq() adding the new check
after this check:

if (dev->horkage & ATA_HORKAGE_NONCQ) {
snprintf(desc, desc_sz, "NCQ (not used)");
return 0;
}

And then do the same, but only if pcidev->vendor == PCI_VENDOR_ID_AMD.



> }
> }
> }
> @@ -3951,9 +3976,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
> { "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> ATA_HORKAGE_ZERO_AFTER_TRIM, },
> { "Samsung SSD 860*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> - ATA_HORKAGE_ZERO_AFTER_TRIM, },
> + ATA_HORKAGE_ZERO_AFTER_TRIM |
> + ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
> { "Samsung SSD 870*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> - ATA_HORKAGE_ZERO_AFTER_TRIM, },
> + ATA_HORKAGE_ZERO_AFTER_TRIM |
> + ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
> { "FCCT*M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> ATA_HORKAGE_ZERO_AFTER_TRIM, },
>
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 3fcd24236793..ec17f1f3fbf6 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -422,6 +422,9 @@ enum {
> ATA_HORKAGE_NOTRIM = (1 << 24), /* don't use TRIM */
> ATA_HORKAGE_MAX_SEC_1024 = (1 << 25), /* Limit max sects to 1024 */
> ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */
> + ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL = (1 << 27), /*Disable NCQ only on
> + ASMeida, AMD, and Marvell
> + Chipset*/

When we initially discussed this I know I said that we would need to disable
NCQ on AMD + ASMEDIA + Marvell hosts, but after carefully reading both bugs again
I've come to the conclusion that for Asmedia and Marvell SATA hosts just disabling
trimmed queue as my patch does is enough. So please rename this to:

ATA_HORKAGE_NONCQ_ON_AMD

(and only check for an AMD vendor-id when doing the check).

>
> /* DMA mask for user DMA control: User visible values; DO NOT
> renumber */
>

Regards,

Hans