2023-01-04 09:47:29

by Hector Martin

Subject: [PATCH] nvme-apple: Add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression

From the get-go, this driver and the ANS syslog have been complaining
about namespace identification. In 6.2-rc1, commit 811f4de0344d ("nvme:
avoid fallback to sequential scan due to transient issues") regressed
the driver by no longer allowing fallback to sequential namespace scans,
leaving us with no namespaces.

It turns out that the real problem is that this controller claiming
NVMe 1.1 compat is treating the CNS field as a binary field, as in NVMe
1.0. This already has a quirk, NVME_QUIRK_IDENTIFY_CNS, so set it for
the controller to fix all this nonsense (including other errors
triggered by other CNS commands).

Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues")
Fixes: 5bd2927aceba ("nvme-apple: Add initial Apple SoC NVMe driver")
Signed-off-by: Hector Martin <[email protected]>
---
drivers/nvme/host/apple.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index e17d3a8a0107..e13a992b6096 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -1553,7 +1553,7 @@ static int apple_nvme_probe(struct platform_device *pdev)
 	}

 	ret = nvme_init_ctrl(&anv->ctrl, anv->dev, &nvme_ctrl_ops,
-			     NVME_QUIRK_SKIP_CID_GEN);
+			     NVME_QUIRK_SKIP_CID_GEN | NVME_QUIRK_IDENTIFY_CNS);
 	if (ret) {
 		dev_err_probe(dev, ret, "Failed to initialize nvme_ctrl");
 		goto put_dev;
--
2.35.1
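Editorial sketch of why setting the quirk changes the scan path. This is a simplified user-space model, not the driver code: `ctrl_model`, `limited_cns`, `scan_method`, the `VS` macro and the flag's bit value are illustrative names and values chosen here. The assumption, based on my reading of the core driver, is that the quirk raises the version threshold below which the host avoids CNS values other than 0 and 1, so a controller reporting 1.1 is scanned sequentially instead of through the Active NS ID List.

```c
#include <stdbool.h>
#include <stdint.h>

/* Version encoded like the NVMe VS register: major.minor.tertiary. */
#define VS(major, minor, tertiary) \
	(((uint32_t)(major) << 16) | ((uint32_t)(minor) << 8) | (uint32_t)(tertiary))

/* Illustrative bit value only; the real flag is defined in the nvme host headers. */
#define QUIRK_IDENTIFY_CNS (1UL << 1)

/* Hypothetical stand-in for the two controller fields the check needs. */
struct ctrl_model {
	uint32_t vs;          /* version the controller reports */
	unsigned long quirks; /* quirk flags set by the driver */
};

/*
 * True when the host should restrict itself to the NVMe 1.0 style CNS
 * values (0 and 1) and therefore must not issue Identify NS List.
 */
static inline bool limited_cns(const struct ctrl_model *c)
{
	if (c->quirks & QUIRK_IDENTIFY_CNS)
		return c->vs < VS(1, 2, 0);
	return c->vs < VS(1, 1, 0);
}

/* Which namespace scan the host would pick under this model. */
static inline const char *scan_method(const struct ctrl_model *c)
{
	return limited_cns(c) ? "sequential" : "ns-list";
}
```

Under this model, a controller reporting 1.1.0 with no quirk gets "ns-list", which is the command the Apple controller rejects. The same controller with the quirk set gets "sequential", which never sends the unsupported CNS value.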


2023-01-04 10:04:14

by Sven Peter

Subject: Re: [PATCH] nvme-apple: Add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression

Hi,


On Wed, Jan 4, 2023, at 10:21, Hector Martin wrote:
> From the get-go, this driver and the ANS syslog have been complaining
> about namespace identification. In 6.2-rc1, commit 811f4de0344d ("nvme:
> avoid fallback to sequential scan due to transient issues") regressed
> the driver by no longer allowing fallback to sequential namespace scans,
> leaving us with no namespaces.
>
> It turns out that the real problem is that this controller claiming
> NVMe 1.1 compat is treating the CNS field as a binary field, as in NVMe
> 1.0. This already has a quirk, NVME_QUIRK_IDENTIFY_CNS, so set it for
> the controller to fix all this nonsense (including other errors
> triggered by other CNS commands).
>
> Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to
> transient issues")
> Fixes: 5bd2927aceba ("nvme-apple: Add initial Apple SoC NVMe driver")
> Signed-off-by: Hector Martin <[email protected]>

Nice, I've been meaning to look into those weird namespace scanning errors
for a while now but never got around to it because they didn't break anything.

There's a chance this is also required for the later T2/x86 Macs in pci.c
(PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2005)), since they share similar firmware, but
I don't have access to one to test whether it is actually needed.

Reviewed-by: Sven Peter <[email protected]>



Best,

Sven

2023-01-04 10:34:20

by Hector Martin

Subject: [PATCH] nvme-pci: Add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers

This mirrors the quirk added to Apple Silicon controllers in apple.c.
These controllers do not support the Active NS ID List command and
behave identically to the SoC version judging by existing user
reports/syslogs, so they will need the same fix. This quirk reverts
to NVMe 1.0 behavior and disables the broken commands.

Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues")
Signed-off-by: Hector Martin <[email protected]>
---
drivers/nvme/host/pci.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Note: this is untested (since probably nobody with a T2 is running
6.2 RCs yet), but given that these controllers share the same firmware
codebase and some other quirks, it is almost certain they regressed
just like the M1/M2 controllers did. Existing syslogs from T2 machines
show the same errors we were getting on M1/M2 prior to the regression,
which points to them having the same issue of treating CNS as a binary
flag as in NVMe 1.0 all along.

Sven has asked some of the T2 folks if they can test a 6.2 RC to verify
the same regression happened, so hopefully we can get confirmation that
this needs fixing (and the fix works), but if we end up getting no
feedback I'd lean towards just getting this applied as a fix since it's
unlikely to break anything and highly likely to fix a regression.

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b13baccedb4a..91f8adcf6056 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3495,7 +3495,8 @@ static const struct pci_device_id nvme_id_table[] = {
 		.driver_data = NVME_QUIRK_SINGLE_VECTOR |
 				NVME_QUIRK_128_BYTES_SQES |
 				NVME_QUIRK_SHARED_TAGS |
-				NVME_QUIRK_SKIP_CID_GEN },
+				NVME_QUIRK_SKIP_CID_GEN |
+				NVME_QUIRK_IDENTIFY_CNS },
 	{ PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
 	{ 0, }
 };
--
2.35.1

2023-01-04 12:25:13

by Orlando Chamberlain

Subject: Re: [PATCH] nvme-pci: Add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers

> This mirrors the quirk added to Apple Silicon controllers in apple.c.
> These controllers do not support the Active NS ID List command and
> behave identically to the SoC version judging by existing user
> reports/syslogs, so they will need the same fix. This quirk reverts
> to NVMe 1.0 behavior and disables the broken commands.
>
> Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues")
> Signed-off-by: Hector Martin <[email protected]>

On a T2 MacBookPro16,1 running 6.2.0-rc3-00010-g69b41ac87e4a, I had this in
dmesg:

nvme nvme0: 1/0/0 default/read/poll queues
nvme nvme0: Identify NS List failed (status=0xb)

And in /dev only nvme0 existed (no nvme0n1*).

This patch fixed that: with it applied, /dev/nvme0n1p* exists.

Tested-by: Orlando Chamberlain <[email protected]>
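
Editorial sketch of how the `status=0xb` value in the dmesg line above can be read, and why it no longer triggers a fallback. The names (`status_fields`, `decode_status`, `falls_back_to_sequential`) and the macros are hypothetical. The field layout assumes the logged value is the completion status with the phase bit already shifted out, and the fallback rule is my understanding of the policy after commit 811f4de0344d (fall back to a sequential scan only when Do Not Retry is set), not a copy of the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SC_MASK   0x00ffu /* bits 0-7: status code */
#define STATUS_SCT_SHIFT 8       /* bits 8-10: status code type */
#define STATUS_SCT_MASK  0x7u
#define STATUS_DNR       0x4000u /* bit 14: Do Not Retry */

/* Generic command status 0x0b: Invalid Namespace or Format. */
#define SC_INVALID_NS    0x0bu

struct status_fields {
	unsigned int sc;  /* status code */
	unsigned int sct; /* status code type (0 = generic) */
	bool dnr;         /* Do Not Retry bit */
};

/* Split a logged status value into its fields. */
static inline struct status_fields decode_status(uint16_t status)
{
	struct status_fields f;

	f.sc = status & STATUS_SC_MASK;
	f.sct = (status >> STATUS_SCT_SHIFT) & STATUS_SCT_MASK;
	f.dnr = (status & STATUS_DNR) != 0;
	return f;
}

/*
 * Assumed post-811f4de0344d policy: only a positive NVMe status with
 * DNR set makes the host fall back to a sequential namespace scan.
 */
static inline bool falls_back_to_sequential(int ret)
{
	return ret > 0 && (ret & STATUS_DNR) != 0;
}
```

Read this way, 0xb decodes to a generic status of Invalid Namespace or Format with DNR clear, so `falls_back_to_sequential(0xb)` is false. That is consistent with the report above, where only nvme0 appeared and no namespace devices were created.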

2023-01-08 18:59:10

by Christoph Hellwig

Subject: Re: [PATCH] nvme-apple: Add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression

Thanks,

I've applied both patches to nvme-6.2.