2021-03-10 13:23:55

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a

Can you try this patch instead?

http://lists.infradead.org/pipermail/linux-nvme/2021-February/023183.html

On Wed, Mar 10, 2021 at 02:51:16PM +0300, Dmitry Monakhov wrote:
> This adds a quirk for Samsung PM1725a drive which fixes timeouts and
> I/O errors due to the fact that the controller does not properly
> handle the Write Zeroes command, dmesg log:
>
> nvme nvme0: I/O 528 QID 10 timeout, aborting
> nvme nvme0: I/O 529 QID 10 timeout, aborting
> nvme nvme0: I/O 530 QID 10 timeout, aborting
> nvme nvme0: I/O 531 QID 10 timeout, aborting
> nvme nvme0: I/O 532 QID 10 timeout, aborting
> nvme nvme0: I/O 533 QID 10 timeout, aborting
> nvme nvme0: I/O 534 QID 10 timeout, aborting
> nvme nvme0: I/O 535 QID 10 timeout, aborting
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: I/O 528 QID 10 timeout, reset controller
> nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> nvme nvme0: Device not ready; aborting reset, CSTS=0x3
> nvme nvme0: Device not ready; aborting reset, CSTS=0x3
> nvme nvme0: Removing after probe failure status: -19
> nvme0n1: detected capacity change from 6251233968 to 0
> blk_update_request: I/O error, dev nvme0n1, sector 32776 op 0x1:(WRITE) flags 0x3000 phys_seg 6 prio class 0
> blk_update_request: I/O error, dev nvme0n1, sector 113319936 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
> Buffer I/O error on dev nvme0n1p2, logical block 1, lost async page write
> blk_update_request: I/O error, dev nvme0n1, sector 113319680 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> Buffer I/O error on dev nvme0n1p2, logical block 2, lost async page write
> blk_update_request: I/O error, dev nvme0n1, sector 113319424 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> Buffer I/O error on dev nvme0n1p2, logical block 3, lost async page write
> blk_update_request: I/O error, dev nvme0n1, sector 113319168 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> Buffer I/O error on dev nvme0n1p2, logical block 4, lost async page write
> blk_update_request: I/O error, dev nvme0n1, sector 113318912 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> Buffer I/O error on dev nvme0n1p2, logical block 5, lost async page write
> blk_update_request: I/O error, dev nvme0n1, sector 113318656 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> Buffer I/O error on dev nvme0n1p2, logical block 6, lost async page write
> blk_update_request: I/O error, dev nvme0n1, sector 113318400 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> blk_update_request: I/O error, dev nvme0n1, sector 113318144 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> blk_update_request: I/O error, dev nvme0n1, sector 113317888 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>
> Signed-off-by: Dmitry Monakhov <[email protected]>
> ---
> drivers/nvme/host/pci.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 17ab332..7249ae7 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3246,6 +3246,7 @@ static const struct pci_device_id nvme_id_table[] = {
> .driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
> { PCI_DEVICE(0x144d, 0xa822), /* Samsung PM1725a */
> .driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY |
> + NVME_QUIRK_DISABLE_WRITE_ZEROES|
> NVME_QUIRK_IGNORE_DEV_SUBNQN, },
> { PCI_DEVICE(0x1987, 0x5016), /* Phison E16 */
> .driver_data = NVME_QUIRK_IGNORE_DEV_SUBNQN, },
> --
> 2.7.4
---end quoted text---


2021-03-10 13:42:42

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a

On Wed, Mar 10, 2021 at 02:21:56PM +0100, Christoph Hellwig wrote:
> Can you try this patch instead?
>
> http://lists.infradead.org/pipermail/linux-nvme/2021-February/023183.html

Actually, please try the patch below instead, it looks like our existing
logic messes up the units:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e68a8c4ac5a6ea..1867fdf2205bd7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1963,30 +1963,18 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns)
blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
}

-static void nvme_config_write_zeroes(struct gendisk *disk, struct nvme_ns *ns)
+/*
+ * Even though NVMe spec explicitly states that MDTS is not applicable to the
+ * write-zeroes, we are cautious and limit the size to the controllers
+ * max_hw_sectors value, which is based on the MDTS field and possibly other
+ * limiting factors.
+ */
+static void nvme_config_write_zeroes(struct request_queue *q,
+ struct nvme_ctrl *ctrl)
{
- u64 max_blocks;
-
- if (!(ns->ctrl->oncs & NVME_CTRL_ONCS_WRITE_ZEROES) ||
- (ns->ctrl->quirks & NVME_QUIRK_DISABLE_WRITE_ZEROES))
- return;
- /*
- * Even though NVMe spec explicitly states that MDTS is not
- * applicable to the write-zeroes:- "The restriction does not apply to
- * commands that do not transfer data between the host and the
- * controller (e.g., Write Uncorrectable ro Write Zeroes command).".
- * In order to be more cautious use controller's max_hw_sectors value
- * to configure the maximum sectors for the write-zeroes which is
- * configured based on the controller's MDTS field in the
- * nvme_init_identify() if available.
- */
- if (ns->ctrl->max_hw_sectors == UINT_MAX)
- max_blocks = (u64)USHRT_MAX + 1;
- else
- max_blocks = ns->ctrl->max_hw_sectors + 1;
-
- blk_queue_max_write_zeroes_sectors(disk->queue,
- nvme_lba_to_sect(ns, max_blocks));
+ if ((ctrl->oncs & NVME_CTRL_ONCS_WRITE_ZEROES) &&
+ !(ctrl->quirks & NVME_QUIRK_DISABLE_WRITE_ZEROES))
+ blk_queue_max_write_zeroes_sectors(q, ctrl->max_hw_sectors);
}

static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids)
@@ -2158,7 +2146,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
set_capacity_and_notify(disk, capacity);

nvme_config_discard(disk, ns);
- nvme_config_write_zeroes(disk, ns);
+ nvme_config_write_zeroes(disk->queue, ns->ctrl);

set_disk_ro(disk, (id->nsattr & NVME_NS_ATTR_RO) ||
test_bit(NVME_NS_FORCE_RO, &ns->flags));

2021-03-10 20:02:09

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a

On Wed, Mar 10, 2021 at 02:41:10PM +0100, Christoph Hellwig wrote:
> On Wed, Mar 10, 2021 at 02:21:56PM +0100, Christoph Hellwig wrote:
> > Can you try this patch instead?
> >
> > http://lists.infradead.org/pipermail/linux-nvme/2021-February/023183.html
>
> Actually, please try the patch below instead, it looks like our existing
> logic messes up the units:

This seems like a good opportunity to incorporate TP4040 for non-MDTS
command limits. I sent a proposal here

http://lists.infradead.org/pipermail/linux-nvme/2021-March/023522.html

And it defaults to your suggestion if the controller doesn't implement
the capability.

2021-03-11 10:32:12

by Dmitry Monakhov

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a



10.03.2021, 16:41, "Christoph Hellwig" <[email protected]>:
> On Wed, Mar 10, 2021 at 02:21:56PM +0100, Christoph Hellwig wrote:
>>  Can you try this patch instead?
>>
>>  http://lists.infradead.org/pipermail/linux-nvme/2021-February/023183.html
>
> Actually, please try the patch below instead, it looks like our existing
> logic messes up the units:
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e68a8c4ac5a6ea..1867fdf2205bd7 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1963,30 +1963,18 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns)
>                  blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
>  }
>
> -static void nvme_config_write_zeroes(struct gendisk *disk, struct nvme_ns *ns)
> +/*
> + * Even though NVMe spec explicitly states that MDTS is not applicable to the
> + * write-zeroes, we are cautious and limit the size to the controllers
> + * max_hw_sectors value, which is based on the MDTS field and possibly other
> + * limiting factors.
> + */
> +static void nvme_config_write_zeroes(struct request_queue *q,
> + struct nvme_ctrl *ctrl)
>  {
> - u64 max_blocks;
> -
> - if (!(ns->ctrl->oncs & NVME_CTRL_ONCS_WRITE_ZEROES) ||
> - (ns->ctrl->quirks & NVME_QUIRK_DISABLE_WRITE_ZEROES))
> - return;
> - /*
> - * Even though NVMe spec explicitly states that MDTS is not
> - * applicable to the write-zeroes:- "The restriction does not apply to
> - * commands that do not transfer data between the host and the
> - * controller (e.g., Write Uncorrectable ro Write Zeroes command).".
> - * In order to be more cautious use controller's max_hw_sectors value
> - * to configure the maximum sectors for the write-zeroes which is
> - * configured based on the controller's MDTS field in the
> - * nvme_init_identify() if available.
> - */
> - if (ns->ctrl->max_hw_sectors == UINT_MAX)
> - max_blocks = (u64)USHRT_MAX + 1;
> - else
> - max_blocks = ns->ctrl->max_hw_sectors + 1;
> -
> - blk_queue_max_write_zeroes_sectors(disk->queue,
> - nvme_lba_to_sect(ns, max_blocks));
> + if ((ctrl->oncs & NVME_CTRL_ONCS_WRITE_ZEROES) &&
> + !(ctrl->quirks & NVME_QUIRK_DISABLE_WRITE_ZEROES))
> + blk_queue_max_write_zeroes_sectors(q, ctrl->max_hw_sectors);
>  }
>
>  static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids)
> @@ -2158,7 +2146,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
>          set_capacity_and_notify(disk, capacity);
>
>          nvme_config_discard(disk, ns);
> - nvme_config_write_zeroes(disk, ns);
> + nvme_config_write_zeroes(disk->queue, ns->ctrl);
>
>          set_disk_ro(disk, (id->nsattr & NVME_NS_ATTR_RO) ||
>                  test_bit(NVME_NS_FORCE_RO, &ns->flags));
In order to exclude possible issue with incorrect request sized I've run test which does write_zeroes,
via fio-fallocate randrtim, which actually does fallocate punch_hole+keep_size which converts to blkdev_issue_zeroout()
note: fio should be patched, see: https://github.com/axboe/fio/pull/1203

fio --name t --ioengine=falloc --rw=randtrim --bs=512 --size=100M --filename=/dev/nvme0n1 --numjobs=16
After a couple of minutes it stuck, and then timeout occour.
cat /sys/kernel/debug/block/nvme0n1/hctx*/busy
00000000cd27b755 {.op=WRITE_ZEROES, .cmd_flags=SYNC, .rq_flags=DONTPREP|IO_STAT|STATS, .state=in_flight, .tag=205, .internal_tag=-1}
000000009d3f2b8f {.op=WRITE_ZEROES, .cmd_flags=SYNC, .rq_flags=DONTPREP|IO_STAT|STATS, .state=in_flight, .tag=244, .internal_tag=-1}
00000000eb4166fe {.op=WRITE_ZEROES, .cmd_flags=SYNC, .rq_flags=DONTPREP|IO_STAT|STATS, .state=in_flight, .tag=709, .internal_tag=-1}
0000000049b49c60 {.op=WRITE_ZEROES, .cmd_flags=SYNC, .rq_flags=DONTPREP|IO_STAT|STATS, .state=in_flight, .tag=433, .internal_tag=-1}
0000000018b93c40 {.op=WRITE_ZEROES, .cmd_flags=SYNC, .rq_flags=DONTPREP|IO_STAT|STATS, .state=in_flight, .tag=5, .internal_tag=-1}
00000000ac15ef73 {.op=WRITE_ZEROES, .cmd_flags=SYNC, .rq_flags=DONTPREP|IO_STAT|STATS, .state=in_flight, .tag=268, .internal_tag=-1}

So, this is definitely hardware issue, and write_zeroes should be disabled for this particular model.

2021-03-11 10:48:52

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a

On Wed, Mar 10, 2021 at 12:00:30PM -0800, Keith Busch wrote:
> On Wed, Mar 10, 2021 at 02:41:10PM +0100, Christoph Hellwig wrote:
> > On Wed, Mar 10, 2021 at 02:21:56PM +0100, Christoph Hellwig wrote:
> > > Can you try this patch instead?
> > >
> > > http://lists.infradead.org/pipermail/linux-nvme/2021-February/023183.html
> >
> > Actually, please try the patch below instead, it looks like our existing
> > logic messes up the units:
>
> This seems like a good opportunity to incorporate TP4040 for non-MDTS
> command limits. I sent a proposal here
>
> http://lists.infradead.org/pipermail/linux-nvme/2021-March/023522.html
>
> And it defaults to your suggestion if the controller doesn't implement
> the capability.

I think TP4040 is good and useful, but I defintively want the pure
fix get in first.

2021-03-23 08:39:45

by Javier González

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a

On 11.03.2021 11:47, Christoph Hellwig wrote:
>On Wed, Mar 10, 2021 at 12:00:30PM -0800, Keith Busch wrote:
>> On Wed, Mar 10, 2021 at 02:41:10PM +0100, Christoph Hellwig wrote:
>> > On Wed, Mar 10, 2021 at 02:21:56PM +0100, Christoph Hellwig wrote:
>> > > Can you try this patch instead?
>> > >
>> > > http://lists.infradead.org/pipermail/linux-nvme/2021-February/023183.html
>> >
>> > Actually, please try the patch below instead, it looks like our existing
>> > logic messes up the units:
>>
>> This seems like a good opportunity to incorporate TP4040 for non-MDTS
>> command limits. I sent a proposal here
>>
>> http://lists.infradead.org/pipermail/linux-nvme/2021-March/023522.html
>>
>> And it defaults to your suggestion if the controller doesn't implement
>> the capability.
>
>I think TP4040 is good and useful, but I defintively want the pure
>fix get in first.

Quick question. It seems like the current quirk simply disables
write-zeroes. Would you be open for a quirk that aligns with MDTS for
models that implemented it this way before TP4040?

2021-03-23 12:32:55

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a

On Tue, Mar 23, 2021 at 09:37:49AM +0100, Javier Gonz?lez wrote:
> Quick question. It seems like the current quirk simply disables
> write-zeroes. Would you be open for a quirk that aligns with MDTS for
> models that implemented it this way before TP4040?

Aligning to MDTS is our current behavior, although all kernels up to
5.11 had a bug in the calculation.

2021-03-23 12:48:34

by Javier González

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a

On 23.03.2021 13:31, Christoph Hellwig wrote:
>On Tue, Mar 23, 2021 at 09:37:49AM +0100, Javier González wrote:
>> Quick question. It seems like the current quirk simply disables
>> write-zeroes. Would you be open for a quirk that aligns with MDTS for
>> models that implemented it this way before TP4040?
>
>Aligning to MDTS is our current behavior, although all kernels up to
>5.11 had a bug in the calculation.

I see. Let me check internally and see what's going on with
write-zeroes on this model.

2021-04-08 10:31:44

by Javier González

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a

On 23.03.2021 13:43, Javier González wrote:
>On 23.03.2021 13:31, Christoph Hellwig wrote:
>>On Tue, Mar 23, 2021 at 09:37:49AM +0100, Javier González wrote:
>>>Quick question. It seems like the current quirk simply disables
>>>write-zeroes. Would you be open for a quirk that aligns with MDTS for
>>>models that implemented it this way before TP4040?
>>
>>Aligning to MDTS is our current behavior, although all kernels up to
>>5.11 had a bug in the calculation.
>
>I see. Let me check internally and see what's going on with
>write-zeroes on this model.

We still need to confirm, but it seems like MDTS for write-zeroes is
reported wrong in the FW that Dmitry is using. We can at least reproduce
it.

Would it be a possibility to add quirk infrastructure to hardcode MDTS
for FW versions prior TP4040?

Another possibility is to add quirks to the TP4040 support patches to
enable this - it might also help reduce the list of models currently
blacklisted for write-zeroes.

2021-04-08 12:16:58

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a

On Thu, Apr 08, 2021 at 12:30:16PM +0200, Javier Gonz?lez wrote:
>>> Aligning to MDTS is our current behavior, although all kernels up to
>>> 5.11 had a bug in the calculation.
>>
>> I see. Let me check internally and see what's going on with
>> write-zeroes on this model.
>
> We still need to confirm, but it seems like MDTS for write-zeroes is
> reported wrong in the FW that Dmitry is using. We can at least reproduce
> it.
>
> Would it be a possibility to add quirk infrastructure to hardcode MDTS
> for FW versions prior TP4040?
>
> Another possibility is to add quirks to the TP4040 support patches to
> enable this - it might also help reduce the list of models currently
> blacklisted for write-zeroes.

I'm not sure I understand you. Before TP4040 there is only the MDTS,
which only applies to data transfer commands, although we also
"volunarily" apply it to Write Zeroes. If MDTS is wrong this would
also affect normal I/O, so we really either need a firmware update
or a quirk. Or is the Write Zeroes limit even smaller than MTDS?
I'd rather not add another quirk with a specific limit in that case,
as well grow way too many of those. TP4040 is the way to go for that
case.

2021-04-08 18:19:41

by Javier González

[permalink] [raw]
Subject: Re: [PATCH 1/1] nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a

On 08.04.2021 14:15, Christoph Hellwig wrote:
>On Thu, Apr 08, 2021 at 12:30:16PM +0200, Javier González wrote:
>>>> Aligning to MDTS is our current behavior, although all kernels up to
>>>> 5.11 had a bug in the calculation.
>>>
>>> I see. Let me check internally and see what's going on with
>>> write-zeroes on this model.
>>
>> We still need to confirm, but it seems like MDTS for write-zeroes is
>> reported wrong in the FW that Dmitry is using. We can at least reproduce
>> it.
>>
>> Would it be a possibility to add quirk infrastructure to hardcode MDTS
>> for FW versions prior TP4040?
>>
>> Another possibility is to add quirks to the TP4040 support patches to
>> enable this - it might also help reduce the list of models currently
>> blacklisted for write-zeroes.
>
>I'm not sure I understand you. Before TP4040 there is only the MDTS,
>which only applies to data transfer commands, although we also
>"volunarily" apply it to Write Zeroes. If MDTS is wrong this would
>also affect normal I/O, so we really either need a firmware update
>or a quirk. Or is the Write Zeroes limit even smaller than MTDS?

The latter. The Write Zeroes limit is smaller than MDTS.

>I'd rather not add another quirk with a specific limit in that case,
>as well grow way too many of those.

This is what I had in mind - a structure with the quirks that would set
the write zeroes limit for the cases prior to TP4040 and where this is
lower than MDTS. But fair enough; I can see how painful it can be to
maintain this.

>TP4040 is the way to go for that case.

I agree TP4040 is the way to move forward.

Here I have one question: How do you envision adding support for FW
updates that add TP4040 support (or fix MDTS) with regards with existing
quirks.