2021-02-01 18:34:44

by Jianxiong Gao

[permalink] [raw]
Subject: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

NVMe driver relies on the address offset to function properly.
This patch adds the offset preserve mask to NVMe driver when mapping
via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
depends on the page size defined by CC.MPS register of NVMe
controller.

Signed-off-by: Jianxiong Gao <[email protected]>
---
drivers/nvme/host/pci.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 81e6389b2042..30e45f7e0f75 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -580,12 +580,15 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct request *req)
static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
+ if (dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to set offset\n");
if (is_pci_p2pdma_page(sg_page(iod->sg)))
pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
rq_dma_dir(req));
else
dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+ if (dma_set_min_align_mask(dev->dev, 0))
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to reset offset\n");
}

static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
@@ -842,7 +845,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
blk_status_t ret = BLK_STS_RESOURCE;
- int nr_mapped;
+ int nr_mapped, offset_ret;

if (blk_rq_nr_phys_segments(req) == 1) {
struct bio_vec bv = req_bvec(req);
@@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
if (!iod->nents)
goto out_free_sg;

+ offset_ret = dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
+ if (offset_ret) {
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to set offset\n");
+ goto out_free_sg;
+ }
+
if (is_pci_p2pdma_page(sg_page(iod->sg)))
nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
else
nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
rq_dma_dir(req), DMA_ATTR_NO_WARN);
+
+ offset_ret = dma_set_min_align_mask(dev->dev, 0);
+ if (offset_ret) {
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to reset offset\n");
+ goto out_free_sg;
+ }
if (!nr_mapped)
goto out_free_sg;

--
2.27.0


2021-02-01 18:58:47

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> NVMe driver relies on the address offset to function properly.
> This patch adds the offset preserve mask to NVMe driver when mapping
> via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
> depends on the page size defined by CC.MPS register of NVMe
> controller.

...

> if (is_pci_p2pdma_page(sg_page(iod->sg)))
> nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
> else
> nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> rq_dma_dir(req), DMA_ATTR_NO_WARN);
> +
> + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> + if (offset_ret) {
> + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset offset\n");
> + goto out_free_sg;
> + }

Seems like rebasing effect which makes empty line goes in the middle of some
other group of code lines.

> if (!nr_mapped)
> goto out_free_sg;

Perhaps it should be here?

--
With Best Regards,
Andy Shevchenko


2021-02-01 19:38:22

by Jianxiong Gao

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

On Mon, Feb 1, 2021 at 10:56 AM Andy Shevchenko
<[email protected]> wrote:
>
> On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> > NVMe driver relies on the address offset to function properly.
> > This patch adds the offset preserve mask to NVMe driver when mapping
> > via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
> > depends on the page size defined by CC.MPS register of NVMe
> > controller.
>
> ...
>
> > if (is_pci_p2pdma_page(sg_page(iod->sg)))
> > nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> > iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
> > else
> > nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> > rq_dma_dir(req), DMA_ATTR_NO_WARN);
> > +
> > + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> > + if (offset_ret) {
> > + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset offset\n");
> > + goto out_free_sg;
> > + }
>
> Seems like rebasing effect which makes empty line goes in the middle of some
> other group of code lines.
>
> > if (!nr_mapped)
> > goto out_free_sg;
>
> Perhaps it should be here?
Yes you are correct, it should be

else
nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
rq_dma_dir(req), DMA_ATTR_NO_WARN);
if (!nr_mapped)
goto out_free_sg;
+
+ offset_ret = dma_set_min_align_mask(dev->dev, 0);
+ if (offset_ret) {
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to
reset offset\n");
+ goto out_free_sg;
+ }

Thanks for pointing it out.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>


--
Jianxiong Gao

2021-02-01 21:02:01

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> @@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
> if (!iod->nents)
> goto out_free_sg;
>
> + offset_ret = dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
> + if (offset_ret) {
> + dev_warn(dev->dev, "dma_set_min_align_mask failed to set offset\n");
> + goto out_free_sg;
> + }
> +
> if (is_pci_p2pdma_page(sg_page(iod->sg)))
> nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
> else
> nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> rq_dma_dir(req), DMA_ATTR_NO_WARN);
> +
> + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> + if (offset_ret) {
> + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset offset\n");
> + goto out_free_sg;
> + }
> if (!nr_mapped)
> goto out_free_sg;

Why is this setting being done and undone on each IO? Wouldn't it be
more efficient to set it once during device initialization?

And more importantly, this isn't thread safe: one CPU may be setting the
device's dma alignment mask to 0 while another CPU is expecting it to be
NVME_CTRL_PAGE_SIZE - 1.

2021-02-01 21:25:52

by Jianxiong Gao

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

> Why is this setting being done and undone on each IO? Wouldn't it be
> more efficient to set it once during device initialization?
>
> And more importantly, this isn't thread safe: one CPU may be setting the
> device's dma alignment mask to 0 while another CPU is expecting it to be
> NVME_CTRL_PAGE_SIZE - 1.

I was having trouble getting the OS booted when setting it once during
initialization. However when I rebased to the latest rc6 this morning it
seems to be working with setting the mask on probe. I am still testing out
and will appreciate any idea what may cause the nvme driver to fail
with a single mask.
--
Jianxiong Gao

2021-02-02 00:03:15

by Chaitanya Kulkarni

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

On 2/1/21 13:27, Jianxiong Gao wrote:
>> Why is this setting being done and undone on each IO? Wouldn't it be
>> more efficient to set it once during device initialization?
>>
>> And more importantly, this isn't thread safe: one CPU may be setting the
>> device's dma alignment mask to 0 while another CPU is expecting it to be
>> NVME_CTRL_PAGE_SIZE - 1.
> I was having trouble getting the OS booted when setting it once during
> initialization. However when I rebased to the latest rc6 this morning it
> seems to be working with setting the mask on probe. I am still testing out
> and will appreciate any idea what may cause the nvme driver to fail
> with a single mask.
Based on the Keith's comment it needs to be completely avoided in hot path.

Did you get a chance to bisect the problem in the rc6 ? We need to know the
root cause otherwise it might happen again after we add this patch.

2021-02-02 00:30:40

by Jianxiong Gao

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

> Why is this setting being done and undone on each IO? Wouldn't it be
> more efficient to set it once during device initialization?

I agree that setting it once is the right way of doing it.

So I have changed the patch to enable the mask once in nvme_probe.

drivers/nvme/host/pci.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 81e6389b2042..4ce78373f98d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2630,6 +2630,9 @@ static void nvme_reset_work(struct work_struct *work)
*/
dma_set_max_seg_size(dev->dev, 0xffffffff);

+ if (dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to
set offset\n");
+
mutex_unlock(&dev->shutdown_lock);

/*

However on boot of the system, the following error happens occasionally.
The error seems related to Journal service. Whenever "Stopping Journal
Service..."
appears, the boot succeeds. Otherwise boot fails with the following message.

----------------------------log start here--------------------------
[ OK ] Started Journal Service.
[ 10.774545] xfs filesystem being remounted at / supports timestamps
until 2038 (0x7fffffff)
[ OK ] Started Remount Root and Kernel File Systems.
Starting Flush Journal to Persistent Storage...
Starting Load/Save Random Seed...
Starting Create Static [ 10.804340] systemd-journald[780]:
Received request to flush runtime journal from PID 1
Device Nodes in /dev...
[ OK ] Started Load/Save Random Seed.
[ OK ] Started Flush Journal to Persistent Storage.
[ OK ] Started Create Static Device Nodes in /dev.
Starting udev Kernel Device Manager...
[ OK ] Reached target Local File Systems (Pre).
Starting File System Check on /dev/disk/by-uuid/7281-17FC...
[ OK ] Started File System Check on /dev/disk/by-uuid/7281-17FC.
Mounting /boot/efi...
[ OK ] Mounted /boo[ 11.203461] systemd[1]: segfault at 2e0 ip
000055b08607cc24 sp 00007ffe13809090 error 4 in
systemd[55b086000000+140000]
t/efi.
[ 11.216088] Code: 02 c7 44 24 10 fe ff ff ff 49 89 e4 89 06 48 8d
6c 24 08 48 8d 5c 24 10 48 c7 44 24 18 00 00 00 00 eb 10 0f 1f 00 48
8b 3c 24 <44> 39 b7 e0 02 00 00 74 3b 49 8b 7d 00 4c 89 e1 48 89 ea 48
89 de
---------------log ends here-----------

> Based on the Keith's comment it needs to be completely avoided in hot path.
>
> Did you get a chance to bisect the problem in the rc6 ? We need to know the
> root cause otherwise it might happen again after we add this patch.

I am now trying to bisect the problem.
I am not sure how the mapping offset could have caused the error.
Any suggestions are appreciated.

--
Jianxiong Gao

2021-02-02 11:34:26

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

On Mon, Feb 01, 2021 at 04:25:55PM -0800, Jianxiong Gao wrote:

> + if (dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))

Side note: we have DMA_BIT_MASK(), please use it.

> + dev_warn(dev->dev, "dma_set_min_align_mask failed to
> set offset\n");

--
With Best Regards,
Andy Shevchenko


2021-02-02 12:13:00

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

On 2021-02-02 11:21, Andy Shevchenko wrote:
> On Mon, Feb 01, 2021 at 04:25:55PM -0800, Jianxiong Gao wrote:
>
>> + if (dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
>
> Side note: we have DMA_BIT_MASK(), please use it.

FWIW I'd actually disagree on that point. Conceptually, this is a very
different thing from dev->{coherent_}dma_mask. It does not need to
handle everything up to 2^64-1 correctly without overflow issues, and
data alignments typically *are* defined in terms of sizes rather than
numbers of bits.

In fact, it might be neat to just have callers pass a size directly to a
dma_set_min_align() interface which asserts that it is a power of two
and stores it as a mask internally.

Robin.

>
>> + dev_warn(dev->dev, "dma_set_min_align_mask failed to
>> set offset\n");
>

2021-02-03 13:42:27

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

Please try with this extra patch:

---
From 212764c3c15ce859e6f55d2146f450ea4ca6fdb9 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Wed, 3 Feb 2021 14:27:13 +0100
Subject: nvme-pci: fix 2nd PRP setup in nvme_setup_prp_simple

Use the dma address instead of the bio_vec offset for the arithmetics
to find the address for the 2nd PRP.

Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
Reported-by: Jianxiong Gao <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/nvme/host/pci.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 30e45f7e0f750c..4ae51735d72f4a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -808,8 +808,7 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
struct bio_vec *bv)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
- unsigned int offset = bv->bv_offset & (NVME_CTRL_PAGE_SIZE - 1);
- unsigned int first_prp_len = NVME_CTRL_PAGE_SIZE - offset;
+ dma_addr_t next_prp;

iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
if (dma_mapping_error(dev->dev, iod->first_dma))
@@ -817,8 +816,9 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
iod->dma_len = bv->bv_len;

cmnd->dptr.prp1 = cpu_to_le64(iod->first_dma);
- if (bv->bv_len > first_prp_len)
- cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma + first_prp_len);
+ next_prp = round_down(iod->first_dma + bv->bv_len, NVME_CTRL_PAGE_SIZE);
+ if (next_prp > iod->first_dma)
+ cmnd->dptr.prp2 = cpu_to_le64(next_prp);
return BLK_STS_OK;
}

--
2.29.2

2021-02-03 16:50:25

by Jianxiong Gao

[permalink] [raw]
Subject: Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

>
> Please try with this extra patch:
>
I have tried with the extra patch and it still fails to boot.
I have attached dmesg output for the error:

-------------dmesg starts here-----------------
[ 6.357755] XFS (nvme0n1p2): Mounting V5 Filesystem
[ 6.430092] XFS (nvme0n1p2): Torn write (CRC failure) detected at
log block 0x20d0. Truncating head block from 0x20e0.
[ 6.442828] XFS (nvme0n1p2): Starting recovery (logdev: internal)
[ 7.272456] XFS (nvme0n1p2): Ending recovery (logdev: internal)
[ 7.610250] systemd-journald[434]: Received SIGTERM from PID 1 (systemd).
...
[ 10.054121] systemd[755]:
/usr/lib/systemd/system-generators/systemd-rc-local-generator
terminated by signal ABRT.
[ 10.726122] audit: type=1400 audit(1612370261.090:4): avc: denied
{ search } for pid=783 comm="systemd-sysctl" name="crash"
dev="nvme0n1p2" ino=50789805
scontext=system_u:system_r:systemd_sysctl_t:s0
tcontext=system_u:object_r:kdump_crash_t:s0 tclass=dir permissive=0
[ 10.751764] audit: type=1400 audit(1612370261.090:5): avc: denied
{ search } for pid=783 comm="systemd-sysctl" name="crash"
dev="nvme0n1p2" ino=50789805
scontext=system_u:system_r:systemd_sysctl_t:s0
tcontext=system_u:object_r:kdump_crash_t:s0 tclass=dir permissive=0
[ 12.366607] xfs filesystem being remounted at / supports timestamps
until 2038 (0x7fffffff)
[ 12.376379] audit: type=1400 audit(1612370262.740:6): avc: denied
{ write } for pid=788 comm="systemd-remount" name="crash"
dev="nvme0n1p2" ino=50789805 scontext=system_u:system_r:init_t:s0
tcontext=system_u:object_r:kdump_crash_t:s0 tclass=dir permissive=0
[ 12.413155] systemd-journald[781]: Received request to flush
runtime journal from PID 1
[ 12.428917] audit: type=1400 audit(1612370262.793:7): avc: denied
{ write } for pid=815 comm="journalctl" name="crash" dev="nvme0n1p2"
ino=50789805 scontext=system_u:system_r:init_t:s0
tcontext=system_u:object_r:kdump_crash_t:s0 tclass=dir permissive=0
[ 12.453006] audit: type=1400 audit(1612370262.799:8): avc: denied
{ write } for pid=817 comm="systemd-random-" name="crash"
dev="nvme0n1p2" ino=50789805 scontext=system_u:system_r:init_t:s0
tcontext=system_u:object_r:kdump_crash_t:s0 tclass=dir permissive=0
[ 13.306030] plymouth[853]: segfault at 0 ip 00007f2eabca8090 sp
00007fffe94d8c08 error 6 in libc-2.28.so[7f2eabbcd000+1b9000]
[ 13.318772] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
[ 78.782418] sed[911]: segfault at 0 ip 00007fc3540da090 sp
00007ffdb4373ff8 error 6 in libc-2.28.so[7fc353fff000+1b9000]
[ 78.794007] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
--------------dmesg ends here-----------

--
Jianxiong Gao