2023-07-28 08:37:37

by Jinyoung Choi

[permalink] [raw]
Subject: [PATCH 0/2] multi-page bvec configuration for integrity payload

In the case of NVMe, it has an integrity payload consisting of one segment.
So, rather than configuring SG_LIST, it was changed by direct DMA mapping.

The page-merge is not performed for the struct bio_vec when creating
a integrity payload in block.
As a result, when creating an integrity paylaod beyond one page, each
struct bio_vec is generated, and its bv_len does not exceed the PAGESIZE.

To solve it, bio_integrity_add_page() should just add to the existing
bvec, similar to bio_add_page() and friends.
As the bip configuration changed, the code related to sg_list was
also modified.

(ref: https://lore.kernel.org/linux-nvme/[email protected]/T/#t)


Tested like this:

- Format (support pi)
$ sudo nvme format /dev/nvme2n1 --force -n 1 -i 1 -p 0 -m 0 -l 1 -r

- Run FIO
[global]
ioengine=libaio
group_reporting

[job]
bs=512k
iodepth=256
rw=write
numjobs=8
direct=1
runtime=10s
filename=/dev/nvme2n1

- Result
...
[ 93.496218] nvme2n1: I/O Cmd(0x1) @ LBA 62464, 1024 blocks, I/O Error (sct 0x2 / sc 0x82) MORE
[ 93.496227] protection error, dev nvme2n1, sector 62464 op 0x1:(WRITE) flags 0x18800 phys_seg 3 prio class 2
[ 93.538788] nvme2n1: I/O Cmd(0x1) @ LBA 6144, 1024 blocks, I/O Error (sct 0x2 / sc 0x82) MORE
[ 93.538798] protection error, dev nvme2n1, sector 6144 op 0x1:(WRITE) flags 0x18800 phys_seg 3 prio class 2
[ 93.566231] nvme2n1: I/O Cmd(0x1) @ LBA 124928, 1024 blocks, I/O Error (sct 0x0 / sc 0x4)
[ 93.566241] I/O error, dev nvme2n1, sector 124928 op 0x1:(WRITE) flags 0x18800 phys_seg 3 prio class 2
[ 93.694147] nvme2n1: I/O Cmd(0x1) @ LBA 64512, 1024 blocks, I/O Error (sct 0x2 / sc 0x82) MORE
[ 93.694155] protection error, dev nvme2n1, sector 64512 op 0x1:(WRITE) flags 0x18800 phys_seg 3 prio class 2
[ 93.694299] nvme2n1: I/O Cmd(0x1) @ LBA 5120, 1024 blocks, I/O Error (sct 0x2 / sc 0x82) MORE
[ 93.694305] protection error, dev nvme2n1, sector 5120 op 0x1:(WRITE) flags 0x18800 phys_seg 3 prio class 2
...

Jinyoung Choi (2):
block: make bvec_try_merge_hw_page() non-static
bio-integrity: create multi-page bvecs in bio_integrity_add_page()

block/bio-integrity.c | 50 ++++++++++++++++++++++++-------------------
block/bio.c | 2 +-
block/blk.h | 4 ++++
3 files changed, 33 insertions(+), 23 deletions(-)

--
2.34.1


2023-07-28 09:01:24

by Jinyoung Choi

[permalink] [raw]
Subject: [PATCH 1/2] block: make bvec_try_merge_hw_page() non-static

This will be used for multi-page configuration for integrity payload.

Cc: Christoph Hellwig <[email protected]>
Cc: Martin K. Petersen <[email protected]>

Signed-off-by: Jinyoung Choi <[email protected]>
---
block/bio.c | 2 +-
block/blk.h | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index c92dda962449..8d1533af7c60 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -934,7 +934,7 @@ static bool bvec_try_merge_page(struct bio_vec *bv, struct page *page,
* size limit. This is not for normal read/write bios, but for passthrough
* or Zone Append operations that we can't split.
*/
-static bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv,
+bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv,
struct page *page, unsigned len, unsigned offset,
bool *same_page)
{
diff --git a/block/blk.h b/block/blk.h
index 686712e13835..9d22ec3a53bc 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -75,6 +75,10 @@ struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
gfp_t gfp_mask);
void bvec_free(mempool_t *pool, struct bio_vec *bv, unsigned short nr_vecs);

+bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv,
+ struct page *page, unsigned len, unsigned offset,
+ bool *same_page);
+
static inline bool biovec_phys_mergeable(struct request_queue *q,
struct bio_vec *vec1, struct bio_vec *vec2)
{
--
2.34.1

2023-07-28 10:29:11

by Jinyoung Choi

[permalink] [raw]
Subject: [PATCH 2/2] bio-integrity: create multi-page bvecs in bio_integrity_add_page()

Allow bio_integrity_add_page to create multi-page bvecs, just like
the bio payloads. This simplifies adding larger payloads, and fixes
support for non-tiny workloads with nvme, which stopped using
scatterlist for metadata a while ago

Cc: Christoph Hellwig <[email protected]>
Cc: Martin K. Petersen <[email protected]>

Fixes: 783b94bd9250 ("nvme-pci: do not build a scatterlist to map metadata")
Signed-off-by: Jinyoung Choi <[email protected]>
---
block/bio-integrity.c | 50 ++++++++++++++++++++++++-------------------
1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 045553a164e0..ae054c6a06cb 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -123,21 +123,38 @@ void bio_integrity_free(struct bio *bio)
int bio_integrity_add_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int offset)
{
+ struct request_queue *q = bdev_get_queue(bio->bi_bdev);
struct bio_integrity_payload *bip = bio_integrity(bio);

- if (bip->bip_vcnt >= bip->bip_max_vcnt) {
- printk(KERN_ERR "%s: bip_vec full\n", __func__);
+ if (((bip->bip_iter.bi_size + len) >> SECTOR_SHIFT) >
+ queue_max_hw_sectors(q))
return 0;
- }

- if (bip->bip_vcnt &&
- bvec_gap_to_prev(&bdev_get_queue(bio->bi_bdev)->limits,
- &bip->bip_vec[bip->bip_vcnt - 1], offset))
- return 0;
+ if (bip->bip_vcnt > 0) {
+ struct bio_vec *bv = &bip->bip_vec[bip->bip_vcnt - 1];
+ bool same_page = false;
+
+ if (bvec_try_merge_hw_page(q, bv, page, len, offset,
+ &same_page)) {
+ bip->bip_iter.bi_size += len;
+ return len;
+ }
+
+ if (bip->bip_vcnt >=
+ min(bip->bip_max_vcnt, queue_max_integrity_segments(q)))
+ return 0;
+
+ /*
+ * If the queue doesn't support SG gaps and adding this segment
+ * would create a gap, disallow it.
+ */
+ if (bvec_gap_to_prev(&q->limits, bv, offset))
+ return 0;
+ }

bvec_set_page(&bip->bip_vec[bip->bip_vcnt], page, len, offset);
bip->bip_vcnt++;
-
+ bip->bip_iter.bi_size += len;
return len;
}
EXPORT_SYMBOL(bio_integrity_add_page);
@@ -244,7 +261,6 @@ bool bio_integrity_prep(struct bio *bio)
}

bip->bip_flags |= BIP_BLOCK_INTEGRITY;
- bip->bip_iter.bi_size = len;
bip_set_seed(bip, bio->bi_iter.bi_sector);

if (bi->flags & BLK_INTEGRITY_IP_CHECKSUM)
@@ -252,27 +268,18 @@ bool bio_integrity_prep(struct bio *bio)

/* Map it */
offset = offset_in_page(buf);
- for (i = 0 ; i < nr_pages ; i++) {
- int ret;
+ for (i = 0; i < nr_pages && len > 0; i++) {
bytes = PAGE_SIZE - offset;

- if (len <= 0)
- break;
-
if (bytes > len)
bytes = len;

- ret = bio_integrity_add_page(bio, virt_to_page(buf),
- bytes, offset);
-
- if (ret == 0) {
+ if (bio_integrity_add_page(bio, virt_to_page(buf),
+ bytes, offset) < bytes) {
printk(KERN_ERR "could not attach integrity payload\n");
goto err_end_io;
}

- if (ret < bytes)
- break;
-
buf += bytes;
len -= bytes;
offset = 0;
@@ -291,7 +298,6 @@ bool bio_integrity_prep(struct bio *bio)
bio->bi_status = BLK_STS_RESOURCE;
bio_endio(bio);
return false;
-
}
EXPORT_SYMBOL(bio_integrity_prep);

--
2.34.1

2023-07-31 06:05:45

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 2/2] bio-integrity: create multi-page bvecs in bio_integrity_add_page()

On Fri, Jul 28, 2023 at 04:57:53PM +0900, Jinyoung Choi wrote:
> Allow bio_integrity_add_page to create multi-page bvecs, just like
> the bio payloads. This simplifies adding larger payloads, and fixes
> support for non-tiny workloads with nvme, which stopped using
> scatterlist for metadata a while ago

Missing dot at the end of the sentence here. Also the commit log feels
very short to me for such a substanial change, although even thinking
hard about it I'm not entirely sure what would be missing, so it's
probably fine..

Looks good to me:

Reviewed-by: Christoph Hellwig <[email protected]>

2023-07-31 06:45:31

by Christoph Hellwig

[permalink] [raw]

2023-07-31 06:57:30

by Jinyoung Choi

[permalink] [raw]
Subject: RE:(2) [PATCH 2/2] bio-integrity: create multi-page bvecs in bio_integrity_add_page()

> On Fri, Jul 28, 2023 at 04:57:53PM +0900, Jinyoung Choi wrote:
> > Allow bio_integrity_add_page to create multi-page bvecs, just like
> > the bio payloads. This simplifies adding larger payloads, and fixes
> > support for non-tiny workloads with nvme, which stopped using
> > scatterlist for metadata a while ago
>
> Missing dot at the end of the sentence here.  Also the commit log feels
> very short to me for such a substanial change, although even thinking
> hard about it I'm not entirely sure what would be missing, so it's
> probably fine..
>
> Looks good to me:
>
> Reviewed-by: Christoph Hellwig <[email protected]>

Hi, Christoph.

I will add a commit message for the substanial change.

Oh, and I missed some patches that should be included in the patch set.
In the next version, I will add patches that are affected by this patch
and must be changed.

Thank you for your review.

Best Regards,
Jinyoung.