2022-01-24 19:31:40

by Keith Busch

[permalink] [raw]
Subject: [RFC 0/7] 64-bit data integrity field support

The NVM Express protocol added enhancements to the data integrity field
formats beyond the T10 defined protection information. A detailed
description of the new formats can be found in the NVMe's NVM Command
Set Specification, section 5.2, available at:

https://nvmexpress.org/wp-content/uploads/NVM-Command-Set-Specification-1.0b-2021.12.18-Ratified.pdf

This series implements one possible new format: the CRC64 guard with
48-bit reference tags. This does not add support for the variable
"storage tag" field.

The NVMe CRC64 parameters (from Rocksoft) were not implemented in the
kernel, so a software implementation is included in this series based on
the generated table. This series does not include any possible hardware
excelleration (ex: x86's pclmulqdq), so it's not very high performant
right now.

Keith Busch (7):
block: support pi with extended metadata
nvme: allow integrity on extended metadata formats
lib: add rocksoft model crc64
lib: add crc64 tests
asm-generic: introduce be48 unaligned accessors
block: add pi for nvme enhanced integrity
nvme: add support for enhanced metadata

block/Kconfig | 1 +
block/bio-integrity.c | 1 +
block/t10-pi.c | 198 +++++++++++++++++++++++++++++++-
drivers/nvme/host/core.c | 167 ++++++++++++++++++++++-----
drivers/nvme/host/nvme.h | 1 +
include/asm-generic/unaligned.h | 26 +++++
include/linux/blk-integrity.h | 1 +
include/linux/crc64.h | 2 +
include/linux/nvme.h | 53 ++++++++-
include/linux/t10-pi.h | 20 ++++
lib/Kconfig.debug | 3 +
lib/Makefile | 1 +
lib/crc64.c | 79 +++++++++++++
lib/gen_crc64table.c | 33 ++++--
lib/test_crc64.c | 68 +++++++++++
15 files changed, 608 insertions(+), 46 deletions(-)
create mode 100644 lib/test_crc64.c

--
2.25.4


2022-01-24 19:31:40

by Keith Busch

[permalink] [raw]
Subject: [RFC 6/7] block: add pi for nvme enhanced integrity

The NVMe specification defines larger data integrity formats beyond the
t10 tuple. Add support for the specification defined CRC64 formats,
assuming the reference tag does not need to be split with the "storage
tag".

Cc: "Martin K. Petersen" <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
block/Kconfig | 1 +
block/t10-pi.c | 194 +++++++++++++++++++++++++++++++++++++++++
include/linux/t10-pi.h | 20 +++++
3 files changed, 215 insertions(+)

diff --git a/block/Kconfig b/block/Kconfig
index d5d4197b7ed2..78eca050dff7 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -63,6 +63,7 @@ config BLK_DEV_INTEGRITY_T10
tristate
depends on BLK_DEV_INTEGRITY
select CRC_T10DIF
+ select CRC64

config BLK_DEV_ZONED
bool "Zoned block device support"
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 758a76518854..7bfefe970bc5 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -7,8 +7,10 @@
#include <linux/t10-pi.h>
#include <linux/blk-integrity.h>
#include <linux/crc-t10dif.h>
+#include <linux/crc64.h>
#include <linux/module.h>
#include <net/checksum.h>
+#include <asm/unaligned.h>

typedef __be16 (csum_fn) (void *, unsigned int);

@@ -278,4 +280,196 @@ const struct blk_integrity_profile t10_pi_type3_ip = {
};
EXPORT_SYMBOL(t10_pi_type3_ip);

+static __be64 nvme_pi_crc64(void *data, unsigned int len)
+{
+ return cpu_to_be64(crc64_rocksoft(~0ULL, data, len));
+}
+
+static blk_status_t nvme_crc64_generate(struct blk_integrity_iter *iter,
+ enum t10_dif_type type)
+{
+ unsigned int i;
+
+ for (i = 0 ; i < iter->data_size ; i += iter->interval) {
+ struct nvme_crc64_pi_tuple *pi = iter->prot_buf;
+
+ pi->guard_tag = nvme_pi_crc64(iter->data_buf, iter->interval);
+ pi->app_tag = 0;
+
+ if (type == T10_PI_TYPE1_PROTECTION)
+ put_unaligned_be48(iter->seed, pi->ref_tag);
+ else
+ put_unaligned_be48(0ULL, pi->ref_tag);
+
+ iter->data_buf += iter->interval;
+ iter->prot_buf += iter->tuple_size;
+ iter->seed++;
+ }
+
+ return BLK_STS_OK;
+}
+
+static bool nvme_crc64_ref_escape(u8 *ref_tag)
+{
+ static u8 ref_escape[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+
+ return memcmp(ref_tag, ref_escape, sizeof(ref_escape)) == 0;
+}
+
+static blk_status_t nvme_crc64_verify(struct blk_integrity_iter *iter,
+ enum t10_dif_type type)
+{
+ unsigned int i;
+
+ for (i = 0; i < iter->data_size; i += iter->interval) {
+ struct nvme_crc64_pi_tuple *pi = iter->prot_buf;
+ u64 ref, seed;
+ __be64 csum;
+
+ if (type == T10_PI_TYPE1_PROTECTION) {
+ if (pi->app_tag == T10_PI_APP_ESCAPE)
+ goto next;
+
+ ref = get_unaligned_be48(pi->ref_tag);
+ seed = iter->seed & 0xffffffffffffull;
+ if (ref != seed) {
+ pr_err("%s: ref tag error at location %llu (rcvd %llu)\n",
+ iter->disk_name, seed, ref);
+ return BLK_STS_PROTECTION;
+ }
+ } else if (type == T10_PI_TYPE3_PROTECTION) {
+ if (pi->app_tag == T10_PI_APP_ESCAPE &&
+ nvme_crc64_ref_escape(pi->ref_tag))
+ goto next;
+ }
+
+ csum = nvme_pi_crc64(iter->data_buf, iter->interval);
+ if (pi->guard_tag != csum) {
+ pr_err("%s: guard tag error at sector %llu " \
+ "(rcvd %016llx, want %016llx)\n",
+ iter->disk_name, (unsigned long long)iter->seed,
+ be64_to_cpu(pi->guard_tag), be64_to_cpu(csum));
+ return BLK_STS_PROTECTION;
+ }
+
+next:
+ iter->data_buf += iter->interval;
+ iter->prot_buf += iter->tuple_size;
+ iter->seed++;
+ }
+
+ return BLK_STS_OK;
+}
+
+static blk_status_t nvme_pi_type1_verify_crc(struct blk_integrity_iter *iter)
+{
+ return nvme_crc64_verify(iter, T10_PI_TYPE1_PROTECTION);
+}
+
+static blk_status_t nvme_pi_type1_generate_crc(struct blk_integrity_iter *iter)
+{
+ return nvme_crc64_generate(iter, T10_PI_TYPE1_PROTECTION);
+}
+
+static void nvme_pi_type1_prepare(struct request *rq)
+{
+ const int tuple_sz = rq->q->integrity.tuple_size;
+ u64 ref_tag = nvme_pi_extended_ref_tag(rq);
+ struct bio *bio;
+
+ __rq_for_each_bio(bio, rq) {
+ struct bio_integrity_payload *bip = bio_integrity(bio);
+ u64 virt = bip_get_seed(bip) & 0xffffffffffffull;
+ struct bio_vec iv;
+ struct bvec_iter iter;
+
+ /* Already remapped? */
+ if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
+ break;
+
+ bip_for_each_vec(iv, bip, iter) {
+ unsigned int j;
+ void *p;
+
+ p = bvec_kmap_local(&iv);
+ for (j = 0; j < iv.bv_len; j += tuple_sz) {
+ struct nvme_crc64_pi_tuple *pi = p;
+ u64 ref = get_unaligned_be48(pi->ref_tag);
+
+ if (ref == virt)
+ put_unaligned_be48(ref_tag, pi->ref_tag);
+ virt++;
+ ref_tag++;
+ p += tuple_sz;
+ }
+ kunmap_local(p);
+ }
+
+ bip->bip_flags |= BIP_MAPPED_INTEGRITY;
+ }
+}
+
+static void nvme_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
+{
+ unsigned intervals = nr_bytes >> rq->q->integrity.interval_exp;
+ const int tuple_sz = rq->q->integrity.tuple_size;
+ u64 ref_tag = nvme_pi_extended_ref_tag(rq);
+ struct bio *bio;
+
+ __rq_for_each_bio(bio, rq) {
+ struct bio_integrity_payload *bip = bio_integrity(bio);
+ u64 virt = bip_get_seed(bip) & 0xffffffffffffull;
+ struct bio_vec iv;
+ struct bvec_iter iter;
+
+ bip_for_each_vec(iv, bip, iter) {
+ unsigned int j;
+ void *p;
+
+ p = bvec_kmap_local(&iv);
+ for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
+ struct nvme_crc64_pi_tuple *pi = p;
+ u64 ref = get_unaligned_be48(pi->ref_tag);
+
+ if (ref == ref_tag)
+ put_unaligned_be48(virt, pi->ref_tag);
+ virt++;
+ ref_tag++;
+ intervals--;
+ p += tuple_sz;
+ }
+ kunmap_local(p);
+ }
+ }
+}
+
+static blk_status_t nvme_pi_type3_verify_crc(struct blk_integrity_iter *iter)
+{
+ return nvme_crc64_verify(iter, T10_PI_TYPE3_PROTECTION);
+}
+
+static blk_status_t nvme_pi_type3_generate_crc(struct blk_integrity_iter *iter)
+{
+ return nvme_crc64_generate(iter, T10_PI_TYPE3_PROTECTION);
+}
+
+const struct blk_integrity_profile nvme_pi_type1_crc64 = {
+ .name = "NVME-DIF-TYPE1-CRC64",
+ .generate_fn = nvme_pi_type1_generate_crc,
+ .verify_fn = nvme_pi_type1_verify_crc,
+ .prepare_fn = nvme_pi_type1_prepare,
+ .complete_fn = nvme_pi_type1_complete,
+};
+EXPORT_SYMBOL(nvme_pi_type1_crc64);
+
+const struct blk_integrity_profile nvme_pi_type3_crc64 = {
+ .name = "NVME-DIF-TYPE3-CRC64",
+ .generate_fn = nvme_pi_type3_generate_crc,
+ .verify_fn = nvme_pi_type3_verify_crc,
+ .prepare_fn = t10_pi_type3_prepare,
+ .complete_fn = t10_pi_type3_complete,
+};
+EXPORT_SYMBOL(nvme_pi_type3_crc64);
+
+MODULE_LICENSE("GPL");
MODULE_LICENSE("GPL");
diff --git a/include/linux/t10-pi.h b/include/linux/t10-pi.h
index c635c2e014e3..86bc8713ca09 100644
--- a/include/linux/t10-pi.h
+++ b/include/linux/t10-pi.h
@@ -53,4 +53,24 @@ extern const struct blk_integrity_profile t10_pi_type1_ip;
extern const struct blk_integrity_profile t10_pi_type3_crc;
extern const struct blk_integrity_profile t10_pi_type3_ip;

+struct nvme_crc64_pi_tuple {
+ __be64 guard_tag;
+ __be16 app_tag;
+ __u8 ref_tag[6];
+};
+
+static inline u64 nvme_pi_extended_ref_tag(struct request *rq)
+{
+ unsigned int shift = ilog2(queue_logical_block_size(rq->q));
+
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+ if (rq->q->integrity.interval_exp)
+ shift = rq->q->integrity.interval_exp;
+#endif
+ return blk_rq_pos(rq) >> (shift - SECTOR_SHIFT) & 0xffffffffffffull;
+}
+
+extern const struct blk_integrity_profile nvme_pi_type1_crc64;
+extern const struct blk_integrity_profile nvme_pi_type3_crc64;
+
#endif
--
2.25.4

2022-01-24 19:31:41

by Keith Busch

[permalink] [raw]
Subject: [RFC 7/7] nvme: add support for enhanced metadata

NVM Express ratified TP 4069 defines new protection information formats.
Implement support for the CRC64 guard tags.

Since the block layer doesn't support variable length reference tags,
driver support for the Storage Tag space is not supported at this time.

Signed-off-by: Keith Busch <[email protected]>
---
drivers/nvme/host/core.c | 164 +++++++++++++++++++++++++++++++++------
drivers/nvme/host/nvme.h | 1 +
include/linux/nvme.h | 53 +++++++++++--
3 files changed, 188 insertions(+), 30 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b3eabf6a08b9..65534ed646e4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -882,6 +882,30 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
return BLK_STS_OK;
}

+static inline void nvme_set_ref_tag(struct nvme_ns *ns, struct nvme_command *cmnd,
+ struct request *req)
+{
+ u32 upper, lower;
+ u64 ref48;
+
+ /* both rw and write zeroes share the same reftag format */
+ switch (ns->guard_type) {
+ case NVME_NVM_NS_16B_GUARD:
+ cmnd->rw.reftag = cpu_to_le32(t10_pi_ref_tag(req));
+ break;
+ case NVME_NVM_NS_64B_GUARD:
+ ref48 = nvme_pi_extended_ref_tag(req);
+ lower = lower_32_bits(ref48);
+ upper = upper_32_bits(ref48);
+
+ cmnd->rw.reftag = cpu_to_le32(lower);
+ cmnd->rw.cdw3 = cpu_to_le32(upper);
+ break;
+ default:
+ break;
+ }
+}
+
static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
struct request *req, struct nvme_command *cmnd)
{
@@ -903,8 +927,7 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
switch (ns->pi_type) {
case NVME_NS_DPS_PI_TYPE1:
case NVME_NS_DPS_PI_TYPE2:
- cmnd->write_zeroes.reftag =
- cpu_to_le32(t10_pi_ref_tag(req));
+ nvme_set_ref_tag(ns, cmnd, req);
break;
}
}
@@ -931,7 +954,8 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
cmnd->rw.opcode = op;
cmnd->rw.flags = 0;
cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
- cmnd->rw.rsvd2 = 0;
+ cmnd->rw.cdw2 = 0;
+ cmnd->rw.cdw3 = 0;
cmnd->rw.metadata = 0;
cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
@@ -965,7 +989,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
NVME_RW_PRINFO_PRCHK_REF;
if (op == nvme_cmd_zone_append)
control |= NVME_RW_APPEND_PIREMAP;
- cmnd->rw.reftag = cpu_to_le32(t10_pi_ref_tag(req));
+ nvme_set_ref_tag(ns, cmnd, req);
break;
}
}
@@ -1619,33 +1643,58 @@ int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
}

#ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+static void nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns,
u32 max_integrity_segments)
{
struct blk_integrity integrity = { };

- switch (pi_type) {
+ switch (ns->pi_type) {
case NVME_NS_DPS_PI_TYPE3:
- integrity.profile = &t10_pi_type3_crc;
- integrity.tag_size = sizeof(u16) + sizeof(u32);
- integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+ switch (ns->guard_type) {
+ case NVME_NVM_NS_16B_GUARD:
+ integrity.profile = &t10_pi_type3_crc;
+ integrity.tag_size = sizeof(u16) + sizeof(u32);
+ integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+ break;
+ case NVME_NVM_NS_64B_GUARD:
+ integrity.profile = &nvme_pi_type1_crc64;
+ integrity.tag_size = sizeof(u16) + 6;
+ integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+ break;
+ default:
+ integrity.profile = NULL;
+ break;
+ }
break;
case NVME_NS_DPS_PI_TYPE1:
case NVME_NS_DPS_PI_TYPE2:
- integrity.profile = &t10_pi_type1_crc;
- integrity.tag_size = sizeof(u16);
- integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+ switch (ns->guard_type) {
+ case NVME_NVM_NS_16B_GUARD:
+ integrity.profile = &t10_pi_type1_crc;
+ integrity.tag_size = sizeof(u16);
+ integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+ break;
+ case NVME_NVM_NS_64B_GUARD:
+ integrity.profile = &nvme_pi_type1_crc64;
+ integrity.tag_size = sizeof(u16);
+ integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+ break;
+ default:
+ integrity.profile = NULL;
+ break;
+ }
break;
default:
integrity.profile = NULL;
break;
}
- integrity.tuple_size = ms;
+
+ integrity.tuple_size = ns->ms;
blk_integrity_register(disk, &integrity);
blk_queue_max_integrity_segments(disk->queue, max_integrity_segments);
}
#else
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+static void nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns,
u32 max_integrity_segments)
{
}
@@ -1722,17 +1771,75 @@ static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
return 0;
}

-static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
+static int nvme_init_ms(struct nvme_ns *ns, struct nvme_id_ns *id)
{
+ bool first = id->dps & NVME_NS_DPS_PI_FIRST;
+ unsigned lbaf = nvme_lbaf_index(id->flbas);
struct nvme_ctrl *ctrl = ns->ctrl;
+ struct nvme_command c = { };
+ struct nvme_id_ns_nvm *nvm;
+ u8 pi_size = 0;
+ int ret = 0;
+ u32 elbaf;
+
+ ns->ms = le16_to_cpu(id->lbaf[lbaf].ms);
+ if (!(ctrl->ctratt & NVME_CTRL_ATTR_ELBAS)) {
+ pi_size = sizeof(struct t10_pi_tuple);
+ ns->guard_type = NVME_NVM_NS_16B_GUARD;
+ goto set_pi;
+ }
+
+ nvm = kzalloc(sizeof(*nvm), GFP_KERNEL);
+ if (!nvm)
+ return -ENOMEM;

- ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
- if (id->dps & NVME_NS_DPS_PI_FIRST ||
- ns->ms == sizeof(struct t10_pi_tuple))
+ c.identify.opcode = nvme_admin_identify;
+ c.identify.nsid = cpu_to_le32(ns->head->ns_id);
+ c.identify.cns = NVME_ID_CNS_CS_NS;
+ c.identify.csi = NVME_CSI_NVM;
+
+ ret = nvme_submit_sync_cmd(ns->ctrl->admin_q, &c, nvm, sizeof(*nvm));
+ if (ret)
+ goto free_data;
+
+ elbaf = cpu_to_le32(nvm->elbaf[lbaf]);
+
+ /* no support for storage tag formats right now */
+ if (nvme_elbaf_sts(elbaf))
+ goto free_data;
+
+ ns->guard_type = nvme_elbaf_guard_type(elbaf);
+ switch (ns->guard_type) {
+ case NVME_NVM_NS_64B_GUARD:
+ pi_size = sizeof(struct nvme_crc64_pi_tuple);
+ break;
+ case NVME_NVM_NS_16B_GUARD:
+ pi_size = sizeof(struct t10_pi_tuple);
+ break;
+ default:
+ break;
+ }
+
+free_data:
+ kfree(nvm);
+set_pi:
+ if (pi_size && (first || ns->ms == pi_size))
ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
else
ns->pi_type = 0;

+ return ret;
+}
+
+static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
+{
+ struct nvme_ctrl *ctrl = ns->ctrl;
+ int ret;
+
+ ret = nvme_init_ms(ns, id);
+ if (ret)
+ return ret;
+
ns->features &= ~(NVME_NS_METADATA_SUPPORTED | NVME_NS_EXT_LBAS);
if (!ns->ms || !(ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
return 0;
@@ -1850,7 +1957,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
if (ns->ms) {
if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
(ns->features & NVME_NS_METADATA_SUPPORTED))
- nvme_init_integrity(disk, ns->ms, ns->pi_type,
+ nvme_init_integrity(disk, ns,
ns->ctrl->max_integrity_segments);
else if (!nvme_ns_has_pi(ns))
capacity = 0;
@@ -1905,7 +2012,7 @@ static void nvme_set_chunk_sectors(struct nvme_ns *ns, struct nvme_id_ns *id)

static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_id_ns *id)
{
- unsigned lbaf = id->flbas & NVME_NS_FLBAS_LBA_MASK;
+ unsigned lbaf = nvme_lbaf_index(id->flbas);
int ret;

blk_mq_freeze_queue(ns->disk->queue);
@@ -2252,20 +2359,27 @@ static int nvme_configure_timestamp(struct nvme_ctrl *ctrl)
return ret;
}

-static int nvme_configure_acre(struct nvme_ctrl *ctrl)
+static int nvme_configure_host_options(struct nvme_ctrl *ctrl)
{
struct nvme_feat_host_behavior *host;
+ u8 acre = 0, lbafee = 0;
int ret;

/* Don't bother enabling the feature if retry delay is not reported */
- if (!ctrl->crdt[0])
+ if (ctrl->crdt[0])
+ acre = NVME_ENABLE_ACRE;
+ if (ctrl->ctratt & NVME_CTRL_ATTR_ELBAS)
+ lbafee = NVME_ENABLE_LBAFEE;
+
+ if (!acre && !lbafee)
return 0;

host = kzalloc(sizeof(*host), GFP_KERNEL);
if (!host)
return 0;

- host->acre = NVME_ENABLE_ACRE;
+ host->acre = acre;
+ host->lbafee = lbafee;
ret = nvme_set_features(ctrl, NVME_FEAT_HOST_BEHAVIOR, 0,
host, sizeof(*host), NULL);
kfree(host);
@@ -3104,7 +3218,7 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl)
if (ret < 0)
return ret;

- ret = nvme_configure_acre(ctrl);
+ ret = nvme_configure_host_options(ctrl);
if (ret < 0)
return ret;

@@ -4725,12 +4839,14 @@ static inline void _nvme_check_size(void)
BUILD_BUG_ON(sizeof(struct nvme_id_ctrl) != NVME_IDENTIFY_DATA_SIZE);
BUILD_BUG_ON(sizeof(struct nvme_id_ns) != NVME_IDENTIFY_DATA_SIZE);
BUILD_BUG_ON(sizeof(struct nvme_id_ns_zns) != NVME_IDENTIFY_DATA_SIZE);
+ BUILD_BUG_ON(sizeof(struct nvme_id_ns_nvm) != NVME_IDENTIFY_DATA_SIZE);
BUILD_BUG_ON(sizeof(struct nvme_id_ctrl_zns) != NVME_IDENTIFY_DATA_SIZE);
BUILD_BUG_ON(sizeof(struct nvme_id_ctrl_nvm) != NVME_IDENTIFY_DATA_SIZE);
BUILD_BUG_ON(sizeof(struct nvme_lba_range_type) != 64);
BUILD_BUG_ON(sizeof(struct nvme_smart_log) != 512);
BUILD_BUG_ON(sizeof(struct nvme_dbbuf) != 64);
BUILD_BUG_ON(sizeof(struct nvme_directive_cmd) != 64);
+ BUILD_BUG_ON(sizeof(struct nvme_feat_host_behavior) != 512);
}


diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a162f6c6da6e..175a2a5fbd73 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -455,6 +455,7 @@ struct nvme_ns {
u16 sgs;
u32 sws;
u8 pi_type;
+ u8 guard_type;
#ifdef CONFIG_BLK_DEV_ZONED
u64 zsze;
#endif
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 855dd9b3e84b..4342b7eed3e2 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -238,6 +238,7 @@ enum {
enum nvme_ctrl_attr {
NVME_CTRL_ATTR_HID_128_BIT = (1 << 0),
NVME_CTRL_ATTR_TBKAS = (1 << 6),
+ NVME_CTRL_ATTR_ELBAS = (1 << 15),
};

struct nvme_id_ctrl {
@@ -391,8 +392,7 @@ struct nvme_id_ns {
__le16 endgid;
__u8 nguid[16];
__u8 eui64[8];
- struct nvme_lbaf lbaf[16];
- __u8 rsvd192[192];
+ struct nvme_lbaf lbaf[64];
__u8 vs[3712];
};

@@ -410,8 +410,7 @@ struct nvme_id_ns_zns {
__le32 rrl;
__le32 frl;
__u8 rsvd20[2796];
- struct nvme_zns_lbafe lbafe[16];
- __u8 rsvd3072[768];
+ struct nvme_zns_lbafe lbafe[64];
__u8 vs[256];
};

@@ -420,6 +419,30 @@ struct nvme_id_ctrl_zns {
__u8 rsvd1[4095];
};

+struct nvme_id_ns_nvm {
+ __le64 lbstm;
+ __u8 pic;
+ __u8 rsvd9[3];
+ __le32 elbaf[64];
+ __u8 rsvd268[3828];
+};
+
+enum {
+ NVME_ID_NS_NVM_STS_MASK = 0x3f,
+ NVME_ID_NS_NVM_GUARD_SHIFT = 7,
+ NVME_ID_NS_NVM_GUARD_MASK = 0x3,
+};
+
+static inline __u8 nvme_elbaf_sts(__u32 elbaf)
+{
+ return elbaf & NVME_ID_NS_NVM_STS_MASK;
+}
+
+static inline __u8 nvme_elbaf_guard_type(__u32 elbaf)
+{
+ return (elbaf >> NVME_ID_NS_NVM_GUARD_SHIFT) & NVME_ID_NS_NVM_GUARD_MASK;
+}
+
struct nvme_id_ctrl_nvm {
__u8 vsl;
__u8 wzsl;
@@ -470,6 +493,8 @@ enum {
NVME_NS_FEAT_IO_OPT = 1 << 4,
NVME_NS_ATTR_RO = 1 << 0,
NVME_NS_FLBAS_LBA_MASK = 0xf,
+ NVME_NS_FLBAS_LBA_UMASK = 0x60,
+ NVME_NS_FLBAS_LBA_SHIFT = 1,
NVME_NS_FLBAS_META_EXT = 0x10,
NVME_NS_NMIC_SHARED = 1 << 0,
NVME_LBAF_RP_BEST = 0,
@@ -488,6 +513,18 @@ enum {
NVME_NS_DPS_PI_TYPE3 = 3,
};

+enum {
+ NVME_NVM_NS_16B_GUARD = 0,
+ NVME_NVM_NS_32B_GUARD = 1,
+ NVME_NVM_NS_64B_GUARD = 2,
+};
+
+static inline __u8 nvme_lbaf_index(__u8 flbas)
+{
+ return (flbas & NVME_NS_FLBAS_LBA_MASK) |
+ ((flbas & NVME_NS_FLBAS_LBA_UMASK) >> NVME_NS_FLBAS_LBA_SHIFT);
+}
+
/* Identify Namespace Metadata Capabilities (MC): */
enum {
NVME_MC_EXTENDED_LBA = (1 << 0),
@@ -834,7 +871,8 @@ struct nvme_rw_command {
__u8 flags;
__u16 command_id;
__le32 nsid;
- __u64 rsvd2;
+ __le32 cdw2;
+ __le32 cdw3;
__le64 metadata;
union nvme_data_ptr dptr;
__le64 slba;
@@ -988,11 +1026,14 @@ enum {

struct nvme_feat_host_behavior {
__u8 acre;
- __u8 resv1[511];
+ __u8 etdas;
+ __u8 lbafee;
+ __u8 resv1[509];
};

enum {
NVME_ENABLE_ACRE = 1,
+ NVME_ENABLE_LBAFEE = 1,
};

/* Admin commands */
--
2.25.4

2022-01-26 21:35:03

by Klaus Jensen

[permalink] [raw]
Subject: Re: [RFC 0/7] 64-bit data integrity field support

On Jan 24 08:01, Keith Busch wrote:
> The NVM Express protocol added enhancements to the data integrity field
> formats beyond the T10 defined protection information. A detailed
> description of the new formats can be found in the NVMe's NVM Command
> Set Specification, section 5.2, available at:
>
> https://nvmexpress.org/wp-content/uploads/NVM-Command-Set-Specification-1.0b-2021.12.18-Ratified.pdf
>
> This series implements one possible new format: the CRC64 guard with
> 48-bit reference tags. This does not add support for the variable
> "storage tag" field.
>
> The NVMe CRC64 parameters (from Rocksoft) were not implemented in the
> kernel, so a software implementation is included in this series based on
> the generated table. This series does not include any possible hardware
> excelleration (ex: x86's pclmulqdq), so it's not very high performant
> right now.
>

Hi Keith,

Tested this on QEMU and (assuming we didnt implement the same bugs) it
looks good functionally for separate metadata. However, it should also
be able to support PRACT (i.e. pi strip/insert device-side) if
nvme_ns_has_pi() is updated to also match on the 16 byte pi tuple. I
made it work by just hitting it with a hammer and changing the
comparison to hard-coded 16 bytes, but it should of course handle both
cases.

Naveen and I will post the emulated implementation (that certainly isnt
very high performant either) on qemu-block ASAP if others are interested
in giving this a spin without having hardware available.


Attachments:
(No filename) (1.51 kB)
signature.asc (499.00 B)
Download all attachments

2022-01-26 22:19:10

by Keith Busch

[permalink] [raw]
Subject: Re: [RFC 0/7] 64-bit data integrity field support

On Wed, Jan 26, 2022 at 03:38:13PM +0100, Klaus Jensen wrote:
> On Jan 24 08:01, Keith Busch wrote:
> > The NVM Express protocol added enhancements to the data integrity field
> > formats beyond the T10 defined protection information. A detailed
> > description of the new formats can be found in the NVMe's NVM Command
> > Set Specification, section 5.2, available at:
> >
> > https://nvmexpress.org/wp-content/uploads/NVM-Command-Set-Specification-1.0b-2021.12.18-Ratified.pdf
> >
> > This series implements one possible new format: the CRC64 guard with
> > 48-bit reference tags. This does not add support for the variable
> > "storage tag" field.
> >
> > The NVMe CRC64 parameters (from Rocksoft) were not implemented in the
> > kernel, so a software implementation is included in this series based on
> > the generated table. This series does not include any possible hardware
> > excelleration (ex: x86's pclmulqdq), so it's not very high performant
> > right now.
> >
>
> Hi Keith,
>
> Tested this on QEMU and (assuming we didnt implement the same bugs) it
> looks good functionally for separate metadata. However, it should also
> be able to support PRACT (i.e. pi strip/insert device-side) if
> nvme_ns_has_pi() is updated to also match on the 16 byte pi tuple. I
> made it work by just hitting it with a hammer and changing the
> comparison to hard-coded 16 bytes, but it should of course handle both
> cases.

Thanks for checking with the qemu device!

I'll add the PRACT support for the next version, but will wait till next
week to post it in case there's more feedback to consider.

> Naveen and I will post the emulated implementation (that certainly isnt
> very high performant either) on qemu-block ASAP if others are interested
> in giving this a spin without having hardware available.

There are more features with this TP than are implemented here. I'm just
enabling a product that only supports the 64-bit CRC with 0 STS, but I'm
assuming your QEMU implementation may be more feature complete. If
anyone is interested in 32-bit CRC or >0 STS, there's more work to
follow on from here, but I currently don't have such a target to test
against.