This patchset adds pblk support for variable metadata sizes on the
out-of-band area.
Igor, Marcin, you sent a patch some time ago implementing similar
functionality, but it was never re-posted after review. Can you test
this series and see if it works for your case? I did not implement
support for storing the metadata out of line for the case where the
reported OOB area is 0; it should be fairly simple to add that on top of
these patches.
Thanks,
Javier
Javier González (3):
lightnvm: use internal allocation for chunk log page
lightnvm: do not update csecs and sos on 1.2
lightnvm: pblk: support variable OOB size
drivers/lightnvm/pblk-core.c | 60 ++++++++++++++++++++++++++++++----------
drivers/lightnvm/pblk-init.c | 16 ++++++++++-
drivers/lightnvm/pblk-map.c | 19 +++++++++----
drivers/lightnvm/pblk-read.c | 55 ++++++++++++++++++++++--------------
drivers/lightnvm/pblk-recovery.c | 34 ++++++++++++++++-------
drivers/lightnvm/pblk.h | 18 +++++++++---
drivers/nvme/host/lightnvm.c | 26 +++++++++++------
7 files changed, 164 insertions(+), 64 deletions(-)
--
2.7.4
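For readers skimming the series: the core change can be pictured as stepping through the OOB buffer by the device-reported per-sector metadata size instead of by a fixed struct stride. A minimal userspace sketch, with names that are illustrative stand-ins rather than the kernel API:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Stand-in for pblk's per-sector metadata after this series: only the lba
 * in the first 8 OOB bytes is meaningful; the remaining (sos - 8) bytes of
 * each entry are opaque, device-specific padding. */
struct sec_meta {
	uint64_t lba;
};

/* Index entry 'index' in a flat OOB buffer, stepping by the device-reported
 * per-sector metadata size (geo->sos in the kernel). */
static inline struct sec_meta *sec_meta_index(void *meta_list,
					      unsigned int sos, int index)
{
	return (struct sec_meta *)((char *)meta_list + (size_t)index * sos);
}
```

With sos = 16 the layout matches the old hard-coded 16-byte struct; any larger value simply widens the stride.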
In the OCSSD 2.0 spec, the sector and metadata sizes are reported through
the standard NVMe identify command. Thus, the lightnvm subsystem needs
to update this information in the geometry structure on boot.
Since 1.2 devices report these values through the OCSSD geometry
identify, avoid this update there: it is unnecessary and can also corrupt
the geometry if the device does not report the NVMe sizes correctly
(which is not required by the OCSSD 1.2 spec either).
Signed-off-by: Javier González <[email protected]>
---
drivers/nvme/host/lightnvm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 5bfa354c5dd5..33ed09f8410e 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -980,6 +980,9 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
struct nvm_dev *ndev = ns->ndev;
struct nvm_geo *geo = &ndev->geo;
+ if (geo->version == NVM_OCSSD_SPEC_12)
+ return;
+
geo->csecs = 1 << ns->lba_shift;
geo->sos = ns->ms;
}
--
2.7.4
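The guard this patch adds can be modeled in a few lines. A sketch under illustrative assumptions (the enum values and struct are stand-ins for the kernel's nvm_geo, not the real definitions):

```c
#include <assert.h>

/* Illustrative spec-version tags; the kernel defines its own values. */
enum { OCSSD_SPEC_12 = 1, OCSSD_SPEC_20 = 2 };

struct geo {
	int version;
	unsigned int csecs;	/* sector size in bytes */
	unsigned int sos;	/* OOB (metadata) size per sector */
};

/* Update the geometry from the NVMe identify data, except on 1.2 devices,
 * which already reported these sizes through the OCSSD geometry identify. */
static void update_nvm_info(struct geo *geo, int lba_shift, unsigned int ms)
{
	if (geo->version == OCSSD_SPEC_12)
		return;

	geo->csecs = 1u << lba_shift;
	geo->sos = ms;
}
```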
pblk uses 8 bytes of the metadata region exposed by the device through
the out-of-band area to store the lba mapped to the given physical
sector. This is used for recovery purposes. Given that the first
generation of OCSSD devices exposed 16 bytes, pblk used a hard-coded
structure for this purpose.
This patch relaxes the 16-byte assumption and uses the metadata size
reported by the device to lay out metadata appropriately for the vector
commands. This adds support for arbitrary metadata sizes, as long as
they are at least 8 bytes. Note that this patch does not address the
case in which the device does not expose an out-of-band area; pblk
creation will fail in that case.
Signed-off-by: Javier González <[email protected]>
---
drivers/lightnvm/pblk-core.c | 56 ++++++++++++++++++++++++++++++----------
drivers/lightnvm/pblk-init.c | 14 ++++++++++
drivers/lightnvm/pblk-map.c | 19 +++++++++-----
drivers/lightnvm/pblk-read.c | 55 +++++++++++++++++++++++++--------------
drivers/lightnvm/pblk-recovery.c | 34 +++++++++++++++++-------
drivers/lightnvm/pblk.h | 18 ++++++++++---
6 files changed, 143 insertions(+), 53 deletions(-)
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index a311cc29afd8..d52e0047ae9d 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -250,8 +250,20 @@ int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
if (!is_vector)
return 0;
- rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
- rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
+ if (pblk->dma_shared) {
+ rqd->ppa_list = rqd->meta_list + pblk->dma_meta_size;
+ rqd->dma_ppa_list = rqd->dma_meta_list + pblk->dma_meta_size;
+
+ return 0;
+ }
+
+ rqd->ppa_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
+ &rqd->dma_ppa_list);
+ if (!rqd->ppa_list) {
+ nvm_dev_dma_free(dev->parent, rqd->meta_list,
+ rqd->dma_meta_list);
+ return -ENOMEM;
+ }
return 0;
}
@@ -262,7 +274,11 @@ void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
if (rqd->meta_list)
nvm_dev_dma_free(dev->parent, rqd->meta_list,
- rqd->dma_meta_list);
+ rqd->dma_meta_list);
+
+ if (!pblk->dma_shared && rqd->ppa_list)
+ nvm_dev_dma_free(dev->parent, rqd->ppa_list,
+ rqd->dma_ppa_list);
}
/* Caller must guarantee that the request is a valid type */
@@ -796,10 +812,12 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
rqd.is_seq = 1;
for (i = 0; i < lm->smeta_sec; i++, paddr++) {
- struct pblk_sec_meta *meta_list = rqd.meta_list;
+ struct pblk_sec_meta *meta;
rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
- meta_list[i].lba = lba_list[paddr] = addr_empty;
+
+ meta = sec_meta_index(pblk, rqd.meta_list, i);
+ meta->lba = lba_list[paddr] = addr_empty;
}
ret = pblk_submit_io_sync_sem(pblk, &rqd);
@@ -845,8 +863,17 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
if (!meta_list)
return -ENOMEM;
- ppa_list = meta_list + pblk_dma_meta_size;
- dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+ if (pblk->dma_shared) {
+ ppa_list = meta_list + pblk->dma_meta_size;
+ dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
+ } else {
+ ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
+ &dma_ppa_list);
+ if (!ppa_list) {
+ ret = -ENOMEM;
+ goto free_meta_list;
+ }
+ }
next_rq:
memset(&rqd, 0, sizeof(struct nvm_rq));
@@ -858,7 +885,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
l_mg->emeta_alloc_type, GFP_KERNEL);
if (IS_ERR(bio)) {
ret = PTR_ERR(bio);
- goto free_rqd_dma;
+ goto free_ppa_list;
}
bio->bi_iter.bi_sector = 0; /* internal bio */
@@ -884,7 +911,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
if (pblk_boundary_paddr_checks(pblk, paddr)) {
bio_put(bio);
ret = -EINTR;
- goto free_rqd_dma;
+ goto free_ppa_list;
}
ppa = addr_to_gen_ppa(pblk, paddr, line_id);
@@ -894,7 +921,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
bio_put(bio);
ret = -EINTR;
- goto free_rqd_dma;
+ goto free_ppa_list;
}
for (j = 0; j < min; j++, i++, paddr++)
@@ -905,7 +932,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
if (ret) {
pblk_err(pblk, "emeta I/O submission failed: %d\n", ret);
bio_put(bio);
- goto free_rqd_dma;
+ goto free_ppa_list;
}
atomic_dec(&pblk->inflight_io);
@@ -918,8 +945,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
if (left_ppas)
goto next_rq;
-free_rqd_dma:
- nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
+free_ppa_list:
+ if (!pblk->dma_shared)
+ nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
+free_meta_list:
+ nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
return ret;
}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index a99854439224..57972156c318 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -354,6 +354,20 @@ static int pblk_core_init(struct pblk *pblk)
struct nvm_geo *geo = &dev->geo;
int ret, max_write_ppas;
+ if (sizeof(struct pblk_sec_meta) > geo->sos) {
+ pblk_err(pblk, "OOB area too small. Min %lu bytes (%d)\n",
+ (unsigned long)sizeof(struct pblk_sec_meta), geo->sos);
+ return -EINTR;
+ }
+
+ pblk->dma_ppa_size = (sizeof(u64) * NVM_MAX_VLBA);
+ pblk->dma_meta_size = geo->sos * NVM_MAX_VLBA;
+
+ if (pblk->dma_ppa_size + pblk->dma_meta_size > PAGE_SIZE)
+ pblk->dma_shared = false;
+ else
+ pblk->dma_shared = true;
+
atomic64_set(&pblk->user_wa, 0);
atomic64_set(&pblk->pad_wa, 0);
atomic64_set(&pblk->gc_wa, 0);
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index dc0efb852475..55fca16d18e4 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -25,6 +25,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
unsigned int valid_secs)
{
struct pblk_line *line = pblk_line_get_data(pblk);
+ struct pblk_sec_meta *meta;
struct pblk_emeta *emeta;
struct pblk_w_ctx *w_ctx;
__le64 *lba_list;
@@ -56,6 +57,8 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
/* ppa to be sent to the device */
ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
+ meta = sec_meta_index(pblk, meta_list, i);
+
/* Write context for target bio completion on write buffer. Note
* that the write buffer is protected by the sync backpointer,
* and a single writer thread have access to each specific entry
@@ -67,14 +70,14 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
kref_get(&line->ref);
w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
w_ctx->ppa = ppa_list[i];
- meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+ meta->lba = cpu_to_le64(w_ctx->lba);
lba_list[paddr] = cpu_to_le64(w_ctx->lba);
if (lba_list[paddr] != addr_empty)
line->nr_valid_lbas++;
else
atomic64_inc(&pblk->pad_wa);
} else {
- lba_list[paddr] = meta_list[i].lba = addr_empty;
+ lba_list[paddr] = meta->lba = addr_empty;
__pblk_map_invalidate(pblk, line, paddr);
}
}
@@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
unsigned long *lun_bitmap, unsigned int valid_secs,
unsigned int off)
{
- struct pblk_sec_meta *meta_list = rqd->meta_list;
+ struct pblk_sec_meta *meta_list;
struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
unsigned int map_secs;
int min = pblk->min_write_pgs;
@@ -95,8 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
for (i = off; i < rqd->nr_ppas; i += min) {
map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
+ meta_list = sec_meta_index(pblk, rqd->meta_list, i);
+
if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
- lun_bitmap, &meta_list[i], map_secs)) {
+ lun_bitmap, meta_list, map_secs)) {
bio_put(rqd->bio);
pblk_free_rqd(pblk, rqd, PBLK_WRITE);
pblk_pipeline_stop(pblk);
@@ -112,8 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
struct pblk_line_meta *lm = &pblk->lm;
- struct pblk_sec_meta *meta_list = rqd->meta_list;
struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
+ struct pblk_sec_meta *meta_list;
struct pblk_line *e_line, *d_line;
unsigned int map_secs;
int min = pblk->min_write_pgs;
@@ -121,8 +126,10 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
for (i = 0; i < rqd->nr_ppas; i += min) {
map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
+ meta_list = sec_meta_index(pblk, rqd->meta_list, i);
+
if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
- lun_bitmap, &meta_list[i], map_secs)) {
+ lun_bitmap, meta_list, map_secs)) {
bio_put(rqd->bio);
pblk_free_rqd(pblk, rqd, PBLK_WRITE);
pblk_pipeline_stop(pblk);
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 57d3155ef9a5..12b690e2abd9 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -42,7 +42,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
struct bio *bio, sector_t blba,
unsigned long *read_bitmap)
{
- struct pblk_sec_meta *meta_list = rqd->meta_list;
struct ppa_addr ppas[NVM_MAX_VLBA];
int nr_secs = rqd->nr_ppas;
bool advanced_bio = false;
@@ -51,13 +50,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
for (i = 0; i < nr_secs; i++) {
+ struct pblk_sec_meta *meta;
struct ppa_addr p = ppas[i];
sector_t lba = blba + i;
+ meta = sec_meta_index(pblk, rqd->meta_list, i);
retry:
if (pblk_ppa_empty(p)) {
WARN_ON(test_and_set_bit(i, read_bitmap));
- meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
+
+ meta->lba = cpu_to_le64(ADDR_EMPTY);
if (unlikely(!advanced_bio)) {
bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
@@ -77,7 +79,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
goto retry;
}
WARN_ON(test_and_set_bit(i, read_bitmap));
- meta_list[i].lba = cpu_to_le64(lba);
+ meta->lba = cpu_to_le64(lba);
advanced_bio = true;
#ifdef CONFIG_NVM_PBLK_DEBUG
atomic_long_inc(&pblk->cache_reads);
@@ -104,12 +106,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
sector_t blba)
{
- struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
int nr_lbas = rqd->nr_ppas;
int i;
for (i = 0; i < nr_lbas; i++) {
- u64 lba = le64_to_cpu(meta_lba_list[i].lba);
+ struct pblk_sec_meta *meta;
+ u64 lba;
+
+ meta = sec_meta_index(pblk, rqd->meta_list, i);
+ lba = le64_to_cpu(meta->lba);
if (lba == ADDR_EMPTY)
continue;
@@ -133,17 +138,18 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
u64 *lba_list, int nr_lbas)
{
- struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
int i, j;
for (i = 0, j = 0; i < nr_lbas; i++) {
+ struct pblk_sec_meta *meta;
u64 lba = lba_list[i];
u64 meta_lba;
if (lba == ADDR_EMPTY)
continue;
- meta_lba = le64_to_cpu(meta_lba_list[j].lba);
+ meta = sec_meta_index(pblk, rqd->meta_list, j);
+ meta_lba = le64_to_cpu(meta->lba);
if (lba != meta_lba) {
#ifdef CONFIG_NVM_PBLK_DEBUG
@@ -218,7 +224,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
struct bio *new_bio = rqd->bio;
struct bio *bio = pr_ctx->orig_bio;
struct bio_vec src_bv, dst_bv;
- struct pblk_sec_meta *meta_list = rqd->meta_list;
+ struct pblk_sec_meta *meta;
int bio_init_idx = pr_ctx->bio_init_idx;
unsigned long *read_bitmap = pr_ctx->bitmap;
int nr_secs = pr_ctx->orig_nr_secs;
@@ -237,12 +243,13 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
}
/* Re-use allocated memory for intermediate lbas */
- lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
- lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
+ lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
+ lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk->dma_ppa_size);
for (i = 0; i < nr_secs; i++) {
- lba_list_media[i] = meta_list[i].lba;
- meta_list[i].lba = lba_list_mem[i];
+ meta = sec_meta_index(pblk, rqd->meta_list, i);
+ lba_list_media[i] = meta->lba;
+ meta->lba = lba_list_mem[i];
}
/* Fill the holes in the original bio */
@@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
kref_put(&line->ref, pblk_line_put);
- meta_list[hole].lba = lba_list_media[i];
+ meta = sec_meta_index(pblk, rqd->meta_list, hole);
+ meta->lba = lba_list_media[i];
src_bv = new_bio->bi_io_vec[i++];
dst_bv = bio->bi_io_vec[bio_init_idx + hole];
@@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
unsigned long *read_bitmap,
int nr_holes)
{
- struct pblk_sec_meta *meta_list = rqd->meta_list;
struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
+ struct pblk_sec_meta *meta;
struct pblk_pr_ctx *pr_ctx;
struct bio *new_bio, *bio = r_ctx->private;
__le64 *lba_list_mem;
@@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
int i;
/* Re-use allocated memory for intermediate lbas */
- lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
+ lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
new_bio = bio_alloc(GFP_KERNEL, nr_holes);
@@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
if (!pr_ctx)
goto fail_free_pages;
- for (i = 0; i < nr_secs; i++)
- lba_list_mem[i] = meta_list[i].lba;
+ for (i = 0; i < nr_secs; i++) {
+ meta = sec_meta_index(pblk, rqd->meta_list, i);
+ lba_list_mem[i] = meta->lba;
+ }
new_bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
@@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
sector_t lba, unsigned long *read_bitmap)
{
- struct pblk_sec_meta *meta_list = rqd->meta_list;
+ struct pblk_sec_meta *meta;
struct ppa_addr ppa;
pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
@@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
retry:
if (pblk_ppa_empty(ppa)) {
WARN_ON(test_and_set_bit(0, read_bitmap));
- meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
+
+ meta = sec_meta_index(pblk, rqd->meta_list, 0);
+ meta->lba = cpu_to_le64(ADDR_EMPTY);
+
return;
}
@@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
}
WARN_ON(test_and_set_bit(0, read_bitmap));
- meta_list[0].lba = cpu_to_le64(lba);
+
+ meta = sec_meta_index(pblk, rqd->meta_list, 0);
+ meta->lba = cpu_to_le64(lba);
#ifdef CONFIG_NVM_PBLK_DEBUG
atomic_long_inc(&pblk->cache_reads);
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 8114013c37b8..1ce92562603d 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
{
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
- struct pblk_sec_meta *meta_list;
+ struct pblk_sec_meta *meta;
struct pblk_pad_rq *pad_rq;
struct nvm_rq *rqd;
struct bio *bio;
@@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
rqd->end_io = pblk_end_io_recov;
rqd->private = pad_rq;
- meta_list = rqd->meta_list;
-
for (i = 0; i < rqd->nr_ppas; ) {
struct ppa_addr ppa;
int pos;
@@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
pblk_map_invalidate(pblk, dev_ppa);
- lba_list[w_ptr] = meta_list[i].lba = addr_empty;
rqd->ppa_list[i] = dev_ppa;
+
+ meta = sec_meta_index(pblk, rqd->meta_list, i);
+ lba_list[w_ptr] = meta->lba = addr_empty;
}
}
@@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
struct ppa_addr *ppa_list;
- struct pblk_sec_meta *meta_list;
+ struct pblk_sec_meta *meta_list, *meta;
struct nvm_rq *rqd;
struct bio *bio;
void *data;
@@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
}
for (i = 0; i < rqd->nr_ppas; i++) {
- u64 lba = le64_to_cpu(meta_list[i].lba);
+ u64 lba;
+
+ meta = sec_meta_index(pblk, meta_list, i);
+ lba = le64_to_cpu(meta->lba);
lba_list[paddr++] = cpu_to_le64(lba);
@@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
if (!meta_list)
return -ENOMEM;
- ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
- dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+ if (pblk->dma_shared) {
+ ppa_list = (void *)(meta_list) + pblk->dma_meta_size;
+ dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
+ } else {
+ ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
+ &dma_ppa_list);
+ if (!ppa_list) {
+ ret = -ENOMEM;
+ goto free_meta_list;
+ }
+ }
data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
if (!data) {
ret = -ENOMEM;
- goto free_meta_list;
+ goto free_ppa_list;
}
rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL);
@@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
out:
mempool_free(rqd, &pblk->r_rq_pool);
kfree(data);
+free_ppa_list:
+ if (!pblk->dma_shared)
+ nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
free_meta_list:
nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
-
return ret;
}
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 22cc9bfbbb10..4526fee206d9 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -86,7 +86,6 @@ enum {
};
struct pblk_sec_meta {
- u64 reserved;
__le64 lba;
};
@@ -103,9 +102,6 @@ enum {
PBLK_RL_LOW = 4
};
-#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
-#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
-
/* write buffer completion context */
struct pblk_c_ctx {
struct list_head list; /* Head for out-of-order completion */
@@ -637,6 +633,10 @@ struct pblk {
int sec_per_write;
+ int dma_meta_size;
+ int dma_ppa_size;
+ bool dma_shared;
+
unsigned char instance_uuid[16];
/* Persistent write amplification counters, 4kb sector I/Os */
@@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]);
}
+static inline struct pblk_sec_meta *sec_meta_index(struct pblk *pblk,
+ struct pblk_sec_meta *meta,
+ int index)
+{
+ struct nvm_tgt_dev *dev = pblk->dev;
+ struct nvm_geo *geo = &dev->geo;
+
+ return ((void *)meta + index * geo->sos);
+}
+
static inline int pblk_line_vsc(struct pblk_line *line)
{
return le32_to_cpu(*line->vsc);
--
2.7.4
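The dma_shared decision added in pblk_core_init() can be summarized as: the ppa list and the per-sector metadata share one DMA allocation only while both fit in a page; otherwise the ppa list gets its own allocation. A sketch with illustrative constants (NVM_MAX_VLBA is 64 at this point, and PAGE_SIZE is assumed to be 4096):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VLBA	64	/* stand-in for NVM_MAX_VLBA */
#define PAGE_SZ		4096	/* assumed page size */

/* Mirror of the check in pblk_core_init(): share one DMA region when the
 * ppa list (8 bytes per sector) plus the OOB metadata (sos bytes per
 * sector) still fit in a single page. */
static int dma_shared(unsigned int sos)
{
	size_t dma_ppa_size = sizeof(unsigned long long) * MAX_VLBA;
	size_t dma_meta_size = (size_t)sos * MAX_VLBA;

	return dma_ppa_size + dma_meta_size <= PAGE_SZ;
}
```

So a 16-byte OOB area (512 + 1024 bytes) keeps the shared layout, while a 64-byte OOB area (512 + 4096 bytes) forces the separate ppa-list allocation paths that the patch adds.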
The lightnvm subsystem provides helpers to retrieve chunk metadata,
where the target needs to provide a buffer to store the metadata. An
implicit assumption is that this buffer is contiguous and can be used to
retrieve the data from the device. If the device exposes too many
chunks, this kmalloc might fail, preventing instance creation.
This patch removes the assumption by using an internal bounce buffer in
the lightnvm subsystem to retrieve chunk metadata, so that targets can
use virtual memory allocations instead. Since this is a target API
change, adapt pblk accordingly.
Signed-off-by: Javier González <[email protected]>
---
drivers/lightnvm/pblk-core.c | 4 ++--
drivers/lightnvm/pblk-init.c | 2 +-
drivers/nvme/host/lightnvm.c | 23 +++++++++++++++--------
3 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index fdcbeb920c9e..a311cc29afd8 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -120,7 +120,7 @@ static void pblk_end_io_erase(struct nvm_rq *rqd)
/*
* Get information for all chunks from the device.
*
- * The caller is responsible for freeing the returned structure
+ * The caller is responsible for freeing (vmalloc) the returned structure
*/
struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk)
{
@@ -134,7 +134,7 @@ struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk)
ppa.ppa = 0;
len = geo->all_chunks * sizeof(*meta);
- meta = kzalloc(len, GFP_KERNEL);
+ meta = vzalloc(len);
if (!meta)
return ERR_PTR(-ENOMEM);
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index e0db6de137d6..a99854439224 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -983,7 +983,7 @@ static int pblk_lines_init(struct pblk *pblk)
pblk_set_provision(pblk, nr_free_chks);
- kfree(chunk_meta);
+ vfree(chunk_meta);
return 0;
fail_free_lines:
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 2c96e7fcdcac..5bfa354c5dd5 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -579,7 +579,7 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
struct nvm_geo *geo = &ndev->geo;
struct nvme_ns *ns = ndev->q->queuedata;
struct nvme_ctrl *ctrl = ns->ctrl;
- struct nvme_nvm_chk_meta *dev_meta = (struct nvme_nvm_chk_meta *)meta;
+ struct nvme_nvm_chk_meta *dev_meta, *dev_meta_off;
struct ppa_addr ppa;
size_t left = nchks * sizeof(struct nvme_nvm_chk_meta);
size_t log_pos, offset, len;
@@ -591,6 +591,10 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
*/
max_len = min_t(unsigned int, ctrl->max_hw_sectors << 9, 256 * 1024);
+ dev_meta = kmalloc(max_len, GFP_KERNEL);
+ if (!dev_meta)
+ return -ENOMEM;
+
/* Normalize lba address space to obtain log offset */
ppa.ppa = slba;
ppa = dev_to_generic_addr(ndev, ppa);
@@ -604,6 +608,9 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
while (left) {
len = min_t(unsigned int, left, max_len);
+ memset(dev_meta, 0, max_len);
+ dev_meta_off = dev_meta;
+
ret = nvme_get_log_ext(ctrl, ns, NVME_NVM_LOG_REPORT_CHUNK,
dev_meta, len, offset);
if (ret) {
@@ -612,15 +619,15 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
}
for (i = 0; i < len; i += sizeof(struct nvme_nvm_chk_meta)) {
- meta->state = dev_meta->state;
- meta->type = dev_meta->type;
- meta->wi = dev_meta->wi;
- meta->slba = le64_to_cpu(dev_meta->slba);
- meta->cnlb = le64_to_cpu(dev_meta->cnlb);
- meta->wp = le64_to_cpu(dev_meta->wp);
+ meta->state = dev_meta_off->state;
+ meta->type = dev_meta_off->type;
+ meta->wi = dev_meta_off->wi;
+ meta->slba = le64_to_cpu(dev_meta_off->slba);
+ meta->cnlb = le64_to_cpu(dev_meta_off->cnlb);
+ meta->wp = le64_to_cpu(dev_meta_off->wp);
meta++;
- dev_meta++;
+ dev_meta_off++;
}
offset += len;
--
2.7.4
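The retrieval loop this patch introduces in nvme_nvm_get_chk_meta() can be sketched in userspace: fetch the report-chunk log in slices of at most max_len bytes into a small, physically contiguous bounce buffer, then copy the entries out to the caller's (possibly vmalloc'd) destination. Here device_log and fake_get_log() are hypothetical stand-ins for the device and nvme_get_log_ext(), and the struct is a simplified chunk descriptor:

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

/* Simplified chunk descriptor; the real nvme_nvm_chk_meta has more fields. */
struct chk_meta {
	unsigned long long slba, cnlb, wp;
};

static struct chk_meta device_log[8];	/* simulated on-device log page */

/* Stand-in for nvme_get_log_ext(): read 'len' bytes of the log at 'offset'. */
static int fake_get_log(void *buf, size_t len, size_t offset)
{
	memcpy(buf, (char *)device_log + offset, len);
	return 0;
}

/* Fetch nchks descriptors through a bounce buffer of max_len bytes
 * (max_len is assumed to be a multiple of the entry size, as in the
 * kernel, where it is capped at 256 KiB). */
static int read_chk_meta(struct chk_meta *dst, size_t nchks, void *bounce,
			 size_t max_len)
{
	size_t left = nchks * sizeof(struct chk_meta);
	size_t offset = 0;

	while (left) {
		size_t len = left < max_len ? left : max_len;
		struct chk_meta *src = bounce;
		size_t i;
		int ret;

		ret = fake_get_log(bounce, len, offset);
		if (ret)
			return ret;

		/* Copy out entry by entry, as the patch does. */
		for (i = 0; i < len; i += sizeof(struct chk_meta))
			*dst++ = *src++;

		offset += len;
		left -= len;
	}
	return 0;
}
```

The destination never has to be DMA-able or physically contiguous; only the bounded bounce buffer does, which is what lets pblk switch to vzalloc() for the full table.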
On 29.08.2018 12:09, Javier González wrote:
> This patchset adds pblk support for variable metadata sizes on the
> out-of-band area.
>
> Igor, Marcin, you sent a patch some time ago implementing a similar
> functionality, but it never got re-posted after review. Can you test
> this and see if it works for your case? I did not implement support for
> outlined metadata for the case where the reported OOB area is 0. It
> should be pretty simple to do that on top of these patches though.
>
Sure, we will test that and let you know. I was busy lately, and because
of that I never reposted it. I will redo it on top of your patches,
hopefully soon.
> Thanks,
> Javier
>
> Javier González (3):
> lightnvm: use internal allocation for chunk log page
> lightnvm: do not update csecs and sos on 1.2
> lightnvm: pblk: support variable OOB size
>
> drivers/lightnvm/pblk-core.c | 60 ++++++++++++++++++++++++++++++----------
> drivers/lightnvm/pblk-init.c | 16 ++++++++++-
> drivers/lightnvm/pblk-map.c | 19 +++++++++----
> drivers/lightnvm/pblk-read.c | 55 ++++++++++++++++++++++--------------
> drivers/lightnvm/pblk-recovery.c | 34 ++++++++++++++++-------
> drivers/lightnvm/pblk.h | 18 +++++++++---
> drivers/nvme/host/lightnvm.c | 26 +++++++++++------
> 7 files changed, 164 insertions(+), 64 deletions(-)
>
> On 29 Aug 2018, at 16.51, Igor Konopko <[email protected]> wrote:
>
>
>
> On 29.08.2018 12:09, Javier González wrote:
>> This patchset adds pblk support for variable metadata sizes on the
>> out-of-band area.
>> Igor, Marcin, you sent a patch some time ago implementing a similar
>> functionality, but it never got re-posted after review. Can you test
>> this and see if it works for your case? I did not implement support for
>> outlined metadata for the case where the reported OOB area is 0. It
>> should be pretty simple to do that on top of these patches though.
>
> Sure, we will test that and let you know. I was busy lately, and because
> of that I never reposted it. I will redo it on top of your patches,
> hopefully soon.
>
Thanks Igor, this would be great.
On 08/29/2018 12:09 PM, Javier González wrote:
> In the OCSSD 2.0 spec, the sector and metadata sizes are reported through
> the standard NVMe identify command. Thus, the lightnvm subsystem needs
> to update this information in the geometry structure on boot.
>
> Since 1.2 devices report these values through the OCSSD geometry
> identify, avoid this update there: it is unnecessary and can also corrupt
> the geometry if the device does not report the NVMe sizes correctly
> (which is not required by the OCSSD 1.2 spec either).
>
> Signed-off-by: Javier González <[email protected]>
> ---
> drivers/nvme/host/lightnvm.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> index 5bfa354c5dd5..33ed09f8410e 100644
> --- a/drivers/nvme/host/lightnvm.c
> +++ b/drivers/nvme/host/lightnvm.c
> @@ -980,6 +980,9 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
> struct nvm_dev *ndev = ns->ndev;
> struct nvm_geo *geo = &ndev->geo;
>
> + if (geo->version == NVM_OCSSD_SPEC_12)
> + return;
The LBA format sizes are not explicit in 2.0 either. A conforming drive
should implement the NVM Command Set and expose the LBA format
correctly. That said, I do get your intent, and if it is okay with you,
I'll reword the commit message as follows and apply it:
"1.2 devices expose their data and metadata sizes through the separate
identify command. Make sure that the NVMe LBA format does not override
these values."
> +
> geo->csecs = 1 << ns->lba_shift;
> geo->sos = ns->ms;
> }
>
On 08/29/2018 12:09 PM, Javier González wrote:
> The lightnvm subsystem provides helpers to retrieve chunk metadata,
> where the target needs to provide a buffer to store the metadata. An
> implicit assumption is that this buffer is contiguous and can be used to
> retrieve the data from the device. If the device exposes too many
> chunks, then kmalloc might fail, thus failing instance creation.
>
> This patch removes this assumption by implementing an internal buffer in
> the lightnvm subsystem to retrieve chunk metadata. Targets can then
> use virtual memory allocations. Since this is a target API change, adapt
> pblk accordingly.
>
> Signed-off-by: Javier González <[email protected]>
> ---
> drivers/lightnvm/pblk-core.c | 4 ++--
> drivers/lightnvm/pblk-init.c | 2 +-
> drivers/nvme/host/lightnvm.c | 23 +++++++++++++++--------
> 3 files changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index fdcbeb920c9e..a311cc29afd8 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -120,7 +120,7 @@ static void pblk_end_io_erase(struct nvm_rq *rqd)
> /*
> * Get information for all chunks from the device.
> *
> - * The caller is responsible for freeing the returned structure
> + * The caller is responsible for freeing (vmalloc) the returned structure
> */
> struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk)
> {
> @@ -134,7 +134,7 @@ struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk)
> ppa.ppa = 0;
>
> len = geo->all_chunks * sizeof(*meta);
> - meta = kzalloc(len, GFP_KERNEL);
> + meta = vzalloc(len);
> if (!meta)
> return ERR_PTR(-ENOMEM);
>
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index e0db6de137d6..a99854439224 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -983,7 +983,7 @@ static int pblk_lines_init(struct pblk *pblk)
>
> pblk_set_provision(pblk, nr_free_chks);
>
> - kfree(chunk_meta);
> + vfree(chunk_meta);
> return 0;
>
> fail_free_lines:
> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> index 2c96e7fcdcac..5bfa354c5dd5 100644
> --- a/drivers/nvme/host/lightnvm.c
> +++ b/drivers/nvme/host/lightnvm.c
> @@ -579,7 +579,7 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
> struct nvm_geo *geo = &ndev->geo;
> struct nvme_ns *ns = ndev->q->queuedata;
> struct nvme_ctrl *ctrl = ns->ctrl;
> - struct nvme_nvm_chk_meta *dev_meta = (struct nvme_nvm_chk_meta *)meta;
> + struct nvme_nvm_chk_meta *dev_meta, *dev_meta_off;
> struct ppa_addr ppa;
> size_t left = nchks * sizeof(struct nvme_nvm_chk_meta);
> size_t log_pos, offset, len;
> @@ -591,6 +591,10 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
> */
> max_len = min_t(unsigned int, ctrl->max_hw_sectors << 9, 256 * 1024);
>
> + dev_meta = kmalloc(max_len, GFP_KERNEL);
> + if (!dev_meta)
> + return -ENOMEM;
> +
> /* Normalize lba address space to obtain log offset */
> ppa.ppa = slba;
> ppa = dev_to_generic_addr(ndev, ppa);
> @@ -604,6 +608,9 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
> while (left) {
> len = min_t(unsigned int, left, max_len);
>
> + memset(dev_meta, 0, max_len);
> + dev_meta_off = dev_meta;
> +
> ret = nvme_get_log_ext(ctrl, ns, NVME_NVM_LOG_REPORT_CHUNK,
> dev_meta, len, offset);
> if (ret) {
> @@ -612,15 +619,15 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
> }
>
> for (i = 0; i < len; i += sizeof(struct nvme_nvm_chk_meta)) {
> - meta->state = dev_meta->state;
> - meta->type = dev_meta->type;
> - meta->wi = dev_meta->wi;
> - meta->slba = le64_to_cpu(dev_meta->slba);
> - meta->cnlb = le64_to_cpu(dev_meta->cnlb);
> - meta->wp = le64_to_cpu(dev_meta->wp);
> + meta->state = dev_meta_off->state;
> + meta->type = dev_meta_off->type;
> + meta->wi = dev_meta_off->wi;
> + meta->slba = le64_to_cpu(dev_meta_off->slba);
> + meta->cnlb = le64_to_cpu(dev_meta_off->cnlb);
> + meta->wp = le64_to_cpu(dev_meta_off->wp);
>
> meta++;
> - dev_meta++;
> + dev_meta_off++;
> }
>
> offset += len;
>
Thanks. Applied for 4.20.
On 09/07/2018 02:53 PM, Hans Holmberg wrote:
> On Wed, Aug 29, 2018 at 12:10 PM Javier González <[email protected]> wrote:
>>
>> The lightnvm subsystem provides helpers to retrieve chunk metadata,
>> where the target needs to provide a buffer to store the metadata. An
>> implicit assumption is that this buffer is contiguous and can be used to
>> retrieve the data from the device. If the device exposes too many
>> chunks, then kmalloc might fail, thus failing instance creation.
>>
>> This patch removes this assumption by implementing an internal buffer in
>> the lightnvm subsystem to retrieve chunk metadata. Targets can then
>> use virtual memory allocations. Since this is a target API change, adapt
>> pblk accordingly.
>>
>> Signed-off-by: Javier González <[email protected]>
>> ---
>> drivers/lightnvm/pblk-core.c | 4 ++--
>> drivers/lightnvm/pblk-init.c | 2 +-
>> drivers/nvme/host/lightnvm.c | 23 +++++++++++++++--------
>> 3 files changed, 18 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index fdcbeb920c9e..a311cc29afd8 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -120,7 +120,7 @@ static void pblk_end_io_erase(struct nvm_rq *rqd)
>> /*
>> * Get information for all chunks from the device.
>> *
>> - * The caller is responsible for freeing the returned structure
>> + * The caller is responsible for freeing (vmalloc) the returned structure
>> */
>> struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk)
>> {
>> @@ -134,7 +134,7 @@ struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk)
>> ppa.ppa = 0;
>>
>> len = geo->all_chunks * sizeof(*meta);
>> - meta = kzalloc(len, GFP_KERNEL);
>> + meta = vzalloc(len);
>> if (!meta)
>> return ERR_PTR(-ENOMEM);
>>
>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>> index e0db6de137d6..a99854439224 100644
>> --- a/drivers/lightnvm/pblk-init.c
>> +++ b/drivers/lightnvm/pblk-init.c
>> @@ -983,7 +983,7 @@ static int pblk_lines_init(struct pblk *pblk)
>>
>> pblk_set_provision(pblk, nr_free_chks);
>>
>> - kfree(chunk_meta);
>> + vfree(chunk_meta);
>> return 0;
>>
>> fail_free_lines:
>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>> index 2c96e7fcdcac..5bfa354c5dd5 100644
>> --- a/drivers/nvme/host/lightnvm.c
>> +++ b/drivers/nvme/host/lightnvm.c
>> @@ -579,7 +579,7 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
>> struct nvm_geo *geo = &ndev->geo;
>> struct nvme_ns *ns = ndev->q->queuedata;
>> struct nvme_ctrl *ctrl = ns->ctrl;
>> - struct nvme_nvm_chk_meta *dev_meta = (struct nvme_nvm_chk_meta *)meta;
>> + struct nvme_nvm_chk_meta *dev_meta, *dev_meta_off;
>> struct ppa_addr ppa;
>> size_t left = nchks * sizeof(struct nvme_nvm_chk_meta);
>> size_t log_pos, offset, len;
>> @@ -591,6 +591,10 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
>> */
>> max_len = min_t(unsigned int, ctrl->max_hw_sectors << 9, 256 * 1024);
>>
>> + dev_meta = kmalloc(max_len, GFP_KERNEL);
>> + if (!dev_meta)
>> + return -ENOMEM;
>
> We should free the memory after using it. Pardon the late review, I
> just now discovered the memory leak during testing.
Thanks Hans. No worries, I saw it just when I had ack'ed the patch.
Would you like me to add your Reviewed-by?
>
>> +
>> /* Normalize lba address space to obtain log offset */
>> ppa.ppa = slba;
>> ppa = dev_to_generic_addr(ndev, ppa);
>> @@ -604,6 +608,9 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
>> while (left) {
>> len = min_t(unsigned int, left, max_len);
>>
>> + memset(dev_meta, 0, max_len);
>> + dev_meta_off = dev_meta;
>> +
>> ret = nvme_get_log_ext(ctrl, ns, NVME_NVM_LOG_REPORT_CHUNK,
>> dev_meta, len, offset);
>> if (ret) {
>> @@ -612,15 +619,15 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
>> }
>>
>> for (i = 0; i < len; i += sizeof(struct nvme_nvm_chk_meta)) {
>> - meta->state = dev_meta->state;
>> - meta->type = dev_meta->type;
>> - meta->wi = dev_meta->wi;
>> - meta->slba = le64_to_cpu(dev_meta->slba);
>> - meta->cnlb = le64_to_cpu(dev_meta->cnlb);
>> - meta->wp = le64_to_cpu(dev_meta->wp);
>> + meta->state = dev_meta_off->state;
>> + meta->type = dev_meta_off->type;
>> + meta->wi = dev_meta_off->wi;
>> + meta->slba = le64_to_cpu(dev_meta_off->slba);
>> + meta->cnlb = le64_to_cpu(dev_meta_off->cnlb);
>> + meta->wp = le64_to_cpu(dev_meta_off->wp);
>>
>> meta++;
>> - dev_meta++;
>> + dev_meta_off++;
>> }
>>
>> offset += len;
>> --
>> 2.7.4
>>
On Wed, Aug 29, 2018 at 12:10 PM Javier González <[email protected]> wrote:
>
> The lightnvm subsystem provides helpers to retrieve chunk metadata,
> where the target needs to provide a buffer to store the metadata. An
> implicit assumption is that this buffer is contiguous and can be used to
> retrieve the data from the device. If the device exposes too many
> chunks, then kmalloc might fail, thus failing instance creation.
>
> This patch removes this assumption by implementing an internal buffer in
> the lightnvm subsystem to retrieve chunk metadata. Targets can then
> use virtual memory allocations. Since this is a target API change, adapt
> pblk accordingly.
>
> Signed-off-by: Javier González <[email protected]>
> ---
> drivers/lightnvm/pblk-core.c | 4 ++--
> drivers/lightnvm/pblk-init.c | 2 +-
> drivers/nvme/host/lightnvm.c | 23 +++++++++++++++--------
> 3 files changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index fdcbeb920c9e..a311cc29afd8 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -120,7 +120,7 @@ static void pblk_end_io_erase(struct nvm_rq *rqd)
> /*
> * Get information for all chunks from the device.
> *
> - * The caller is responsible for freeing the returned structure
> + * The caller is responsible for freeing (vmalloc) the returned structure
> */
> struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk)
> {
> @@ -134,7 +134,7 @@ struct nvm_chk_meta *pblk_get_chunk_meta(struct pblk *pblk)
> ppa.ppa = 0;
>
> len = geo->all_chunks * sizeof(*meta);
> - meta = kzalloc(len, GFP_KERNEL);
> + meta = vzalloc(len);
> if (!meta)
> return ERR_PTR(-ENOMEM);
>
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index e0db6de137d6..a99854439224 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -983,7 +983,7 @@ static int pblk_lines_init(struct pblk *pblk)
>
> pblk_set_provision(pblk, nr_free_chks);
>
> - kfree(chunk_meta);
> + vfree(chunk_meta);
> return 0;
>
> fail_free_lines:
> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> index 2c96e7fcdcac..5bfa354c5dd5 100644
> --- a/drivers/nvme/host/lightnvm.c
> +++ b/drivers/nvme/host/lightnvm.c
> @@ -579,7 +579,7 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
> struct nvm_geo *geo = &ndev->geo;
> struct nvme_ns *ns = ndev->q->queuedata;
> struct nvme_ctrl *ctrl = ns->ctrl;
> - struct nvme_nvm_chk_meta *dev_meta = (struct nvme_nvm_chk_meta *)meta;
> + struct nvme_nvm_chk_meta *dev_meta, *dev_meta_off;
> struct ppa_addr ppa;
> size_t left = nchks * sizeof(struct nvme_nvm_chk_meta);
> size_t log_pos, offset, len;
> @@ -591,6 +591,10 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
> */
> max_len = min_t(unsigned int, ctrl->max_hw_sectors << 9, 256 * 1024);
>
> + dev_meta = kmalloc(max_len, GFP_KERNEL);
> + if (!dev_meta)
> + return -ENOMEM;
We should free the memory after using it. Pardon the late review, I
just now discovered the memory leak during testing.
> +
> /* Normalize lba address space to obtain log offset */
> ppa.ppa = slba;
> ppa = dev_to_generic_addr(ndev, ppa);
> @@ -604,6 +608,9 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
> while (left) {
> len = min_t(unsigned int, left, max_len);
>
> + memset(dev_meta, 0, max_len);
> + dev_meta_off = dev_meta;
> +
> ret = nvme_get_log_ext(ctrl, ns, NVME_NVM_LOG_REPORT_CHUNK,
> dev_meta, len, offset);
> if (ret) {
> @@ -612,15 +619,15 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
> }
>
> for (i = 0; i < len; i += sizeof(struct nvme_nvm_chk_meta)) {
> - meta->state = dev_meta->state;
> - meta->type = dev_meta->type;
> - meta->wi = dev_meta->wi;
> - meta->slba = le64_to_cpu(dev_meta->slba);
> - meta->cnlb = le64_to_cpu(dev_meta->cnlb);
> - meta->wp = le64_to_cpu(dev_meta->wp);
> + meta->state = dev_meta_off->state;
> + meta->type = dev_meta_off->type;
> + meta->wi = dev_meta_off->wi;
> + meta->slba = le64_to_cpu(dev_meta_off->slba);
> + meta->cnlb = le64_to_cpu(dev_meta_off->cnlb);
> + meta->wp = le64_to_cpu(dev_meta_off->wp);
>
> meta++;
> - dev_meta++;
> + dev_meta_off++;
> }
>
> offset += len;
> --
> 2.7.4
>
On 08/29/2018 12:09 PM, Javier González wrote:
> pblk uses 8 bytes in the metadata region area exposed by the device
> through the out of band area to store the lba mapped to the given
> physical sector. This is used for recovery purposes. Given that the
> first generation OCSSD devices exposed 16 bytes, pblk used a hard-coded
> structure for this purpose.
>
> This patch relaxes the 16 bytes assumption and uses the metadata size
> reported by the device to layout metadata appropriately for the vector
> commands. This adds support for arbitrary metadata sizes, as long as
> these are larger than 8 bytes. Note that this patch does not address the
> case in which the device does not expose an out of band area and that
> pblk creation will fail in this case.
>
> Signed-off-by: Javier González <[email protected]>
> ---
> drivers/lightnvm/pblk-core.c | 56 ++++++++++++++++++++++++++++++----------
> drivers/lightnvm/pblk-init.c | 14 ++++++++++
> drivers/lightnvm/pblk-map.c | 19 +++++++++-----
> drivers/lightnvm/pblk-read.c | 55 +++++++++++++++++++++++++--------------
> drivers/lightnvm/pblk-recovery.c | 34 +++++++++++++++++-------
> drivers/lightnvm/pblk.h | 18 ++++++++++---
> 6 files changed, 143 insertions(+), 53 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index a311cc29afd8..d52e0047ae9d 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -250,8 +250,20 @@ int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
> if (!is_vector)
> return 0;
>
> - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
> - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
> + if (pblk->dma_shared) {
> + rqd->ppa_list = rqd->meta_list + pblk->dma_meta_size;
> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk->dma_meta_size;
> +
> + return 0;
> + }
> +
> + rqd->ppa_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
> + &rqd->dma_ppa_list);
> + if (!rqd->ppa_list) {
> + nvm_dev_dma_free(dev->parent, rqd->meta_list,
> + rqd->dma_meta_list);
> + return -ENOMEM;
> + }
>
> return 0;
> }
> @@ -262,7 +274,11 @@ void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
>
> if (rqd->meta_list)
> nvm_dev_dma_free(dev->parent, rqd->meta_list,
> - rqd->dma_meta_list);
> + rqd->dma_meta_list);
> +
> + if (!pblk->dma_shared && rqd->ppa_list)
> + nvm_dev_dma_free(dev->parent, rqd->ppa_list,
> + rqd->dma_ppa_list);
> }
>
> /* Caller must guarantee that the request is a valid type */
> @@ -796,10 +812,12 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
> rqd.is_seq = 1;
>
> for (i = 0; i < lm->smeta_sec; i++, paddr++) {
> - struct pblk_sec_meta *meta_list = rqd.meta_list;
> + struct pblk_sec_meta *meta;
>
> rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
> - meta_list[i].lba = lba_list[paddr] = addr_empty;
> +
> + meta = sec_meta_index(pblk, rqd.meta_list, i);
> + meta->lba = lba_list[paddr] = addr_empty;
> }
>
> ret = pblk_submit_io_sync_sem(pblk, &rqd);
> @@ -845,8 +863,17 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> if (!meta_list)
> return -ENOMEM;
>
> - ppa_list = meta_list + pblk_dma_meta_size;
> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
> + if (pblk->dma_shared) {
> + ppa_list = meta_list + pblk->dma_meta_size;
> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
> + } else {
> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> + &dma_ppa_list);
> + if (!ppa_list) {
> + ret = -ENOMEM;
> + goto free_meta_list;
> + }
> + }
>
> next_rq:
> memset(&rqd, 0, sizeof(struct nvm_rq));
> @@ -858,7 +885,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> l_mg->emeta_alloc_type, GFP_KERNEL);
> if (IS_ERR(bio)) {
> ret = PTR_ERR(bio);
> - goto free_rqd_dma;
> + goto free_ppa_list;
> }
>
> bio->bi_iter.bi_sector = 0; /* internal bio */
> @@ -884,7 +911,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> if (pblk_boundary_paddr_checks(pblk, paddr)) {
> bio_put(bio);
> ret = -EINTR;
> - goto free_rqd_dma;
> + goto free_ppa_list;
> }
>
> ppa = addr_to_gen_ppa(pblk, paddr, line_id);
> @@ -894,7 +921,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
> bio_put(bio);
> ret = -EINTR;
> - goto free_rqd_dma;
> + goto free_ppa_list;
> }
>
> for (j = 0; j < min; j++, i++, paddr++)
> @@ -905,7 +932,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> if (ret) {
> pblk_err(pblk, "emeta I/O submission failed: %d\n", ret);
> bio_put(bio);
> - goto free_rqd_dma;
> + goto free_ppa_list;
> }
>
> atomic_dec(&pblk->inflight_io);
> @@ -918,8 +945,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> if (left_ppas)
> goto next_rq;
>
> -free_rqd_dma:
> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> +free_ppa_list:
> + if (!pblk->dma_shared)
> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
> +free_meta_list:
> + nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
> return ret;
> }
>
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index a99854439224..57972156c318 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -354,6 +354,20 @@ static int pblk_core_init(struct pblk *pblk)
> struct nvm_geo *geo = &dev->geo;
> int ret, max_write_ppas;
>
> + if (sizeof(struct pblk_sec_meta) > geo->sos) {
> + pblk_err(pblk, "OOB area too small. Min %lu bytes (%d)\n",
> + (unsigned long)sizeof(struct pblk_sec_meta), geo->sos);
> + return -EINTR;
> + }
> +
> + pblk->dma_ppa_size = (sizeof(u64) * NVM_MAX_VLBA);
> + pblk->dma_meta_size = geo->sos * NVM_MAX_VLBA;
> +
> + if (pblk->dma_ppa_size + pblk->dma_meta_size > PAGE_SIZE)
> + pblk->dma_shared = false;
> + else
> + pblk->dma_shared = true;
> +
> atomic64_set(&pblk->user_wa, 0);
> atomic64_set(&pblk->pad_wa, 0);
> atomic64_set(&pblk->gc_wa, 0);
> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
> index dc0efb852475..55fca16d18e4 100644
> --- a/drivers/lightnvm/pblk-map.c
> +++ b/drivers/lightnvm/pblk-map.c
> @@ -25,6 +25,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
> unsigned int valid_secs)
> {
> struct pblk_line *line = pblk_line_get_data(pblk);
> + struct pblk_sec_meta *meta;
> struct pblk_emeta *emeta;
> struct pblk_w_ctx *w_ctx;
> __le64 *lba_list;
> @@ -56,6 +57,8 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
> /* ppa to be sent to the device */
> ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>
> + meta = sec_meta_index(pblk, meta_list, i);
> +
> /* Write context for target bio completion on write buffer. Note
> * that the write buffer is protected by the sync backpointer,
> * and a single writer thread have access to each specific entry
> @@ -67,14 +70,14 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
> kref_get(&line->ref);
> w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
> w_ctx->ppa = ppa_list[i];
> - meta_list[i].lba = cpu_to_le64(w_ctx->lba);
> + meta->lba = cpu_to_le64(w_ctx->lba);
> lba_list[paddr] = cpu_to_le64(w_ctx->lba);
> if (lba_list[paddr] != addr_empty)
> line->nr_valid_lbas++;
> else
> atomic64_inc(&pblk->pad_wa);
> } else {
> - lba_list[paddr] = meta_list[i].lba = addr_empty;
> + lba_list[paddr] = meta->lba = addr_empty;
> __pblk_map_invalidate(pblk, line, paddr);
> }
> }
> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
> unsigned long *lun_bitmap, unsigned int valid_secs,
> unsigned int off)
> {
> - struct pblk_sec_meta *meta_list = rqd->meta_list;
> + struct pblk_sec_meta *meta_list;
> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
> unsigned int map_secs;
> int min = pblk->min_write_pgs;
> @@ -95,8 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>
> for (i = off; i < rqd->nr_ppas; i += min) {
> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
> +
> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
> - lun_bitmap, &meta_list[i], map_secs)) {
> + lun_bitmap, meta_list, map_secs)) {
> bio_put(rqd->bio);
> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
> pblk_pipeline_stop(pblk);
> @@ -112,8 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
> struct nvm_tgt_dev *dev = pblk->dev;
> struct nvm_geo *geo = &dev->geo;
> struct pblk_line_meta *lm = &pblk->lm;
> - struct pblk_sec_meta *meta_list = rqd->meta_list;
> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
> + struct pblk_sec_meta *meta_list;
> struct pblk_line *e_line, *d_line;
> unsigned int map_secs;
> int min = pblk->min_write_pgs;
> @@ -121,8 +126,10 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>
> for (i = 0; i < rqd->nr_ppas; i += min) {
> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
> +
> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
> - lun_bitmap, &meta_list[i], map_secs)) {
> + lun_bitmap, meta_list, map_secs)) {
> bio_put(rqd->bio);
> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
> pblk_pipeline_stop(pblk);
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 57d3155ef9a5..12b690e2abd9 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -42,7 +42,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
> struct bio *bio, sector_t blba,
> unsigned long *read_bitmap)
> {
> - struct pblk_sec_meta *meta_list = rqd->meta_list;
> struct ppa_addr ppas[NVM_MAX_VLBA];
> int nr_secs = rqd->nr_ppas;
> bool advanced_bio = false;
> @@ -51,13 +50,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
> pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
>
> for (i = 0; i < nr_secs; i++) {
> + struct pblk_sec_meta *meta;
> struct ppa_addr p = ppas[i];
> sector_t lba = blba + i;
>
> + meta = sec_meta_index(pblk, rqd->meta_list, i);
> retry:
> if (pblk_ppa_empty(p)) {
> WARN_ON(test_and_set_bit(i, read_bitmap));
> - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
> +
> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>
> if (unlikely(!advanced_bio)) {
> bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
> @@ -77,7 +79,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
> goto retry;
> }
> WARN_ON(test_and_set_bit(i, read_bitmap));
> - meta_list[i].lba = cpu_to_le64(lba);
> + meta->lba = cpu_to_le64(lba);
> advanced_bio = true;
> #ifdef CONFIG_NVM_PBLK_DEBUG
> atomic_long_inc(&pblk->cache_reads);
> @@ -104,12 +106,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
> sector_t blba)
> {
> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
> int nr_lbas = rqd->nr_ppas;
> int i;
>
> for (i = 0; i < nr_lbas; i++) {
> - u64 lba = le64_to_cpu(meta_lba_list[i].lba);
> + struct pblk_sec_meta *meta;
> + u64 lba;
> +
> + meta = sec_meta_index(pblk, rqd->meta_list, i);
> + lba = le64_to_cpu(meta->lba);
>
> if (lba == ADDR_EMPTY)
> continue;
> @@ -133,17 +138,18 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
> u64 *lba_list, int nr_lbas)
> {
> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
> int i, j;
>
> for (i = 0, j = 0; i < nr_lbas; i++) {
> + struct pblk_sec_meta *meta;
> u64 lba = lba_list[i];
> u64 meta_lba;
>
> if (lba == ADDR_EMPTY)
> continue;
>
> - meta_lba = le64_to_cpu(meta_lba_list[j].lba);
> + meta = sec_meta_index(pblk, rqd->meta_list, j);
> + meta_lba = le64_to_cpu(meta->lba);
>
> if (lba != meta_lba) {
> #ifdef CONFIG_NVM_PBLK_DEBUG
> @@ -218,7 +224,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
> struct bio *new_bio = rqd->bio;
> struct bio *bio = pr_ctx->orig_bio;
> struct bio_vec src_bv, dst_bv;
> - struct pblk_sec_meta *meta_list = rqd->meta_list;
> + struct pblk_sec_meta *meta;
> int bio_init_idx = pr_ctx->bio_init_idx;
> unsigned long *read_bitmap = pr_ctx->bitmap;
> int nr_secs = pr_ctx->orig_nr_secs;
> @@ -237,12 +243,13 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
> }
>
> /* Re-use allocated memory for intermediate lbas */
> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
> - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
> + lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk->dma_ppa_size);
>
> for (i = 0; i < nr_secs; i++) {
> - lba_list_media[i] = meta_list[i].lba;
> - meta_list[i].lba = lba_list_mem[i];
> + meta = sec_meta_index(pblk, rqd->meta_list, i);
> + lba_list_media[i] = meta->lba;
> + meta->lba = lba_list_mem[i];
> }
>
> /* Fill the holes in the original bio */
> @@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
> line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
> kref_put(&line->ref, pblk_line_put);
>
> - meta_list[hole].lba = lba_list_media[i];
> + meta = sec_meta_index(pblk, rqd->meta_list, hole);
> + meta->lba = lba_list_media[i];
>
> src_bv = new_bio->bi_io_vec[i++];
> dst_bv = bio->bi_io_vec[bio_init_idx + hole];
> @@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
> unsigned long *read_bitmap,
> int nr_holes)
> {
> - struct pblk_sec_meta *meta_list = rqd->meta_list;
> struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
> + struct pblk_sec_meta *meta;
> struct pblk_pr_ctx *pr_ctx;
> struct bio *new_bio, *bio = r_ctx->private;
> __le64 *lba_list_mem;
> @@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
> int i;
>
> /* Re-use allocated memory for intermediate lbas */
> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>
> new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>
> @@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
> if (!pr_ctx)
> goto fail_free_pages;
>
> - for (i = 0; i < nr_secs; i++)
> - lba_list_mem[i] = meta_list[i].lba;
> + for (i = 0; i < nr_secs; i++) {
> + meta = sec_meta_index(pblk, rqd->meta_list, i);
> + lba_list_mem[i] = meta->lba;
> + }
>
> new_bio->bi_iter.bi_sector = 0; /* internal bio */
> bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
> @@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
> sector_t lba, unsigned long *read_bitmap)
> {
> - struct pblk_sec_meta *meta_list = rqd->meta_list;
> + struct pblk_sec_meta *meta;
> struct ppa_addr ppa;
>
> pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
> @@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
> retry:
> if (pblk_ppa_empty(ppa)) {
> WARN_ON(test_and_set_bit(0, read_bitmap));
> - meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
> +
> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
> + meta->lba = cpu_to_le64(ADDR_EMPTY);
> +
> return;
> }
>
> @@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
> }
>
> WARN_ON(test_and_set_bit(0, read_bitmap));
> - meta_list[0].lba = cpu_to_le64(lba);
> +
> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
> + meta->lba = cpu_to_le64(lba);
>
> #ifdef CONFIG_NVM_PBLK_DEBUG
> atomic_long_inc(&pblk->cache_reads);
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index 8114013c37b8..1ce92562603d 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
> {
> struct nvm_tgt_dev *dev = pblk->dev;
> struct nvm_geo *geo = &dev->geo;
> - struct pblk_sec_meta *meta_list;
> + struct pblk_sec_meta *meta;
> struct pblk_pad_rq *pad_rq;
> struct nvm_rq *rqd;
> struct bio *bio;
> @@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
> rqd->end_io = pblk_end_io_recov;
> rqd->private = pad_rq;
>
> - meta_list = rqd->meta_list;
> -
> for (i = 0; i < rqd->nr_ppas; ) {
> struct ppa_addr ppa;
> int pos;
> @@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
> dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>
> pblk_map_invalidate(pblk, dev_ppa);
> - lba_list[w_ptr] = meta_list[i].lba = addr_empty;
> rqd->ppa_list[i] = dev_ppa;
> +
> + meta = sec_meta_index(pblk, rqd->meta_list, i);
> + lba_list[w_ptr] = meta->lba = addr_empty;
> }
> }
>
> @@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> struct nvm_tgt_dev *dev = pblk->dev;
> struct nvm_geo *geo = &dev->geo;
> struct ppa_addr *ppa_list;
> - struct pblk_sec_meta *meta_list;
> + struct pblk_sec_meta *meta_list, *meta;
> struct nvm_rq *rqd;
> struct bio *bio;
> void *data;
> @@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
> }
>
> for (i = 0; i < rqd->nr_ppas; i++) {
> - u64 lba = le64_to_cpu(meta_list[i].lba);
> + u64 lba;
> +
> + meta = sec_meta_index(pblk, meta_list, i);
> + lba = le64_to_cpu(meta->lba);
>
> lba_list[paddr++] = cpu_to_le64(lba);
>
> @@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
> if (!meta_list)
> return -ENOMEM;
>
> - ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
> + if (pblk->dma_shared) {
> + ppa_list = (void *)(meta_list) + pblk->dma_meta_size;
> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
> + } else {
> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> + &dma_ppa_list);
> + if (!ppa_list) {
> + ret = -ENOMEM;
> + goto free_meta_list;
> + }
> + }
>
> data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
> if (!data) {
> ret = -ENOMEM;
> - goto free_meta_list;
> + goto free_ppa_list;
> }
>
> rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL);
> @@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
> out:
> mempool_free(rqd, &pblk->r_rq_pool);
> kfree(data);
> +free_ppa_list:
> + if (!pblk->dma_shared)
> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
> free_meta_list:
> nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
> -
> return ret;
> }
>
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 22cc9bfbbb10..4526fee206d9 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -86,7 +86,6 @@ enum {
> };
>
> struct pblk_sec_meta {
> - u64 reserved;
> __le64 lba;
> };
>
> @@ -103,9 +102,6 @@ enum {
> PBLK_RL_LOW = 4
> };
>
> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
> -#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
> -
> /* write buffer completion context */
> struct pblk_c_ctx {
> struct list_head list; /* Head for out-of-order completion */
> @@ -637,6 +633,10 @@ struct pblk {
>
> int sec_per_write;
>
> + int dma_meta_size;
> + int dma_ppa_size;
> + bool dma_shared;
> +
> unsigned char instance_uuid[16];
>
> /* Persistent write amplification counters, 4kb sector I/Os */
> @@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
> return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]);
> }
>
> +static inline struct pblk_sec_meta *sec_meta_index(struct pblk *pblk,
> + struct pblk_sec_meta *meta,
> + int index)
> +{
> + struct nvm_tgt_dev *dev = pblk->dev;
> + struct nvm_geo *geo = &dev->geo;
> +
> + return ((void *)meta + index * geo->sos);
> +}
> +
> static inline int pblk_line_vsc(struct pblk_line *line)
> {
> return le32_to_cpu(*line->vsc);
>
It would be helpful to split this patch into two:
- One that does the 8b conversion
- One that makes the change to merge metadata and ppa list data buffers

A couple of further suggestions:
- How about making it a simplification to only allow up to 32b
metadata buffers? Then one doesn't have to think about it crossing
multiple pages.
- You can also make a structure for it, use the first 3.5KB for meta,
and the next 512B for PPAs.
> On 7 Sep 2018, at 12.28, Matias Bjørling <[email protected]> wrote:
>
> On 08/29/2018 12:09 PM, Javier González wrote:
>> In the OCSSD 2.0 spec., the sector and metadata sizes are reported through
>> the standard nvme identify command. Thus, the lightnvm subsystem needs
>> to update this information on the geometry structure on bootup.
>> Since 1.2 devices report these values on the OCSSD geometry identify,
>> avoid this update as it is unnecessary and can also corrupt the geometry
>> if the device does not report the nvme sizes correctly (which is not
>> required by the OCSSD 1.2 spec either).
>
>> Signed-off-by: Javier González <[email protected]>
>> ---
>> drivers/nvme/host/lightnvm.c | 3 +++
>> 1 file changed, 3 insertions(+)
>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>> index 5bfa354c5dd5..33ed09f8410e 100644
>> --- a/drivers/nvme/host/lightnvm.c
>> +++ b/drivers/nvme/host/lightnvm.c
>> @@ -980,6 +980,9 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
>> struct nvm_dev *ndev = ns->ndev;
>> struct nvm_geo *geo = &ndev->geo;
>> + if (geo->version == NVM_OCSSD_SPEC_12)
>> + return;
>
> The lba format sizes are not explicit in 2.0 either. For a conforming drive, it should implement the NVM Command Set and expose the LBA format correctly. Although, I do get your incentive, and if it is okay with you, I'll reword the commit message to this and apply it?:
>
> "1.2 devices exposes their data and metadata size through the separate
> identify command. Make sure that the NVMe LBA format does not override
> these values."
>
Sounds good. I'll resend either way as I rebased on top of for-4.20/core
- you can change the commit message as you see fit.
Javier
> On 10 Sep 2018, at 12.21, Matias Bjørling <[email protected]> wrote:
>
> On 08/29/2018 12:09 PM, Javier González wrote:
>> pblk uses 8 bytes in the metadata region area exposed by the device
>> through the out of band area to store the lba mapped to the given
>> physical sector. This is used for recovery purposes. Given that the
>> first generation OCSSD devices exposed 16 bytes, pblk used a hard-coded
>> structure for this purpose.
>> This patch relaxes the 16 bytes assumption and uses the metadata size
>> reported by the device to layout metadata appropriately for the vector
>> commands. This adds support for arbitrary metadata sizes, as long as
>> these are larger than 8 bytes. Note that this patch does not address the
>> case in which the device does not expose an out of band area and that
>> pblk creation will fail in this case.
>> Signed-off-by: Javier González <[email protected]>
>> ---
>> drivers/lightnvm/pblk-core.c | 56 ++++++++++++++++++++++++++++++----------
>> drivers/lightnvm/pblk-init.c | 14 ++++++++++
>> drivers/lightnvm/pblk-map.c | 19 +++++++++-----
>> drivers/lightnvm/pblk-read.c | 55 +++++++++++++++++++++++++--------------
>> drivers/lightnvm/pblk-recovery.c | 34 +++++++++++++++++-------
>> drivers/lightnvm/pblk.h | 18 ++++++++++---
>> 6 files changed, 143 insertions(+), 53 deletions(-)
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index a311cc29afd8..d52e0047ae9d 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -250,8 +250,20 @@ int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
>> if (!is_vector)
>> return 0;
>> - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
>> - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
>> + if (pblk->dma_shared) {
>> + rqd->ppa_list = rqd->meta_list + pblk->dma_meta_size;
>> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk->dma_meta_size;
>> +
>> + return 0;
>> + }
>> +
>> + rqd->ppa_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
>> + &rqd->dma_ppa_list);
>> + if (!rqd->ppa_list) {
>> + nvm_dev_dma_free(dev->parent, rqd->meta_list,
>> + rqd->dma_meta_list);
>> + return -ENOMEM;
>> + }
>> return 0;
>> }
>> @@ -262,7 +274,11 @@ void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
>> if (rqd->meta_list)
>> nvm_dev_dma_free(dev->parent, rqd->meta_list,
>> - rqd->dma_meta_list);
>> + rqd->dma_meta_list);
>> +
>> + if (!pblk->dma_shared && rqd->ppa_list)
>> + nvm_dev_dma_free(dev->parent, rqd->ppa_list,
>> + rqd->dma_ppa_list);
>> }
>> /* Caller must guarantee that the request is a valid type */
>> @@ -796,10 +812,12 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>> rqd.is_seq = 1;
>> for (i = 0; i < lm->smeta_sec; i++, paddr++) {
>> - struct pblk_sec_meta *meta_list = rqd.meta_list;
>> + struct pblk_sec_meta *meta;
>> rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>> - meta_list[i].lba = lba_list[paddr] = addr_empty;
>> +
>> + meta = sec_meta_index(pblk, rqd.meta_list, i);
>> + meta->lba = lba_list[paddr] = addr_empty;
>> }
>> ret = pblk_submit_io_sync_sem(pblk, &rqd);
>> @@ -845,8 +863,17 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>> if (!meta_list)
>> return -ENOMEM;
>> - ppa_list = meta_list + pblk_dma_meta_size;
>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>> + if (pblk->dma_shared) {
>> + ppa_list = meta_list + pblk->dma_meta_size;
>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>> + } else {
>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>> + &dma_ppa_list);
>> + if (!ppa_list) {
>> + ret = -ENOMEM;
>> + goto free_meta_list;
>> + }
>> + }
>> next_rq:
>> memset(&rqd, 0, sizeof(struct nvm_rq));
>> @@ -858,7 +885,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>> l_mg->emeta_alloc_type, GFP_KERNEL);
>> if (IS_ERR(bio)) {
>> ret = PTR_ERR(bio);
>> - goto free_rqd_dma;
>> + goto free_ppa_list;
>> }
>> bio->bi_iter.bi_sector = 0; /* internal bio */
>> @@ -884,7 +911,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>> if (pblk_boundary_paddr_checks(pblk, paddr)) {
>> bio_put(bio);
>> ret = -EINTR;
>> - goto free_rqd_dma;
>> + goto free_ppa_list;
>> }
>> ppa = addr_to_gen_ppa(pblk, paddr, line_id);
>> @@ -894,7 +921,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>> if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
>> bio_put(bio);
>> ret = -EINTR;
>> - goto free_rqd_dma;
>> + goto free_ppa_list;
>> }
>> for (j = 0; j < min; j++, i++, paddr++)
>> @@ -905,7 +932,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>> if (ret) {
>> pblk_err(pblk, "emeta I/O submission failed: %d\n", ret);
>> bio_put(bio);
>> - goto free_rqd_dma;
>> + goto free_ppa_list;
>> }
>> atomic_dec(&pblk->inflight_io);
>> @@ -918,8 +945,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>> if (left_ppas)
>> goto next_rq;
>> -free_rqd_dma:
>> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
>> +free_ppa_list:
>> + if (!pblk->dma_shared)
>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>> +free_meta_list:
>> + nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>> return ret;
>> }
>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>> index a99854439224..57972156c318 100644
>> --- a/drivers/lightnvm/pblk-init.c
>> +++ b/drivers/lightnvm/pblk-init.c
>> @@ -354,6 +354,20 @@ static int pblk_core_init(struct pblk *pblk)
>> struct nvm_geo *geo = &dev->geo;
>> int ret, max_write_ppas;
>> + if (sizeof(struct pblk_sec_meta) > geo->sos) {
>> + pblk_err(pblk, "OOB area too small. Min %lu bytes (%d)\n",
>> + (unsigned long)sizeof(struct pblk_sec_meta), geo->sos);
>> + return -EINTR;
>> + }
>> +
>> + pblk->dma_ppa_size = (sizeof(u64) * NVM_MAX_VLBA);
>> + pblk->dma_meta_size = geo->sos * NVM_MAX_VLBA;
>> +
>> + if (pblk->dma_ppa_size + pblk->dma_meta_size > PAGE_SIZE)
>> + pblk->dma_shared = false;
>> + else
>> + pblk->dma_shared = true;
>> +
>> atomic64_set(&pblk->user_wa, 0);
>> atomic64_set(&pblk->pad_wa, 0);
>> atomic64_set(&pblk->gc_wa, 0);
>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>> index dc0efb852475..55fca16d18e4 100644
>> --- a/drivers/lightnvm/pblk-map.c
>> +++ b/drivers/lightnvm/pblk-map.c
>> @@ -25,6 +25,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>> unsigned int valid_secs)
>> {
>> struct pblk_line *line = pblk_line_get_data(pblk);
>> + struct pblk_sec_meta *meta;
>> struct pblk_emeta *emeta;
>> struct pblk_w_ctx *w_ctx;
>> __le64 *lba_list;
>> @@ -56,6 +57,8 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>> /* ppa to be sent to the device */
>> ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>> + meta = sec_meta_index(pblk, meta_list, i);
>> +
>> /* Write context for target bio completion on write buffer. Note
>> * that the write buffer is protected by the sync backpointer,
>> * and a single writer thread have access to each specific entry
>> @@ -67,14 +70,14 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>> kref_get(&line->ref);
>> w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
>> w_ctx->ppa = ppa_list[i];
>> - meta_list[i].lba = cpu_to_le64(w_ctx->lba);
>> + meta->lba = cpu_to_le64(w_ctx->lba);
>> lba_list[paddr] = cpu_to_le64(w_ctx->lba);
>> if (lba_list[paddr] != addr_empty)
>> line->nr_valid_lbas++;
>> else
>> atomic64_inc(&pblk->pad_wa);
>> } else {
>> - lba_list[paddr] = meta_list[i].lba = addr_empty;
>> + lba_list[paddr] = meta->lba = addr_empty;
>> __pblk_map_invalidate(pblk, line, paddr);
>> }
>> }
>> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>> unsigned long *lun_bitmap, unsigned int valid_secs,
>> unsigned int off)
>> {
>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>> + struct pblk_sec_meta *meta_list;
>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>> unsigned int map_secs;
>> int min = pblk->min_write_pgs;
>> @@ -95,8 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>> for (i = off; i < rqd->nr_ppas; i += min) {
>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>> +
>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>> - lun_bitmap, &meta_list[i], map_secs)) {
>> + lun_bitmap, meta_list, map_secs)) {
>> bio_put(rqd->bio);
>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>> pblk_pipeline_stop(pblk);
>> @@ -112,8 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> struct nvm_tgt_dev *dev = pblk->dev;
>> struct nvm_geo *geo = &dev->geo;
>> struct pblk_line_meta *lm = &pblk->lm;
>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>> + struct pblk_sec_meta *meta_list;
>> struct pblk_line *e_line, *d_line;
>> unsigned int map_secs;
>> int min = pblk->min_write_pgs;
>> @@ -121,8 +126,10 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> for (i = 0; i < rqd->nr_ppas; i += min) {
>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>> +
>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>> - lun_bitmap, &meta_list[i], map_secs)) {
>> + lun_bitmap, meta_list, map_secs)) {
>> bio_put(rqd->bio);
>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>> pblk_pipeline_stop(pblk);
>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>> index 57d3155ef9a5..12b690e2abd9 100644
>> --- a/drivers/lightnvm/pblk-read.c
>> +++ b/drivers/lightnvm/pblk-read.c
>> @@ -42,7 +42,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> struct bio *bio, sector_t blba,
>> unsigned long *read_bitmap)
>> {
>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>> struct ppa_addr ppas[NVM_MAX_VLBA];
>> int nr_secs = rqd->nr_ppas;
>> bool advanced_bio = false;
>> @@ -51,13 +50,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
>> for (i = 0; i < nr_secs; i++) {
>> + struct pblk_sec_meta *meta;
>> struct ppa_addr p = ppas[i];
>> sector_t lba = blba + i;
>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>> retry:
>> if (pblk_ppa_empty(p)) {
>> WARN_ON(test_and_set_bit(i, read_bitmap));
>> - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>> +
>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>> if (unlikely(!advanced_bio)) {
>> bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
>> @@ -77,7 +79,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> goto retry;
>> }
>> WARN_ON(test_and_set_bit(i, read_bitmap));
>> - meta_list[i].lba = cpu_to_le64(lba);
>> + meta->lba = cpu_to_le64(lba);
>> advanced_bio = true;
>> #ifdef CONFIG_NVM_PBLK_DEBUG
>> atomic_long_inc(&pblk->cache_reads);
>> @@ -104,12 +106,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>> sector_t blba)
>> {
>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>> int nr_lbas = rqd->nr_ppas;
>> int i;
>> for (i = 0; i < nr_lbas; i++) {
>> - u64 lba = le64_to_cpu(meta_lba_list[i].lba);
>> + struct pblk_sec_meta *meta;
>> + u64 lba;
>> +
>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>> + lba = le64_to_cpu(meta->lba);
>> if (lba == ADDR_EMPTY)
>> continue;
>> @@ -133,17 +138,18 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>> u64 *lba_list, int nr_lbas)
>> {
>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>> int i, j;
>> for (i = 0, j = 0; i < nr_lbas; i++) {
>> + struct pblk_sec_meta *meta;
>> u64 lba = lba_list[i];
>> u64 meta_lba;
>> if (lba == ADDR_EMPTY)
>> continue;
>> - meta_lba = le64_to_cpu(meta_lba_list[j].lba);
>> + meta = sec_meta_index(pblk, rqd->meta_list, j);
>> + meta_lba = le64_to_cpu(meta->lba);
>> if (lba != meta_lba) {
>> #ifdef CONFIG_NVM_PBLK_DEBUG
>> @@ -218,7 +224,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>> struct bio *new_bio = rqd->bio;
>> struct bio *bio = pr_ctx->orig_bio;
>> struct bio_vec src_bv, dst_bv;
>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>> + struct pblk_sec_meta *meta;
>> int bio_init_idx = pr_ctx->bio_init_idx;
>> unsigned long *read_bitmap = pr_ctx->bitmap;
>> int nr_secs = pr_ctx->orig_nr_secs;
>> @@ -237,12 +243,13 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>> }
>> /* Re-use allocated memory for intermediate lbas */
>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>> - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>> + lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk->dma_ppa_size);
>> for (i = 0; i < nr_secs; i++) {
>> - lba_list_media[i] = meta_list[i].lba;
>> - meta_list[i].lba = lba_list_mem[i];
>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>> + lba_list_media[i] = meta->lba;
>> + meta->lba = lba_list_mem[i];
>> }
>> /* Fill the holes in the original bio */
>> @@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>> line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>> kref_put(&line->ref, pblk_line_put);
>> - meta_list[hole].lba = lba_list_media[i];
>> + meta = sec_meta_index(pblk, rqd->meta_list, hole);
>> + meta->lba = lba_list_media[i];
>> src_bv = new_bio->bi_io_vec[i++];
>> dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>> @@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>> unsigned long *read_bitmap,
>> int nr_holes)
>> {
>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>> struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>> + struct pblk_sec_meta *meta;
>> struct pblk_pr_ctx *pr_ctx;
>> struct bio *new_bio, *bio = r_ctx->private;
>> __le64 *lba_list_mem;
>> @@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>> int i;
>> /* Re-use allocated memory for intermediate lbas */
>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>> new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>> @@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>> if (!pr_ctx)
>> goto fail_free_pages;
>> - for (i = 0; i < nr_secs; i++)
>> - lba_list_mem[i] = meta_list[i].lba;
>> + for (i = 0; i < nr_secs; i++) {
>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>> + lba_list_mem[i] = meta->lba;
>> + }
>> new_bio->bi_iter.bi_sector = 0; /* internal bio */
>> bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>> @@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>> sector_t lba, unsigned long *read_bitmap)
>> {
>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>> + struct pblk_sec_meta *meta;
>> struct ppa_addr ppa;
>> pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>> @@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>> retry:
>> if (pblk_ppa_empty(ppa)) {
>> WARN_ON(test_and_set_bit(0, read_bitmap));
>> - meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
>> +
>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>> +
>> return;
>> }
>> @@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>> }
>> WARN_ON(test_and_set_bit(0, read_bitmap));
>> - meta_list[0].lba = cpu_to_le64(lba);
>> +
>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>> + meta->lba = cpu_to_le64(lba);
>> #ifdef CONFIG_NVM_PBLK_DEBUG
>> atomic_long_inc(&pblk->cache_reads);
>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>> index 8114013c37b8..1ce92562603d 100644
>> --- a/drivers/lightnvm/pblk-recovery.c
>> +++ b/drivers/lightnvm/pblk-recovery.c
>> @@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>> {
>> struct nvm_tgt_dev *dev = pblk->dev;
>> struct nvm_geo *geo = &dev->geo;
>> - struct pblk_sec_meta *meta_list;
>> + struct pblk_sec_meta *meta;
>> struct pblk_pad_rq *pad_rq;
>> struct nvm_rq *rqd;
>> struct bio *bio;
>> @@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>> rqd->end_io = pblk_end_io_recov;
>> rqd->private = pad_rq;
>> - meta_list = rqd->meta_list;
>> -
>> for (i = 0; i < rqd->nr_ppas; ) {
>> struct ppa_addr ppa;
>> int pos;
>> @@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>> dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>> pblk_map_invalidate(pblk, dev_ppa);
>> - lba_list[w_ptr] = meta_list[i].lba = addr_empty;
>> rqd->ppa_list[i] = dev_ppa;
>> +
>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>> + lba_list[w_ptr] = meta->lba = addr_empty;
>> }
>> }
>> @@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>> struct nvm_tgt_dev *dev = pblk->dev;
>> struct nvm_geo *geo = &dev->geo;
>> struct ppa_addr *ppa_list;
>> - struct pblk_sec_meta *meta_list;
>> + struct pblk_sec_meta *meta_list, *meta;
>> struct nvm_rq *rqd;
>> struct bio *bio;
>> void *data;
>> @@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>> }
>> for (i = 0; i < rqd->nr_ppas; i++) {
>> - u64 lba = le64_to_cpu(meta_list[i].lba);
>> + u64 lba;
>> +
>> + meta = sec_meta_index(pblk, meta_list, i);
>> + lba = le64_to_cpu(meta->lba);
>> lba_list[paddr++] = cpu_to_le64(lba);
>> @@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>> if (!meta_list)
>> return -ENOMEM;
>> - ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>> + if (pblk->dma_shared) {
>> + ppa_list = (void *)(meta_list) + pblk->dma_meta_size;
>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>> + } else {
>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>> + &dma_ppa_list);
>> + if (!ppa_list) {
>> + ret = -ENOMEM;
>> + goto free_meta_list;
>> + }
>> + }
>> data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
>> if (!data) {
>> ret = -ENOMEM;
>> - goto free_meta_list;
>> + goto free_ppa_list;
>> }
>> rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL);
>> @@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>> out:
>> mempool_free(rqd, &pblk->r_rq_pool);
>> kfree(data);
>> +free_ppa_list:
>> + if (!pblk->dma_shared)
>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>> free_meta_list:
>> nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>> -
>> return ret;
>> }
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index 22cc9bfbbb10..4526fee206d9 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -86,7 +86,6 @@ enum {
>> };
>> struct pblk_sec_meta {
>> - u64 reserved;
>> __le64 lba;
>> };
>> @@ -103,9 +102,6 @@ enum {
>> PBLK_RL_LOW = 4
>> };
>> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
>> -#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
>> -
>> /* write buffer completion context */
>> struct pblk_c_ctx {
>> struct list_head list; /* Head for out-of-order completion */
>> @@ -637,6 +633,10 @@ struct pblk {
>> int sec_per_write;
>> + int dma_meta_size;
>> + int dma_ppa_size;
>> + bool dma_shared;
>> +
>> unsigned char instance_uuid[16];
>> /* Persistent write amplification counters, 4kb sector I/Os */
>> @@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
>> return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]);
>> }
>> +static inline struct pblk_sec_meta *sec_meta_index(struct pblk *pblk,
>> + struct pblk_sec_meta *meta,
>> + int index)
>> +{
>> + struct nvm_tgt_dev *dev = pblk->dev;
>> + struct nvm_geo *geo = &dev->geo;
>> +
>> + return ((void *)meta + index * geo->sos);
>> +}
>> +
>> static inline int pblk_line_vsc(struct pblk_line *line)
>> {
>> return le32_to_cpu(*line->vsc);
>
> It would be helpful to split this patch into two:
>
> - One that does the 8b conversion
> - One that makes the change to merge metadata and ppa list data buffers
pblk has always shared the dma buffer between the ppa list and the
metadata buffer. This patch adds the possibility of not merging them
when the metadata size does not fit. I can split it into two patches,
but it seems to me like a natural extension when relaxing the 16-byte
metadata size.
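The sharing decision described here matches the `pblk_core_init()` hunk quoted above: the ppa list needs 8B * NVM_MAX_VLBA = 512B, and the metadata can only share the same DMA allocation if both together still fit in one page. A userspace sketch (not the kernel code; `SKETCH_PAGE_SIZE` stands in for the kernel's `PAGE_SIZE`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* stand-in for the kernel's PAGE_SIZE */
#define NVM_MAX_VLBA 64		/* max sectors per vector command */

/* Can the ppa list and per-sector OOB metadata share one DMA page? */
static bool pblk_dma_shared(int sos)
{
	size_t dma_ppa_size = sizeof(unsigned long long) * NVM_MAX_VLBA;
	size_t dma_meta_size = (size_t)sos * NVM_MAX_VLBA;

	return dma_ppa_size + dma_meta_size <= SKETCH_PAGE_SIZE;
}
```

With 16B of OOB per sector (the first-generation devices) the two fit comfortably; with 64B they no longer do, and the patch falls back to a separate ppa-list allocation.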
> - How about simplifying it to only allow metadata buffers of up to
> 32b? Then one doesn't have to think about the buffer crossing
> multiple pages.
You mean a max of 56 bytes of metadata per sector, right? That is what
is left of a 4KB page after taking out the 512B needed by the ppa list.
It's OK by me, but I'd like to hear Igor's opinion, as this was Intel's
use case to start with.
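The 56-byte figure can be checked directly: with NVM_MAX_VLBA = 64 sectors per vector command, the ppa list takes 64 * 8B = 512B, leaving 3584B of a 4KB page, i.e. 56B of OOB metadata per sector. A minimal sketch (constants are stand-ins for the kernel's):

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* stand-in for the kernel's PAGE_SIZE */
#define NVM_MAX_VLBA 64

/* Bytes left for per-sector OOB metadata once the ppa list is carved
 * out of the shared DMA page. */
static size_t max_shared_sos(void)
{
	size_t ppa_bytes = sizeof(unsigned long long) * NVM_MAX_VLBA; /* 512 */

	return (SKETCH_PAGE_SIZE - ppa_bytes) / NVM_MAX_VLBA;
}
```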
> - You can also make a structure for it, using the first 3.5KB for
> metadata and the next 512B for PPAs.
Looks like a huge structure for just managing a couple of pointers,
don't you think?
Javier
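The core change in the patch, replacing the fixed `meta_list[i]` array access with variable-stride indexing, can be sketched in userspace like this (mirroring the `sec_meta_index()` helper from the quoted pblk.h hunk; the struct here is a stand-in):

```c
#include <assert.h>
#include <stddef.h>

struct pblk_sec_meta {
	unsigned long long lba;	/* the 8 bytes pblk actually uses */
};

/* Step through the OOB buffer in strides of the device-reported
 * metadata size (geo->sos) rather than sizeof(struct pblk_sec_meta),
 * so devices exposing more than 8 bytes of OOB per sector still have
 * their lba stored at the start of each per-sector region. */
static struct pblk_sec_meta *sec_meta_index(void *meta_list, int sos,
					    int index)
{
	return (struct pblk_sec_meta *)((char *)meta_list +
					(size_t)index * sos);
}
```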
On 11.09.2018 11:14, Javier Gonzalez wrote:
>>> @@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>> line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>>> kref_put(&line->ref, pblk_line_put);
>>> - meta_list[hole].lba = lba_list_media[i];
>>> + meta = sec_meta_index(pblk, rqd->meta_list, hole);
>>> + meta->lba = lba_list_media[i];
>>> src_bv = new_bio->bi_io_vec[i++];
>>> dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>>> @@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>> unsigned long *read_bitmap,
>>> int nr_holes)
>>> {
>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>> struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>> + struct pblk_sec_meta *meta;
>>> struct pblk_pr_ctx *pr_ctx;
>>> struct bio *new_bio, *bio = r_ctx->private;
>>> __le64 *lba_list_mem;
>>> @@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>> int i;
>>> /* Re-use allocated memory for intermediate lbas */
>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>> new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>>> @@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>> if (!pr_ctx)
>>> goto fail_free_pages;
>>> - for (i = 0; i < nr_secs; i++)
>>> - lba_list_mem[i] = meta_list[i].lba;
>>> + for (i = 0; i < nr_secs; i++) {
>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>> + lba_list_mem[i] = meta->lba;
>>> + }
>>> new_bio->bi_iter.bi_sector = 0; /* internal bio */
>>> bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>>> @@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>> sector_t lba, unsigned long *read_bitmap)
>>> {
>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>> + struct pblk_sec_meta *meta;
>>> struct ppa_addr ppa;
>>> pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>>> @@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>> retry:
>>> if (pblk_ppa_empty(ppa)) {
>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>> - meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
>>> +
>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>> +
>>> return;
>>> }
>>> @@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>> }
>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>> - meta_list[0].lba = cpu_to_le64(lba);
>>> +
>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>> + meta->lba = cpu_to_le64(lba);
>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>> atomic_long_inc(&pblk->cache_reads);
>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>> index 8114013c37b8..1ce92562603d 100644
>>> --- a/drivers/lightnvm/pblk-recovery.c
>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>> @@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>> {
>>> struct nvm_tgt_dev *dev = pblk->dev;
>>> struct nvm_geo *geo = &dev->geo;
>>> - struct pblk_sec_meta *meta_list;
>>> + struct pblk_sec_meta *meta;
>>> struct pblk_pad_rq *pad_rq;
>>> struct nvm_rq *rqd;
>>> struct bio *bio;
>>> @@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>> rqd->end_io = pblk_end_io_recov;
>>> rqd->private = pad_rq;
>>> - meta_list = rqd->meta_list;
>>> -
>>> for (i = 0; i < rqd->nr_ppas; ) {
>>> struct ppa_addr ppa;
>>> int pos;
>>> @@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>> dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>>> pblk_map_invalidate(pblk, dev_ppa);
>>> - lba_list[w_ptr] = meta_list[i].lba = addr_empty;
>>> rqd->ppa_list[i] = dev_ppa;
>>> +
>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>> + lba_list[w_ptr] = meta->lba = addr_empty;
>>> }
>>> }
>>> @@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>> struct nvm_tgt_dev *dev = pblk->dev;
>>> struct nvm_geo *geo = &dev->geo;
>>> struct ppa_addr *ppa_list;
>>> - struct pblk_sec_meta *meta_list;
>>> + struct pblk_sec_meta *meta_list, *meta;
>>> struct nvm_rq *rqd;
>>> struct bio *bio;
>>> void *data;
>>> @@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>> }
>>> for (i = 0; i < rqd->nr_ppas; i++) {
>>> - u64 lba = le64_to_cpu(meta_list[i].lba);
>>> + u64 lba;
>>> +
>>> + meta = sec_meta_index(pblk, meta_list, i);
>>> + lba = le64_to_cpu(meta->lba);
>>> lba_list[paddr++] = cpu_to_le64(lba);
>>> @@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>> if (!meta_list)
>>> return -ENOMEM;
>>> - ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>> + if (pblk->dma_shared) {
>>> + ppa_list = (void *)(meta_list) + pblk->dma_meta_size;
>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>> + } else {
>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>> + &dma_ppa_list);
>>> + if (!ppa_list) {
>>> + ret = -ENOMEM;
>>> + goto free_meta_list;
>>> + }
>>> + }
>>> data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
>>> if (!data) {
>>> ret = -ENOMEM;
>>> - goto free_meta_list;
>>> + goto free_ppa_list;
>>> }
>>> rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL);
>>> @@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>> out:
>>> mempool_free(rqd, &pblk->r_rq_pool);
>>> kfree(data);
>>> +free_ppa_list:
>>> + if (!pblk->dma_shared)
>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>> free_meta_list:
>>> nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>> -
>>> return ret;
>>> }
>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>> index 22cc9bfbbb10..4526fee206d9 100644
>>> --- a/drivers/lightnvm/pblk.h
>>> +++ b/drivers/lightnvm/pblk.h
>>> @@ -86,7 +86,6 @@ enum {
>>> };
>>> struct pblk_sec_meta {
>>> - u64 reserved;
>>> __le64 lba;
>>> };
>>> @@ -103,9 +102,6 @@ enum {
>>> PBLK_RL_LOW = 4
>>> };
>>> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
>>> -#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
>>> -
>>> /* write buffer completion context */
>>> struct pblk_c_ctx {
>>> struct list_head list; /* Head for out-of-order completion */
>>> @@ -637,6 +633,10 @@ struct pblk {
>>> int sec_per_write;
>>> + int dma_meta_size;
>>> + int dma_ppa_size;
>>> + bool dma_shared;
>>> +
>>> unsigned char instance_uuid[16];
>>> /* Persistent write amplification counters, 4kb sector I/Os */
>>> @@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
>>> return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]);
>>> }
>>> +static inline struct pblk_sec_meta *sec_meta_index(struct pblk *pblk,
>>> + struct pblk_sec_meta *meta,
>>> + int index)
>>> +{
>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>> + struct nvm_geo *geo = &dev->geo;
>>> +
>>> + return ((void *)meta + index * geo->sos);
>>> +}
>>> +
>>> static inline int pblk_line_vsc(struct pblk_line *line)
>>> {
>>> return le32_to_cpu(*line->vsc);
>>
>> It would be helpful to split this patch into two:
>>
>> - One that does the 8-byte conversion
>> - One that makes the change to merge the metadata and PPA list data buffers
>
> pblk has always shared the dma buffer for the ppa list and the metadata
> buffer. This patch adds the possibility of not merging them if the metadata
> size does not fit. I can separate it into two patches, but it seems to me
> like a natural extension when relaxing the 16-byte metadata size.
>
>> - How about making it a simplification to only allow up to 32b
>> metadata buffers, then one doesn't have to think about it crossing
>> multiple pages.
>
> You mean max. 56 bytes of metadata per 4KB sector, right? That is what is left
> on a 4KB page after taking out the 512B needed by the ppa list. It's ok
> by me, but I'd like to hear Igor's opinion, as this was Intel's use case
> to start with.
>
So I think that if we want to do this properly, we should support 'all'
the metadata sizes, including >64K - which is not the case after Javier's
changes. In the past there were NVMe drives available with 128 bytes of
metadata per 4K sector, so it could be similar in the future for OC
drives too. I have a patch somewhere which should work for any size of
metadata with a slightly different approach. I can send it (I just need a
moment to rebase) and you can decide which approach you would like to
take.
>> - You can also make a structure for it, use the first 3.5KB for meta,
>> and the next 512B for PPAs.
>
> Looks like a huge structure for just managing a couple of pointers,
> don't you think?
>
> Javier
>
> On 11 Sep 2018, at 10.50, Javier Gonzalez <[email protected]> wrote:
>
>> On 7 Sep 2018, at 12.28, Matias Bjørling <[email protected]> wrote:
>>
>> On 08/29/2018 12:09 PM, Javier González wrote:
>>> In the OCSSD 2.0 spec., the sector and metadata sizes are reported through
>>> the standard nvme identify command. Thus, the lightnvm subsystem needs
>>> to update this information on the geometry structure on bootup.
>>> Since 1.2 devices report these values on the OCSSD geometry identify,
>>> avoid this update, as it is unnecessary and can also corrupt the geometry
>>> if the device does not report the nvme sizes correctly (which is not
>>> required by the OCSSD 1.2 spec either).
>>> Signed-off-by: Javier González <[email protected]>
>>> ---
>>> drivers/nvme/host/lightnvm.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>> index 5bfa354c5dd5..33ed09f8410e 100644
>>> --- a/drivers/nvme/host/lightnvm.c
>>> +++ b/drivers/nvme/host/lightnvm.c
>>> @@ -980,6 +980,9 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
>>> struct nvm_dev *ndev = ns->ndev;
>>> struct nvm_geo *geo = &ndev->geo;
>>> + if (geo->version == NVM_OCSSD_SPEC_12)
>>> + return;
>>
>> The lba format sizes are not explicit in 2.0 either. For a conforming drive, it should implement the NVM Command Set and expose the LBA format correctly. Although I do get your intent, and if it is okay with you, I'll reword the commit message to this and apply it:
>>
>> "1.2 devices expose their data and metadata size through the separate
>> identify command. Make sure that the NVMe LBA format does not override
>> these values."
>
> Sounds good. I'll resend either way, as I rebased on top of for-4.20/core
> - you can change the commit message as you see fit.
Since Igor will send a patch for the OOB, can you pick this up
separately? No changes needed after rebase.
Thanks,
Javier
> On 11 Sep 2018, at 12.22, Igor Konopko <[email protected]> wrote:
>
>
>
> On 11.09.2018 11:14, Javier Gonzalez wrote:
>>> On 10 Sep 2018, at 12.21, Matias Bjørling <[email protected]> wrote:
>>>
>>> On 08/29/2018 12:09 PM, Javier González wrote:
>>>> pblk uses 8 bytes of the metadata region exposed by the device
>>>> through the out-of-band area to store the lba mapped to the given
>>>> physical sector. This is used for recovery purposes. Given that the
>>>> first generation OCSSD devices exposed 16 bytes, pblk used a hard-coded
>>>> structure for this purpose.
>>>> This patch relaxes the 16-byte assumption and uses the metadata size
>>>> reported by the device to lay out metadata appropriately for the vector
>>>> commands. This adds support for arbitrary metadata sizes, as long as
>>>> these are at least 8 bytes. Note that this patch does not address the
>>>> case in which the device does not expose an out-of-band area; pblk
>>>> creation will fail in that case.
>>>> Signed-off-by: Javier González <[email protected]>
>>>> ---
>>>> drivers/lightnvm/pblk-core.c | 56 ++++++++++++++++++++++++++++++----------
>>>> drivers/lightnvm/pblk-init.c | 14 ++++++++++
>>>> drivers/lightnvm/pblk-map.c | 19 +++++++++-----
>>>> drivers/lightnvm/pblk-read.c | 55 +++++++++++++++++++++++++--------------
>>>> drivers/lightnvm/pblk-recovery.c | 34 +++++++++++++++++-------
>>>> drivers/lightnvm/pblk.h | 18 ++++++++++---
>>>> 6 files changed, 143 insertions(+), 53 deletions(-)
>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>> index a311cc29afd8..d52e0047ae9d 100644
>>>> --- a/drivers/lightnvm/pblk-core.c
>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>> @@ -250,8 +250,20 @@ int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
>>>> if (!is_vector)
>>>> return 0;
>>>> - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
>>>> - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
>>>> + if (pblk->dma_shared) {
>>>> + rqd->ppa_list = rqd->meta_list + pblk->dma_meta_size;
>>>> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk->dma_meta_size;
>>>> +
>>>> + return 0;
>>>> + }
>>>> +
>>>> + rqd->ppa_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
>>>> + &rqd->dma_ppa_list);
>>>> + if (!rqd->ppa_list) {
>>>> + nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>> + rqd->dma_meta_list);
>>>> + return -ENOMEM;
>>>> + }
>>>> return 0;
>>>> }
>>>> @@ -262,7 +274,11 @@ void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
>>>> if (rqd->meta_list)
>>>> nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>> - rqd->dma_meta_list);
>>>> + rqd->dma_meta_list);
>>>> +
>>>> + if (!pblk->dma_shared && rqd->ppa_list)
>>>> + nvm_dev_dma_free(dev->parent, rqd->ppa_list,
>>>> + rqd->dma_ppa_list);
>>>> }
>>>> /* Caller must guarantee that the request is a valid type */
>>>> @@ -796,10 +812,12 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>>>> rqd.is_seq = 1;
>>>> for (i = 0; i < lm->smeta_sec; i++, paddr++) {
>>>> - struct pblk_sec_meta *meta_list = rqd.meta_list;
>>>> + struct pblk_sec_meta *meta;
>>>> rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>> - meta_list[i].lba = lba_list[paddr] = addr_empty;
>>>> +
>>>> + meta = sec_meta_index(pblk, rqd.meta_list, i);
>>>> + meta->lba = lba_list[paddr] = addr_empty;
>>>> }
>>>> ret = pblk_submit_io_sync_sem(pblk, &rqd);
>>>> @@ -845,8 +863,17 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> if (!meta_list)
>>>> return -ENOMEM;
>>>> - ppa_list = meta_list + pblk_dma_meta_size;
>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>> + if (pblk->dma_shared) {
>>>> + ppa_list = meta_list + pblk->dma_meta_size;
>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>> + } else {
>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>> + &dma_ppa_list);
>>>> + if (!ppa_list) {
>>>> + ret = -ENOMEM;
>>>> + goto free_meta_list;
>>>> + }
>>>> + }
>>>> next_rq:
>>>> memset(&rqd, 0, sizeof(struct nvm_rq));
>>>> @@ -858,7 +885,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> l_mg->emeta_alloc_type, GFP_KERNEL);
>>>> if (IS_ERR(bio)) {
>>>> ret = PTR_ERR(bio);
>>>> - goto free_rqd_dma;
>>>> + goto free_ppa_list;
>>>> }
>>>> bio->bi_iter.bi_sector = 0; /* internal bio */
>>>> @@ -884,7 +911,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> if (pblk_boundary_paddr_checks(pblk, paddr)) {
>>>> bio_put(bio);
>>>> ret = -EINTR;
>>>> - goto free_rqd_dma;
>>>> + goto free_ppa_list;
>>>> }
>>>> ppa = addr_to_gen_ppa(pblk, paddr, line_id);
>>>> @@ -894,7 +921,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
>>>> bio_put(bio);
>>>> ret = -EINTR;
>>>> - goto free_rqd_dma;
>>>> + goto free_ppa_list;
>>>> }
>>>> for (j = 0; j < min; j++, i++, paddr++)
>>>> @@ -905,7 +932,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> if (ret) {
>>>> pblk_err(pblk, "emeta I/O submission failed: %d\n", ret);
>>>> bio_put(bio);
>>>> - goto free_rqd_dma;
>>>> + goto free_ppa_list;
>>>> }
>>>> atomic_dec(&pblk->inflight_io);
>>>> @@ -918,8 +945,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> if (left_ppas)
>>>> goto next_rq;
>>>> -free_rqd_dma:
>>>> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
>>>> +free_ppa_list:
>>>> + if (!pblk->dma_shared)
>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>> +free_meta_list:
>>>> + nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>> return ret;
>>>> }
>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>> index a99854439224..57972156c318 100644
>>>> --- a/drivers/lightnvm/pblk-init.c
>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>> @@ -354,6 +354,20 @@ static int pblk_core_init(struct pblk *pblk)
>>>> struct nvm_geo *geo = &dev->geo;
>>>> int ret, max_write_ppas;
>>>> + if (sizeof(struct pblk_sec_meta) > geo->sos) {
>>>> + pblk_err(pblk, "OOB area too small. Min %lu bytes (%d)\n",
>>>> + (unsigned long)sizeof(struct pblk_sec_meta), geo->sos);
>>>> + return -EINTR;
>>>> + }
>>>> +
>>>> + pblk->dma_ppa_size = (sizeof(u64) * NVM_MAX_VLBA);
>>>> + pblk->dma_meta_size = geo->sos * NVM_MAX_VLBA;
>>>> +
>>>> + if (pblk->dma_ppa_size + pblk->dma_meta_size > PAGE_SIZE)
>>>> + pblk->dma_shared = false;
>>>> + else
>>>> + pblk->dma_shared = true;
>>>> +
>>>> atomic64_set(&pblk->user_wa, 0);
>>>> atomic64_set(&pblk->pad_wa, 0);
>>>> atomic64_set(&pblk->gc_wa, 0);
>>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>>> index dc0efb852475..55fca16d18e4 100644
>>>> --- a/drivers/lightnvm/pblk-map.c
>>>> +++ b/drivers/lightnvm/pblk-map.c
>>>> @@ -25,6 +25,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>> unsigned int valid_secs)
>>>> {
>>>> struct pblk_line *line = pblk_line_get_data(pblk);
>>>> + struct pblk_sec_meta *meta;
>>>> struct pblk_emeta *emeta;
>>>> struct pblk_w_ctx *w_ctx;
>>>> __le64 *lba_list;
>>>> @@ -56,6 +57,8 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>> /* ppa to be sent to the device */
>>>> ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>> + meta = sec_meta_index(pblk, meta_list, i);
>>>> +
>>>> /* Write context for target bio completion on write buffer. Note
>>>> * that the write buffer is protected by the sync backpointer,
>>>> * and a single writer thread have access to each specific entry
>>>> @@ -67,14 +70,14 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>> kref_get(&line->ref);
>>>> w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
>>>> w_ctx->ppa = ppa_list[i];
>>>> - meta_list[i].lba = cpu_to_le64(w_ctx->lba);
>>>> + meta->lba = cpu_to_le64(w_ctx->lba);
>>>> lba_list[paddr] = cpu_to_le64(w_ctx->lba);
>>>> if (lba_list[paddr] != addr_empty)
>>>> line->nr_valid_lbas++;
>>>> else
>>>> atomic64_inc(&pblk->pad_wa);
>>>> } else {
>>>> - lba_list[paddr] = meta_list[i].lba = addr_empty;
>>>> + lba_list[paddr] = meta->lba = addr_empty;
>>>> __pblk_map_invalidate(pblk, line, paddr);
>>>> }
>>>> }
>>>> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>> unsigned long *lun_bitmap, unsigned int valid_secs,
>>>> unsigned int off)
>>>> {
>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> + struct pblk_sec_meta *meta_list;
>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>> unsigned int map_secs;
>>>> int min = pblk->min_write_pgs;
>>>> @@ -95,8 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>> for (i = off; i < rqd->nr_ppas; i += min) {
>>>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>> +
>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>> - lun_bitmap, &meta_list[i], map_secs)) {
>>>> + lun_bitmap, meta_list, map_secs)) {
>>>> bio_put(rqd->bio);
>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>> pblk_pipeline_stop(pblk);
>>>> @@ -112,8 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>> struct nvm_geo *geo = &dev->geo;
>>>> struct pblk_line_meta *lm = &pblk->lm;
>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>> + struct pblk_sec_meta *meta_list;
>>>> struct pblk_line *e_line, *d_line;
>>>> unsigned int map_secs;
>>>> int min = pblk->min_write_pgs;
>>>> @@ -121,8 +126,10 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> for (i = 0; i < rqd->nr_ppas; i += min) {
>>>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>> +
>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>> - lun_bitmap, &meta_list[i], map_secs)) {
>>>> + lun_bitmap, meta_list, map_secs)) {
>>>> bio_put(rqd->bio);
>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>> pblk_pipeline_stop(pblk);
>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>> index 57d3155ef9a5..12b690e2abd9 100644
>>>> --- a/drivers/lightnvm/pblk-read.c
>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>> @@ -42,7 +42,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> struct bio *bio, sector_t blba,
>>>> unsigned long *read_bitmap)
>>>> {
>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> struct ppa_addr ppas[NVM_MAX_VLBA];
>>>> int nr_secs = rqd->nr_ppas;
>>>> bool advanced_bio = false;
>>>> @@ -51,13 +50,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
>>>> for (i = 0; i < nr_secs; i++) {
>>>> + struct pblk_sec_meta *meta;
>>>> struct ppa_addr p = ppas[i];
>>>> sector_t lba = blba + i;
>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>> retry:
>>>> if (pblk_ppa_empty(p)) {
>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>> - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>>>> +
>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>> if (unlikely(!advanced_bio)) {
>>>> bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
>>>> @@ -77,7 +79,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> goto retry;
>>>> }
>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>> - meta_list[i].lba = cpu_to_le64(lba);
>>>> + meta->lba = cpu_to_le64(lba);
>>>> advanced_bio = true;
>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>> atomic_long_inc(&pblk->cache_reads);
>>>> @@ -104,12 +106,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> sector_t blba)
>>>> {
>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>> int nr_lbas = rqd->nr_ppas;
>>>> int i;
>>>> for (i = 0; i < nr_lbas; i++) {
>>>> - u64 lba = le64_to_cpu(meta_lba_list[i].lba);
>>>> + struct pblk_sec_meta *meta;
>>>> + u64 lba;
>>>> +
>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>> + lba = le64_to_cpu(meta->lba);
>>>> if (lba == ADDR_EMPTY)
>>>> continue;
>>>> @@ -133,17 +138,18 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>> u64 *lba_list, int nr_lbas)
>>>> {
>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>> int i, j;
>>>> for (i = 0, j = 0; i < nr_lbas; i++) {
>>>> + struct pblk_sec_meta *meta;
>>>> u64 lba = lba_list[i];
>>>> u64 meta_lba;
>>>> if (lba == ADDR_EMPTY)
>>>> continue;
>>>> - meta_lba = le64_to_cpu(meta_lba_list[j].lba);
>>>> + meta = sec_meta_index(pblk, rqd->meta_list, j);
>>>> + meta_lba = le64_to_cpu(meta->lba);
>>>> if (lba != meta_lba) {
>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>> @@ -218,7 +224,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>> struct bio *new_bio = rqd->bio;
>>>> struct bio *bio = pr_ctx->orig_bio;
>>>> struct bio_vec src_bv, dst_bv;
>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> + struct pblk_sec_meta *meta;
>>>> int bio_init_idx = pr_ctx->bio_init_idx;
>>>> unsigned long *read_bitmap = pr_ctx->bitmap;
>>>> int nr_secs = pr_ctx->orig_nr_secs;
>>>> @@ -237,12 +243,13 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>> }
>>>> /* Re-use allocated memory for intermediate lbas */
>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>> - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>> + lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk->dma_ppa_size);
>>>> for (i = 0; i < nr_secs; i++) {
>>>> - lba_list_media[i] = meta_list[i].lba;
>>>> - meta_list[i].lba = lba_list_mem[i];
>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>> + lba_list_media[i] = meta->lba;
>>>> + meta->lba = lba_list_mem[i];
>>>> }
>>>> /* Fill the holes in the original bio */
>>>> @@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>> line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>>>> kref_put(&line->ref, pblk_line_put);
>>>> - meta_list[hole].lba = lba_list_media[i];
>>>> + meta = sec_meta_index(pblk, rqd->meta_list, hole);
>>>> + meta->lba = lba_list_media[i];
>>>> src_bv = new_bio->bi_io_vec[i++];
>>>> dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>>>> @@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>> unsigned long *read_bitmap,
>>>> int nr_holes)
>>>> {
>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>> + struct pblk_sec_meta *meta;
>>>> struct pblk_pr_ctx *pr_ctx;
>>>> struct bio *new_bio, *bio = r_ctx->private;
>>>> __le64 *lba_list_mem;
>>>> @@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>> int i;
>>>> /* Re-use allocated memory for intermediate lbas */
>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>> new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>>>> @@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>> if (!pr_ctx)
>>>> goto fail_free_pages;
>>>> - for (i = 0; i < nr_secs; i++)
>>>> - lba_list_mem[i] = meta_list[i].lba;
>>>> + for (i = 0; i < nr_secs; i++) {
>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>> + lba_list_mem[i] = meta->lba;
>>>> + }
>>>> new_bio->bi_iter.bi_sector = 0; /* internal bio */
>>>> bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>>>> @@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>>>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>> sector_t lba, unsigned long *read_bitmap)
>>>> {
>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> + struct pblk_sec_meta *meta;
>>>> struct ppa_addr ppa;
>>>> pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>>>> @@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>> retry:
>>>> if (pblk_ppa_empty(ppa)) {
>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>> - meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
>>>> +
>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>> +
>>>> return;
>>>> }
>>>> @@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>> }
>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>> - meta_list[0].lba = cpu_to_le64(lba);
>>>> +
>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>> + meta->lba = cpu_to_le64(lba);
>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>> atomic_long_inc(&pblk->cache_reads);
>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>>> index 8114013c37b8..1ce92562603d 100644
>>>> --- a/drivers/lightnvm/pblk-recovery.c
>>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>>> @@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>> {
>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>> struct nvm_geo *geo = &dev->geo;
>>>> - struct pblk_sec_meta *meta_list;
>>>> + struct pblk_sec_meta *meta;
>>>> struct pblk_pad_rq *pad_rq;
>>>> struct nvm_rq *rqd;
>>>> struct bio *bio;
>>>> @@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>> rqd->end_io = pblk_end_io_recov;
>>>> rqd->private = pad_rq;
>>>> - meta_list = rqd->meta_list;
>>>> -
>>>> for (i = 0; i < rqd->nr_ppas; ) {
>>>> struct ppa_addr ppa;
>>>> int pos;
>>>> @@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>> dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>>>> pblk_map_invalidate(pblk, dev_ppa);
>>>> - lba_list[w_ptr] = meta_list[i].lba = addr_empty;
>>>> rqd->ppa_list[i] = dev_ppa;
>>>> +
>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>> + lba_list[w_ptr] = meta->lba = addr_empty;
>>>> }
>>>> }
>>>> @@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>> struct nvm_geo *geo = &dev->geo;
>>>> struct ppa_addr *ppa_list;
>>>> - struct pblk_sec_meta *meta_list;
>>>> + struct pblk_sec_meta *meta_list, *meta;
>>>> struct nvm_rq *rqd;
>>>> struct bio *bio;
>>>> void *data;
>>>> @@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>> }
>>>> for (i = 0; i < rqd->nr_ppas; i++) {
>>>> - u64 lba = le64_to_cpu(meta_list[i].lba);
>>>> + u64 lba;
>>>> +
>>>> + meta = sec_meta_index(pblk, meta_list, i);
>>>> + lba = le64_to_cpu(meta->lba);
>>>> lba_list[paddr++] = cpu_to_le64(lba);
>>>> @@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>> if (!meta_list)
>>>> return -ENOMEM;
>>>> - ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>> + if (pblk->dma_shared) {
>>>> + ppa_list = (void *)(meta_list) + pblk->dma_meta_size;
>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>> + } else {
>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>> + &dma_ppa_list);
>>>> + if (!ppa_list) {
>>>> + ret = -ENOMEM;
>>>> + goto free_meta_list;
>>>> + }
>>>> + }
>>>> data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
>>>> if (!data) {
>>>> ret = -ENOMEM;
>>>> - goto free_meta_list;
>>>> + goto free_ppa_list;
>>>> }
>>>> rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL);
>>>> @@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>> out:
>>>> mempool_free(rqd, &pblk->r_rq_pool);
>>>> kfree(data);
>>>> +free_ppa_list:
>>>> + if (!pblk->dma_shared)
>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>> free_meta_list:
>>>> nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>> -
>>>> return ret;
>>>> }
>>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>>> index 22cc9bfbbb10..4526fee206d9 100644
>>>> --- a/drivers/lightnvm/pblk.h
>>>> +++ b/drivers/lightnvm/pblk.h
>>>> @@ -86,7 +86,6 @@ enum {
>>>> };
>>>> struct pblk_sec_meta {
>>>> - u64 reserved;
>>>> __le64 lba;
>>>> };
>>>> @@ -103,9 +102,6 @@ enum {
>>>> PBLK_RL_LOW = 4
>>>> };
>>>> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
>>>> -#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
>>>> -
>>>> /* write buffer completion context */
>>>> struct pblk_c_ctx {
>>>> struct list_head list; /* Head for out-of-order completion */
>>>> @@ -637,6 +633,10 @@ struct pblk {
>>>> int sec_per_write;
>>>> + int dma_meta_size;
>>>> + int dma_ppa_size;
>>>> + bool dma_shared;
>>>> +
>>>> unsigned char instance_uuid[16];
>>>> /* Persistent write amplification counters, 4kb sector I/Os */
>>>> @@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
>>>> return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]);
>>>> }
>>>> +static inline struct pblk_sec_meta *sec_meta_index(struct pblk *pblk,
>>>> + struct pblk_sec_meta *meta,
>>>> + int index)
>>>> +{
>>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>>> + struct nvm_geo *geo = &dev->geo;
>>>> +
>>>> + return ((void *)meta + index * geo->sos);
>>>> +}
>>>> +
>>>> static inline int pblk_line_vsc(struct pblk_line *line)
>>>> {
>>>> return le32_to_cpu(*line->vsc);
>>>
>>> It will be helpful to split this patch into two:
>>>
>>> - One that does the 8b conversion
>>> - One that makes the change to merge metadata and ppa list data buffers
>> pblk has always shared the dma buffer for the ppa list and the metadata
>> buffer. This patch adds the possibility to not merge if the metadata
>> size does not fit. I can separate it into 2 patches, but it seems to me
>> like a natural extension when relaxing the 16-byte metadata size.
>>> - How about making it a simplification to only allow up to 32b
>>> metadata buffers, then one doesn't have to think about it crossing
>>> multiple pages.
>> You mean max. 56 bytes of metadata per 4KB right? That is what is left
>> on a 4KB page after taking out the 512B needed by the ppa list. It's ok
>> by me, but I'd like to hear Igor's opinion, as this was Intel's use case
>> to start with.
>
> So I think that if we want to do this properly, we should support
> 'all' the metadata sizes, including >64K - which is not the case after
> Javier's changes. In the past there were NVMe drives available with 128
> bytes of metadata per 4K sector, so it could be similar in future too
> for OC drives. I have somewhere patch which should work for any size
> of metadata with slightly different approach. I can send it (just need
> a moment for rebase) and you will decide which approach you would like
> to take.
>
>>> - You can also make a structure for it, use the first 3.5KB for meta,
>>> and the next 512B for PPAs.
>> Looks like a huge structure for just managing a couple of pointers,
>> don't you think?
>> Javier
Matias,
Can you pick up this patch as is for this window and then Igor can extend
it to an arbitrary size in the future? We have a use case for 64B OOB /
4KB, so 56B / 4KB is not enough in this case. Since pblk currently fixes
the metadata buffer, the code breaks under this configuration.
If you have any comments that can respect this requirement, I can apply
them and resubmit.
Thanks,
Javier
On 09/18/2018 10:09 AM, Javier Gonzalez wrote:
>> On 11 Sep 2018, at 12.22, Igor Konopko <[email protected]> wrote:
>>
>>
>>
>> On 11.09.2018 11:14, Javier Gonzalez wrote:
>>>> On 10 Sep 2018, at 12.21, Matias Bjørling <[email protected]> wrote:
>>>>
>>>> On 08/29/2018 12:09 PM, Javier González wrote:
>>>>> pblk uses 8 bytes in the metadata region area exposed by the device
>>>>> through the out of band area to store the lba mapped to the given
>>>>> physical sector. This is used for recovery purposes. Given that the
>>>>> first generation OCSSD devices exposed 16 bytes, pblk used a hard-coded
>>>>> structure for this purpose.
>>>>> This patch relaxes the 16 bytes assumption and uses the metadata size
>>>>> reported by the device to lay out metadata appropriately for the vector
>>>>> commands. This adds support for arbitrary metadata sizes, as long as
>>>>> these are larger than 8 bytes. Note that this patch does not address the
>>>>> case in which the device does not expose an out of band area and that
>>>>> pblk creation will fail in this case.
>>>>> Signed-off-by: Javier González <[email protected]>
>>>>> ---
>>>>> drivers/lightnvm/pblk-core.c | 56 ++++++++++++++++++++++++++++++----------
>>>>> drivers/lightnvm/pblk-init.c | 14 ++++++++++
>>>>> drivers/lightnvm/pblk-map.c | 19 +++++++++-----
>>>>> drivers/lightnvm/pblk-read.c | 55 +++++++++++++++++++++++++--------------
>>>>> drivers/lightnvm/pblk-recovery.c | 34 +++++++++++++++++-------
>>>>> drivers/lightnvm/pblk.h | 18 ++++++++++---
>>>>> 6 files changed, 143 insertions(+), 53 deletions(-)
>>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>>> index a311cc29afd8..d52e0047ae9d 100644
>>>>> --- a/drivers/lightnvm/pblk-core.c
>>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>>> @@ -250,8 +250,20 @@ int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
>>>>> if (!is_vector)
>>>>> return 0;
>>>>> - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
>>>>> - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
>>>>> + if (pblk->dma_shared) {
>>>>> + rqd->ppa_list = rqd->meta_list + pblk->dma_meta_size;
>>>>> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk->dma_meta_size;
>>>>> +
>>>>> + return 0;
>>>>> + }
>>>>> +
>>>>> + rqd->ppa_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
>>>>> + &rqd->dma_ppa_list);
>>>>> + if (!rqd->ppa_list) {
>>>>> + nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>>> + rqd->dma_meta_list);
>>>>> + return -ENOMEM;
>>>>> + }
>>>>> return 0;
>>>>> }
>>>>> @@ -262,7 +274,11 @@ void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
>>>>> if (rqd->meta_list)
>>>>> nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>>> - rqd->dma_meta_list);
>>>>> + rqd->dma_meta_list);
>>>>> +
>>>>> + if (!pblk->dma_shared && rqd->ppa_list)
>>>>> + nvm_dev_dma_free(dev->parent, rqd->ppa_list,
>>>>> + rqd->dma_ppa_list);
>>>>> }
>>>>> /* Caller must guarantee that the request is a valid type */
>>>>> @@ -796,10 +812,12 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>>>>> rqd.is_seq = 1;
>>>>> for (i = 0; i < lm->smeta_sec; i++, paddr++) {
>>>>> - struct pblk_sec_meta *meta_list = rqd.meta_list;
>>>>> + struct pblk_sec_meta *meta;
>>>>> rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>>> - meta_list[i].lba = lba_list[paddr] = addr_empty;
>>>>> +
>>>>> + meta = sec_meta_index(pblk, rqd.meta_list, i);
>>>>> + meta->lba = lba_list[paddr] = addr_empty;
>>>>> }
>>>>> ret = pblk_submit_io_sync_sem(pblk, &rqd);
>>>>> @@ -845,8 +863,17 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>> if (!meta_list)
>>>>> return -ENOMEM;
>>>>> - ppa_list = meta_list + pblk_dma_meta_size;
>>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>>> + if (pblk->dma_shared) {
>>>>> + ppa_list = meta_list + pblk->dma_meta_size;
>>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>>> + } else {
>>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>>> + &dma_ppa_list);
>>>>> + if (!ppa_list) {
>>>>> + ret = -ENOMEM;
>>>>> + goto free_meta_list;
>>>>> + }
>>>>> + }
>>>>> next_rq:
>>>>> memset(&rqd, 0, sizeof(struct nvm_rq));
>>>>> @@ -858,7 +885,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>> l_mg->emeta_alloc_type, GFP_KERNEL);
>>>>> if (IS_ERR(bio)) {
>>>>> ret = PTR_ERR(bio);
>>>>> - goto free_rqd_dma;
>>>>> + goto free_ppa_list;
>>>>> }
>>>>> bio->bi_iter.bi_sector = 0; /* internal bio */
>>>>> @@ -884,7 +911,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>> if (pblk_boundary_paddr_checks(pblk, paddr)) {
>>>>> bio_put(bio);
>>>>> ret = -EINTR;
>>>>> - goto free_rqd_dma;
>>>>> + goto free_ppa_list;
>>>>> }
>>>>> ppa = addr_to_gen_ppa(pblk, paddr, line_id);
>>>>> @@ -894,7 +921,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>> if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
>>>>> bio_put(bio);
>>>>> ret = -EINTR;
>>>>> - goto free_rqd_dma;
>>>>> + goto free_ppa_list;
>>>>> }
>>>>> for (j = 0; j < min; j++, i++, paddr++)
>>>>> @@ -905,7 +932,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>> if (ret) {
>>>>> pblk_err(pblk, "emeta I/O submission failed: %d\n", ret);
>>>>> bio_put(bio);
>>>>> - goto free_rqd_dma;
>>>>> + goto free_ppa_list;
>>>>> }
>>>>> atomic_dec(&pblk->inflight_io);
>>>>> @@ -918,8 +945,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>> if (left_ppas)
>>>>> goto next_rq;
>>>>> -free_rqd_dma:
>>>>> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
>>>>> +free_ppa_list:
>>>>> + if (!pblk->dma_shared)
>>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>>> +free_meta_list:
>>>>> + nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>>> return ret;
>>>>> }
>>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>>> index a99854439224..57972156c318 100644
>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>> @@ -354,6 +354,20 @@ static int pblk_core_init(struct pblk *pblk)
>>>>> struct nvm_geo *geo = &dev->geo;
>>>>> int ret, max_write_ppas;
>>>>> + if (sizeof(struct pblk_sec_meta) > geo->sos) {
>>>>> + pblk_err(pblk, "OOB area too small. Min %lu bytes (%d)\n",
>>>>> + (unsigned long)sizeof(struct pblk_sec_meta), geo->sos);
>>>>> + return -EINTR;
>>>>> + }
>>>>> +
>>>>> + pblk->dma_ppa_size = (sizeof(u64) * NVM_MAX_VLBA);
>>>>> + pblk->dma_meta_size = geo->sos * NVM_MAX_VLBA;
>>>>> +
>>>>> + if (pblk->dma_ppa_size + pblk->dma_meta_size > PAGE_SIZE)
>>>>> + pblk->dma_shared = false;
>>>>> + else
>>>>> + pblk->dma_shared = true;
>>>>> +
>>>>> atomic64_set(&pblk->user_wa, 0);
>>>>> atomic64_set(&pblk->pad_wa, 0);
>>>>> atomic64_set(&pblk->gc_wa, 0);
>>>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>>>> index dc0efb852475..55fca16d18e4 100644
>>>>> --- a/drivers/lightnvm/pblk-map.c
>>>>> +++ b/drivers/lightnvm/pblk-map.c
>>>>> @@ -25,6 +25,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>> unsigned int valid_secs)
>>>>> {
>>>>> struct pblk_line *line = pblk_line_get_data(pblk);
>>>>> + struct pblk_sec_meta *meta;
>>>>> struct pblk_emeta *emeta;
>>>>> struct pblk_w_ctx *w_ctx;
>>>>> __le64 *lba_list;
>>>>> @@ -56,6 +57,8 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>> /* ppa to be sent to the device */
>>>>> ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>>> + meta = sec_meta_index(pblk, meta_list, i);
>>>>> +
>>>>> /* Write context for target bio completion on write buffer. Note
>>>>> * that the write buffer is protected by the sync backpointer,
>>>>> * and a single writer thread have access to each specific entry
>>>>> @@ -67,14 +70,14 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>> kref_get(&line->ref);
>>>>> w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
>>>>> w_ctx->ppa = ppa_list[i];
>>>>> - meta_list[i].lba = cpu_to_le64(w_ctx->lba);
>>>>> + meta->lba = cpu_to_le64(w_ctx->lba);
>>>>> lba_list[paddr] = cpu_to_le64(w_ctx->lba);
>>>>> if (lba_list[paddr] != addr_empty)
>>>>> line->nr_valid_lbas++;
>>>>> else
>>>>> atomic64_inc(&pblk->pad_wa);
>>>>> } else {
>>>>> - lba_list[paddr] = meta_list[i].lba = addr_empty;
>>>>> + lba_list[paddr] = meta->lba = addr_empty;
>>>>> __pblk_map_invalidate(pblk, line, paddr);
>>>>> }
>>>>> }
>>>>> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>>> unsigned long *lun_bitmap, unsigned int valid_secs,
>>>>> unsigned int off)
>>>>> {
>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>> + struct pblk_sec_meta *meta_list;
>>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>>> unsigned int map_secs;
>>>>> int min = pblk->min_write_pgs;
>>>>> @@ -95,8 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>>> for (i = off; i < rqd->nr_ppas; i += min) {
>>>>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>>> +
>>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>>> - lun_bitmap, &meta_list[i], map_secs)) {
>>>>> + lun_bitmap, meta_list, map_secs)) {
>>>>> bio_put(rqd->bio);
>>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>>> pblk_pipeline_stop(pblk);
>>>>> @@ -112,8 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>> struct nvm_geo *geo = &dev->geo;
>>>>> struct pblk_line_meta *lm = &pblk->lm;
>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>>> + struct pblk_sec_meta *meta_list;
>>>>> struct pblk_line *e_line, *d_line;
>>>>> unsigned int map_secs;
>>>>> int min = pblk->min_write_pgs;
>>>>> @@ -121,8 +126,10 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> for (i = 0; i < rqd->nr_ppas; i += min) {
>>>>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>>> +
>>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>>> - lun_bitmap, &meta_list[i], map_secs)) {
>>>>> + lun_bitmap, meta_list, map_secs)) {
>>>>> bio_put(rqd->bio);
>>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>>> pblk_pipeline_stop(pblk);
>>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>>> index 57d3155ef9a5..12b690e2abd9 100644
>>>>> --- a/drivers/lightnvm/pblk-read.c
>>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>>> @@ -42,7 +42,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> struct bio *bio, sector_t blba,
>>>>> unsigned long *read_bitmap)
>>>>> {
>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>> struct ppa_addr ppas[NVM_MAX_VLBA];
>>>>> int nr_secs = rqd->nr_ppas;
>>>>> bool advanced_bio = false;
>>>>> @@ -51,13 +50,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
>>>>> for (i = 0; i < nr_secs; i++) {
>>>>> + struct pblk_sec_meta *meta;
>>>>> struct ppa_addr p = ppas[i];
>>>>> sector_t lba = blba + i;
>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>> retry:
>>>>> if (pblk_ppa_empty(p)) {
>>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>>> - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>>>>> +
>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>>> if (unlikely(!advanced_bio)) {
>>>>> bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
>>>>> @@ -77,7 +79,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> goto retry;
>>>>> }
>>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>>> - meta_list[i].lba = cpu_to_le64(lba);
>>>>> + meta->lba = cpu_to_le64(lba);
>>>>> advanced_bio = true;
>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>> atomic_long_inc(&pblk->cache_reads);
>>>>> @@ -104,12 +106,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> sector_t blba)
>>>>> {
>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>>> int nr_lbas = rqd->nr_ppas;
>>>>> int i;
>>>>> for (i = 0; i < nr_lbas; i++) {
>>>>> - u64 lba = le64_to_cpu(meta_lba_list[i].lba);
>>>>> + struct pblk_sec_meta *meta;
>>>>> + u64 lba;
>>>>> +
>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>> + lba = le64_to_cpu(meta->lba);
>>>>> if (lba == ADDR_EMPTY)
>>>>> continue;
>>>>> @@ -133,17 +138,18 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> u64 *lba_list, int nr_lbas)
>>>>> {
>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>>> int i, j;
>>>>> for (i = 0, j = 0; i < nr_lbas; i++) {
>>>>> + struct pblk_sec_meta *meta;
>>>>> u64 lba = lba_list[i];
>>>>> u64 meta_lba;
>>>>> if (lba == ADDR_EMPTY)
>>>>> continue;
>>>>> - meta_lba = le64_to_cpu(meta_lba_list[j].lba);
>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, j);
>>>>> + meta_lba = le64_to_cpu(meta->lba);
>>>>> if (lba != meta_lba) {
>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>> @@ -218,7 +224,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>> struct bio *new_bio = rqd->bio;
>>>>> struct bio *bio = pr_ctx->orig_bio;
>>>>> struct bio_vec src_bv, dst_bv;
>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>> + struct pblk_sec_meta *meta;
>>>>> int bio_init_idx = pr_ctx->bio_init_idx;
>>>>> unsigned long *read_bitmap = pr_ctx->bitmap;
>>>>> int nr_secs = pr_ctx->orig_nr_secs;
>>>>> @@ -237,12 +243,13 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>> }
>>>>> /* Re-use allocated memory for intermediate lbas */
>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>>> - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>>> + lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk->dma_ppa_size);
>>>>> for (i = 0; i < nr_secs; i++) {
>>>>> - lba_list_media[i] = meta_list[i].lba;
>>>>> - meta_list[i].lba = lba_list_mem[i];
>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>> + lba_list_media[i] = meta->lba;
>>>>> + meta->lba = lba_list_mem[i];
>>>>> }
>>>>> /* Fill the holes in the original bio */
>>>>> @@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>> line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>>>>> kref_put(&line->ref, pblk_line_put);
>>>>> - meta_list[hole].lba = lba_list_media[i];
>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, hole);
>>>>> + meta->lba = lba_list_media[i];
>>>>> src_bv = new_bio->bi_io_vec[i++];
>>>>> dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>>>>> @@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> unsigned long *read_bitmap,
>>>>> int nr_holes)
>>>>> {
>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>> struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>> + struct pblk_sec_meta *meta;
>>>>> struct pblk_pr_ctx *pr_ctx;
>>>>> struct bio *new_bio, *bio = r_ctx->private;
>>>>> __le64 *lba_list_mem;
>>>>> @@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> int i;
>>>>> /* Re-use allocated memory for intermediate lbas */
>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>>> new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>>>>> @@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> if (!pr_ctx)
>>>>> goto fail_free_pages;
>>>>> - for (i = 0; i < nr_secs; i++)
>>>>> - lba_list_mem[i] = meta_list[i].lba;
>>>>> + for (i = 0; i < nr_secs; i++) {
>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>> + lba_list_mem[i] = meta->lba;
>>>>> + }
>>>>> new_bio->bi_iter.bi_sector = 0; /* internal bio */
>>>>> bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>>>>> @@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>>>>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>> sector_t lba, unsigned long *read_bitmap)
>>>>> {
>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>> + struct pblk_sec_meta *meta;
>>>>> struct ppa_addr ppa;
>>>>> pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>>>>> @@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>> retry:
>>>>> if (pblk_ppa_empty(ppa)) {
>>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>>> - meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
>>>>> +
>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>>> +
>>>>> return;
>>>>> }
>>>>> @@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>> }
>>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>>> - meta_list[0].lba = cpu_to_le64(lba);
>>>>> +
>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>>> + meta->lba = cpu_to_le64(lba);
>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>> atomic_long_inc(&pblk->cache_reads);
>>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>>>> index 8114013c37b8..1ce92562603d 100644
>>>>> --- a/drivers/lightnvm/pblk-recovery.c
>>>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>>>> @@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>> {
>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>> struct nvm_geo *geo = &dev->geo;
>>>>> - struct pblk_sec_meta *meta_list;
>>>>> + struct pblk_sec_meta *meta;
>>>>> struct pblk_pad_rq *pad_rq;
>>>>> struct nvm_rq *rqd;
>>>>> struct bio *bio;
>>>>> @@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>> rqd->end_io = pblk_end_io_recov;
>>>>> rqd->private = pad_rq;
>>>>> - meta_list = rqd->meta_list;
>>>>> -
>>>>> for (i = 0; i < rqd->nr_ppas; ) {
>>>>> struct ppa_addr ppa;
>>>>> int pos;
>>>>> @@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>> dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>>>>> pblk_map_invalidate(pblk, dev_ppa);
>>>>> - lba_list[w_ptr] = meta_list[i].lba = addr_empty;
>>>>> rqd->ppa_list[i] = dev_ppa;
>>>>> +
>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>> + lba_list[w_ptr] = meta->lba = addr_empty;
>>>>> }
>>>>> }
>>>>> @@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>> struct nvm_geo *geo = &dev->geo;
>>>>> struct ppa_addr *ppa_list;
>>>>> - struct pblk_sec_meta *meta_list;
>>>>> + struct pblk_sec_meta *meta_list, *meta;
>>>>> struct nvm_rq *rqd;
>>>>> struct bio *bio;
>>>>> void *data;
>>>>> @@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>> }
>>>>> for (i = 0; i < rqd->nr_ppas; i++) {
>>>>> - u64 lba = le64_to_cpu(meta_list[i].lba);
>>>>> + u64 lba;
>>>>> +
>>>>> + meta = sec_meta_index(pblk, meta_list, i);
>>>>> + lba = le64_to_cpu(meta->lba);
>>>>> lba_list[paddr++] = cpu_to_le64(lba);
>>>>> @@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>>> if (!meta_list)
>>>>> return -ENOMEM;
>>>>> - ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
>>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>>> + if (pblk->dma_shared) {
>>>>> + ppa_list = (void *)(meta_list) + pblk->dma_meta_size;
>>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>>> + } else {
>>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>>> + &dma_ppa_list);
>>>>> + if (!ppa_list) {
>>>>> + ret = -ENOMEM;
>>>>> + goto free_meta_list;
>>>>> + }
>>>>> + }
>>>>> data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
>>>>> if (!data) {
>>>>> ret = -ENOMEM;
>>>>> - goto free_meta_list;
>>>>> + goto free_ppa_list;
>>>>> }
>>>>> rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL);
>>>>> @@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>>> out:
>>>>> mempool_free(rqd, &pblk->r_rq_pool);
>>>>> kfree(data);
>>>>> +free_ppa_list:
>>>>> + if (!pblk->dma_shared)
>>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>>> free_meta_list:
>>>>> nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>>> -
>>>>> return ret;
>>>>> }
>>>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>>>> index 22cc9bfbbb10..4526fee206d9 100644
>>>>> --- a/drivers/lightnvm/pblk.h
>>>>> +++ b/drivers/lightnvm/pblk.h
>>>>> @@ -86,7 +86,6 @@ enum {
>>>>> };
>>>>> struct pblk_sec_meta {
>>>>> - u64 reserved;
>>>>> __le64 lba;
>>>>> };
>>>>> @@ -103,9 +102,6 @@ enum {
>>>>> PBLK_RL_LOW = 4
>>>>> };
>>>>> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
>>>>> -#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
>>>>> -
>>>>> /* write buffer completion context */
>>>>> struct pblk_c_ctx {
>>>>> struct list_head list; /* Head for out-of-order completion */
>>>>> @@ -637,6 +633,10 @@ struct pblk {
>>>>> int sec_per_write;
>>>>> + int dma_meta_size;
>>>>> + int dma_ppa_size;
>>>>> + bool dma_shared;
>>>>> +
>>>>> unsigned char instance_uuid[16];
>>>>> /* Persistent write amplification counters, 4kb sector I/Os */
>>>>> @@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
>>>>> return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]);
>>>>> }
>>>>> +static inline struct pblk_sec_meta *sec_meta_index(struct pblk *pblk,
>>>>> + struct pblk_sec_meta *meta,
>>>>> + int index)
>>>>> +{
>>>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>>>> + struct nvm_geo *geo = &dev->geo;
>>>>> +
>>>>> + return ((void *)meta + index * geo->sos);
>>>>> +}
>>>>> +
>>>>> static inline int pblk_line_vsc(struct pblk_line *line)
>>>>> {
>>>>> return le32_to_cpu(*line->vsc);
>>>>
>>>> It will be helpful to split this patch into two:
>>>>
>>>> - One that does the 8b conversion
>>>> - One that makes the change to merge metadata and ppa list data buffers
>>> pblk has always shared the dma buffer for the ppa list and the metadata
>>> buffer. This patch adds the possibility to not merge if the metadata
>>> size does not fit. I can separate it into 2 patches, but it seems to me
>>> like a natural extension when relaxing the 16-byte metadata size.
>>>> - How about making it a simplification to only allow up to 32b
>>>> metadata buffers, then one doesn't have to think about it crossing
>>>> multiple pages.
>>> You mean max. 56 bytes of metadata per 4KB right? That is what is left
>>> on a 4KB page after taking out the 512B needed by the ppa list. It's ok
>>> by me, but I'd like to hear Igor's opinion, as this was Intel's use case
>>> to start with.
>>
>> So I think that if we want to do this properly, we should support
>> 'all' the metadata sizes, including >64K - which is not the case after
>> Javier's changes. In the past there were NVMe drives available with 128
>> bytes of metadata per 4K sector, so it could be similar in future too
>> for OC drives. I have somewhere patch which should work for any size
>> of metadata with slightly different approach. I can send it (just need
>> a moment for rebase) and you will decide which approach you would like
>> to take.
>>
>>>> - You can also make a structure for it, use the first 3.5KB for meta,
>>>> and the next 512B for PPAs.
>>> Looks like a huge structure for just managing a couple of pointers,
>>> don't you think?
>>> Javier
>
>
> Matias,
>
> Can you pick up this patch as is for this window and then Igor can extend
> it to an arbitrary size in the future? We have a use case for 64B OOB /
> 4KB, so 56B / 4KB is not enough in this case. Since pblk currently fixes
> the metadata buffer, the code breaks under this configuration.
>
> If you have any comments that can respect this requirement, I can apply
> them and resubmit.
>
Igor, you said you only needed a moment. It would be better to fix this
up the right way, and still make it for this window.
> Thanks,
> Javier
>
> On 18 Sep 2018, at 10.31, Matias Bjørling <[email protected]> wrote:
>
> On 09/18/2018 10:09 AM, Javier Gonzalez wrote:
>>> On 11 Sep 2018, at 12.22, Igor Konopko <[email protected]> wrote:
>>>
>>>
>>>
>>> On 11.09.2018 11:14, Javier Gonzalez wrote:
>>>>> On 10 Sep 2018, at 12.21, Matias Bjørling <[email protected]> wrote:
>>>>>
>>>>> On 08/29/2018 12:09 PM, Javier González wrote:
>>>>>> pblk uses 8 bytes in the metadata region area exposed by the device
>>>>>> through the out of band area to store the lba mapped to the given
>>>>>> physical sector. This is used for recovery purposes. Given that the
>>>>>> first generation OCSSD devices exposed 16 bytes, pblk used a hard-coded
>>>>>> structure for this purpose.
>>>>>> This patch relaxes the 16 bytes assumption and uses the metadata size
>>>>>> reported by the device to lay out metadata appropriately for the vector
>>>>>> commands. This adds support for arbitrary metadata sizes, as long as
>>>>>> these are larger than 8 bytes. Note that this patch does not address the
>>>>>> case in which the device does not expose an out of band area and that
>>>>>> pblk creation will fail in this case.
>>>>>> Signed-off-by: Javier González <[email protected]>
>>>>>> ---
>>>>>> drivers/lightnvm/pblk-core.c | 56 ++++++++++++++++++++++++++++++----------
>>>>>> drivers/lightnvm/pblk-init.c | 14 ++++++++++
>>>>>> drivers/lightnvm/pblk-map.c | 19 +++++++++-----
>>>>>> drivers/lightnvm/pblk-read.c | 55 +++++++++++++++++++++++++--------------
>>>>>> drivers/lightnvm/pblk-recovery.c | 34 +++++++++++++++++-------
>>>>>> drivers/lightnvm/pblk.h | 18 ++++++++++---
>>>>>> 6 files changed, 143 insertions(+), 53 deletions(-)
>>>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>>>> index a311cc29afd8..d52e0047ae9d 100644
>>>>>> --- a/drivers/lightnvm/pblk-core.c
>>>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>>>> @@ -250,8 +250,20 @@ int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
>>>>>> if (!is_vector)
>>>>>> return 0;
>>>>>> - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
>>>>>> - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
>>>>>> + if (pblk->dma_shared) {
>>>>>> + rqd->ppa_list = rqd->meta_list + pblk->dma_meta_size;
>>>>>> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk->dma_meta_size;
>>>>>> +
>>>>>> + return 0;
>>>>>> + }
>>>>>> +
>>>>>> + rqd->ppa_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
>>>>>> + &rqd->dma_ppa_list);
>>>>>> + if (!rqd->ppa_list) {
>>>>>> + nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>>>> + rqd->dma_meta_list);
>>>>>> + return -ENOMEM;
>>>>>> + }
>>>>>> return 0;
>>>>>> }
>>>>>> @@ -262,7 +274,11 @@ void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
>>>>>> if (rqd->meta_list)
>>>>>> nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>>>> - rqd->dma_meta_list);
>>>>>> + rqd->dma_meta_list);
>>>>>> +
>>>>>> + if (!pblk->dma_shared && rqd->ppa_list)
>>>>>> + nvm_dev_dma_free(dev->parent, rqd->ppa_list,
>>>>>> + rqd->dma_ppa_list);
>>>>>> }
>>>>>> /* Caller must guarantee that the request is a valid type */
>>>>>> @@ -796,10 +812,12 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>>>>>> rqd.is_seq = 1;
>>>>>> for (i = 0; i < lm->smeta_sec; i++, paddr++) {
>>>>>> - struct pblk_sec_meta *meta_list = rqd.meta_list;
>>>>>> + struct pblk_sec_meta *meta;
>>>>>> rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>>>> - meta_list[i].lba = lba_list[paddr] = addr_empty;
>>>>>> +
>>>>>> + meta = sec_meta_index(pblk, rqd.meta_list, i);
>>>>>> + meta->lba = lba_list[paddr] = addr_empty;
>>>>>> }
>>>>>> ret = pblk_submit_io_sync_sem(pblk, &rqd);
>>>>>> @@ -845,8 +863,17 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>> if (!meta_list)
>>>>>> return -ENOMEM;
>>>>>> - ppa_list = meta_list + pblk_dma_meta_size;
>>>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>>>> + if (pblk->dma_shared) {
>>>>>> + ppa_list = meta_list + pblk->dma_meta_size;
>>>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>>>> + } else {
>>>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>>>> + &dma_ppa_list);
>>>>>> + if (!ppa_list) {
>>>>>> + ret = -ENOMEM;
>>>>>> + goto free_meta_list;
>>>>>> + }
>>>>>> + }
>>>>>> next_rq:
>>>>>> memset(&rqd, 0, sizeof(struct nvm_rq));
>>>>>> @@ -858,7 +885,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>> l_mg->emeta_alloc_type, GFP_KERNEL);
>>>>>> if (IS_ERR(bio)) {
>>>>>> ret = PTR_ERR(bio);
>>>>>> - goto free_rqd_dma;
>>>>>> + goto free_ppa_list;
>>>>>> }
>>>>>> bio->bi_iter.bi_sector = 0; /* internal bio */
>>>>>> @@ -884,7 +911,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>> if (pblk_boundary_paddr_checks(pblk, paddr)) {
>>>>>> bio_put(bio);
>>>>>> ret = -EINTR;
>>>>>> - goto free_rqd_dma;
>>>>>> + goto free_ppa_list;
>>>>>> }
>>>>>> ppa = addr_to_gen_ppa(pblk, paddr, line_id);
>>>>>> @@ -894,7 +921,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>> if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
>>>>>> bio_put(bio);
>>>>>> ret = -EINTR;
>>>>>> - goto free_rqd_dma;
>>>>>> + goto free_ppa_list;
>>>>>> }
>>>>>> for (j = 0; j < min; j++, i++, paddr++)
>>>>>> @@ -905,7 +932,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>> if (ret) {
>>>>>> pblk_err(pblk, "emeta I/O submission failed: %d\n", ret);
>>>>>> bio_put(bio);
>>>>>> - goto free_rqd_dma;
>>>>>> + goto free_ppa_list;
>>>>>> }
>>>>>> atomic_dec(&pblk->inflight_io);
>>>>>> @@ -918,8 +945,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>> if (left_ppas)
>>>>>> goto next_rq;
>>>>>> -free_rqd_dma:
>>>>>> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
>>>>>> +free_ppa_list:
>>>>>> + if (!pblk->dma_shared)
>>>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>>>> +free_meta_list:
>>>>>> + nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>>>> return ret;
>>>>>> }
>>>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>>>> index a99854439224..57972156c318 100644
>>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>>> @@ -354,6 +354,20 @@ static int pblk_core_init(struct pblk *pblk)
>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>> int ret, max_write_ppas;
>>>>>> + if (sizeof(struct pblk_sec_meta) > geo->sos) {
>>>>>> + pblk_err(pblk, "OOB area too small. Min %lu bytes (%d)\n",
>>>>>> + (unsigned long)sizeof(struct pblk_sec_meta), geo->sos);
>>>>>> + return -EINTR;
>>>>>> + }
>>>>>> +
>>>>>> + pblk->dma_ppa_size = (sizeof(u64) * NVM_MAX_VLBA);
>>>>>> + pblk->dma_meta_size = geo->sos * NVM_MAX_VLBA;
>>>>>> +
>>>>>> + if (pblk->dma_ppa_size + pblk->dma_meta_size > PAGE_SIZE)
>>>>>> + pblk->dma_shared = false;
>>>>>> + else
>>>>>> + pblk->dma_shared = true;
>>>>>> +
>>>>>> atomic64_set(&pblk->user_wa, 0);
>>>>>> atomic64_set(&pblk->pad_wa, 0);
>>>>>> atomic64_set(&pblk->gc_wa, 0);
>>>>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>>>>> index dc0efb852475..55fca16d18e4 100644
>>>>>> --- a/drivers/lightnvm/pblk-map.c
>>>>>> +++ b/drivers/lightnvm/pblk-map.c
>>>>>> @@ -25,6 +25,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>>> unsigned int valid_secs)
>>>>>> {
>>>>>> struct pblk_line *line = pblk_line_get_data(pblk);
>>>>>> + struct pblk_sec_meta *meta;
>>>>>> struct pblk_emeta *emeta;
>>>>>> struct pblk_w_ctx *w_ctx;
>>>>>> __le64 *lba_list;
>>>>>> @@ -56,6 +57,8 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>>> /* ppa to be sent to the device */
>>>>>> ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>>>> + meta = sec_meta_index(pblk, meta_list, i);
>>>>>> +
>>>>>> /* Write context for target bio completion on write buffer. Note
>>>>>> * that the write buffer is protected by the sync backpointer,
>>>>>> * and a single writer thread have access to each specific entry
>>>>>> @@ -67,14 +70,14 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>>> kref_get(&line->ref);
>>>>>> w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
>>>>>> w_ctx->ppa = ppa_list[i];
>>>>>> - meta_list[i].lba = cpu_to_le64(w_ctx->lba);
>>>>>> + meta->lba = cpu_to_le64(w_ctx->lba);
>>>>>> lba_list[paddr] = cpu_to_le64(w_ctx->lba);
>>>>>> if (lba_list[paddr] != addr_empty)
>>>>>> line->nr_valid_lbas++;
>>>>>> else
>>>>>> atomic64_inc(&pblk->pad_wa);
>>>>>> } else {
>>>>>> - lba_list[paddr] = meta_list[i].lba = addr_empty;
>>>>>> + lba_list[paddr] = meta->lba = addr_empty;
>>>>>> __pblk_map_invalidate(pblk, line, paddr);
>>>>>> }
>>>>>> }
>>>>>> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>>>> unsigned long *lun_bitmap, unsigned int valid_secs,
>>>>>> unsigned int off)
>>>>>> {
>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>> + struct pblk_sec_meta *meta_list;
>>>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>>>> unsigned int map_secs;
>>>>>> int min = pblk->min_write_pgs;
>>>>>> @@ -95,8 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>>>> for (i = off; i < rqd->nr_ppas; i += min) {
>>>>>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>> +
>>>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>>>> - lun_bitmap, &meta_list[i], map_secs)) {
>>>>>> + lun_bitmap, meta_list, map_secs)) {
>>>>>> bio_put(rqd->bio);
>>>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>>>> pblk_pipeline_stop(pblk);
>>>>>> @@ -112,8 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>> struct pblk_line_meta *lm = &pblk->lm;
>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>>>> + struct pblk_sec_meta *meta_list;
>>>>>> struct pblk_line *e_line, *d_line;
>>>>>> unsigned int map_secs;
>>>>>> int min = pblk->min_write_pgs;
>>>>>> @@ -121,8 +126,10 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> for (i = 0; i < rqd->nr_ppas; i += min) {
>>>>>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>> +
>>>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>>>> - lun_bitmap, &meta_list[i], map_secs)) {
>>>>>> + lun_bitmap, meta_list, map_secs)) {
>>>>>> bio_put(rqd->bio);
>>>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>>>> pblk_pipeline_stop(pblk);
>>>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>>>> index 57d3155ef9a5..12b690e2abd9 100644
>>>>>> --- a/drivers/lightnvm/pblk-read.c
>>>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>>>> @@ -42,7 +42,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> struct bio *bio, sector_t blba,
>>>>>> unsigned long *read_bitmap)
>>>>>> {
>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>> struct ppa_addr ppas[NVM_MAX_VLBA];
>>>>>> int nr_secs = rqd->nr_ppas;
>>>>>> bool advanced_bio = false;
>>>>>> @@ -51,13 +50,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
>>>>>> for (i = 0; i < nr_secs; i++) {
>>>>>> + struct pblk_sec_meta *meta;
>>>>>> struct ppa_addr p = ppas[i];
>>>>>> sector_t lba = blba + i;
>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>> retry:
>>>>>> if (pblk_ppa_empty(p)) {
>>>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>>>> - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>>>>>> +
>>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>>>> if (unlikely(!advanced_bio)) {
>>>>>> bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
>>>>>> @@ -77,7 +79,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> goto retry;
>>>>>> }
>>>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>>>> - meta_list[i].lba = cpu_to_le64(lba);
>>>>>> + meta->lba = cpu_to_le64(lba);
>>>>>> advanced_bio = true;
>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>> atomic_long_inc(&pblk->cache_reads);
>>>>>> @@ -104,12 +106,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> sector_t blba)
>>>>>> {
>>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>>>> int nr_lbas = rqd->nr_ppas;
>>>>>> int i;
>>>>>> for (i = 0; i < nr_lbas; i++) {
>>>>>> - u64 lba = le64_to_cpu(meta_lba_list[i].lba);
>>>>>> + struct pblk_sec_meta *meta;
>>>>>> + u64 lba;
>>>>>> +
>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>> + lba = le64_to_cpu(meta->lba);
>>>>>> if (lba == ADDR_EMPTY)
>>>>>> continue;
>>>>>> @@ -133,17 +138,18 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> u64 *lba_list, int nr_lbas)
>>>>>> {
>>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>>>> int i, j;
>>>>>> for (i = 0, j = 0; i < nr_lbas; i++) {
>>>>>> + struct pblk_sec_meta *meta;
>>>>>> u64 lba = lba_list[i];
>>>>>> u64 meta_lba;
>>>>>> if (lba == ADDR_EMPTY)
>>>>>> continue;
>>>>>> - meta_lba = le64_to_cpu(meta_lba_list[j].lba);
>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, j);
>>>>>> + meta_lba = le64_to_cpu(meta->lba);
>>>>>> if (lba != meta_lba) {
>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>> @@ -218,7 +224,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>> struct bio *new_bio = rqd->bio;
>>>>>> struct bio *bio = pr_ctx->orig_bio;
>>>>>> struct bio_vec src_bv, dst_bv;
>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>> + struct pblk_sec_meta *meta;
>>>>>> int bio_init_idx = pr_ctx->bio_init_idx;
>>>>>> unsigned long *read_bitmap = pr_ctx->bitmap;
>>>>>> int nr_secs = pr_ctx->orig_nr_secs;
>>>>>> @@ -237,12 +243,13 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>> }
>>>>>> /* Re-use allocated memory for intermediate lbas */
>>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>>>> - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
>>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>>>> + lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk->dma_ppa_size);
>>>>>> for (i = 0; i < nr_secs; i++) {
>>>>>> - lba_list_media[i] = meta_list[i].lba;
>>>>>> - meta_list[i].lba = lba_list_mem[i];
>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>> + lba_list_media[i] = meta->lba;
>>>>>> + meta->lba = lba_list_mem[i];
>>>>>> }
>>>>>> /* Fill the holes in the original bio */
>>>>>> @@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>> line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>>>>>> kref_put(&line->ref, pblk_line_put);
>>>>>> - meta_list[hole].lba = lba_list_media[i];
>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, hole);
>>>>>> + meta->lba = lba_list_media[i];
>>>>>> src_bv = new_bio->bi_io_vec[i++];
>>>>>> dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>>>>>> @@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> unsigned long *read_bitmap,
>>>>>> int nr_holes)
>>>>>> {
>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>> struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>>> + struct pblk_sec_meta *meta;
>>>>>> struct pblk_pr_ctx *pr_ctx;
>>>>>> struct bio *new_bio, *bio = r_ctx->private;
>>>>>> __le64 *lba_list_mem;
>>>>>> @@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> int i;
>>>>>> /* Re-use allocated memory for intermediate lbas */
>>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>>>> new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>>>>>> @@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> if (!pr_ctx)
>>>>>> goto fail_free_pages;
>>>>>> - for (i = 0; i < nr_secs; i++)
>>>>>> - lba_list_mem[i] = meta_list[i].lba;
>>>>>> + for (i = 0; i < nr_secs; i++) {
>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>> + lba_list_mem[i] = meta->lba;
>>>>>> + }
>>>>>> new_bio->bi_iter.bi_sector = 0; /* internal bio */
>>>>>> bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>>>>>> @@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>> sector_t lba, unsigned long *read_bitmap)
>>>>>> {
>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>> + struct pblk_sec_meta *meta;
>>>>>> struct ppa_addr ppa;
>>>>>> pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>>>>>> @@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>> retry:
>>>>>> if (pblk_ppa_empty(ppa)) {
>>>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>>>> - meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
>>>>>> +
>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>>>> +
>>>>>> return;
>>>>>> }
>>>>>> @@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>> }
>>>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>>>> - meta_list[0].lba = cpu_to_le64(lba);
>>>>>> +
>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>>>> + meta->lba = cpu_to_le64(lba);
>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>> atomic_long_inc(&pblk->cache_reads);
>>>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>>>>> index 8114013c37b8..1ce92562603d 100644
>>>>>> --- a/drivers/lightnvm/pblk-recovery.c
>>>>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>>>>> @@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>> {
>>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>> - struct pblk_sec_meta *meta_list;
>>>>>> + struct pblk_sec_meta *meta;
>>>>>> struct pblk_pad_rq *pad_rq;
>>>>>> struct nvm_rq *rqd;
>>>>>> struct bio *bio;
>>>>>> @@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>> rqd->end_io = pblk_end_io_recov;
>>>>>> rqd->private = pad_rq;
>>>>>> - meta_list = rqd->meta_list;
>>>>>> -
>>>>>> for (i = 0; i < rqd->nr_ppas; ) {
>>>>>> struct ppa_addr ppa;
>>>>>> int pos;
>>>>>> @@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>> dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>>>>>> pblk_map_invalidate(pblk, dev_ppa);
>>>>>> - lba_list[w_ptr] = meta_list[i].lba = addr_empty;
>>>>>> rqd->ppa_list[i] = dev_ppa;
>>>>>> +
>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>> + lba_list[w_ptr] = meta->lba = addr_empty;
>>>>>> }
>>>>>> }
>>>>>> @@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>> struct ppa_addr *ppa_list;
>>>>>> - struct pblk_sec_meta *meta_list;
>>>>>> + struct pblk_sec_meta *meta_list, *meta;
>>>>>> struct nvm_rq *rqd;
>>>>>> struct bio *bio;
>>>>>> void *data;
>>>>>> @@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>> }
>>>>>> for (i = 0; i < rqd->nr_ppas; i++) {
>>>>>> - u64 lba = le64_to_cpu(meta_list[i].lba);
>>>>>> + u64 lba;
>>>>>> +
>>>>>> + meta = sec_meta_index(pblk, meta_list, i);
>>>>>> + lba = le64_to_cpu(meta->lba);
>>>>>> lba_list[paddr++] = cpu_to_le64(lba);
>>>>>> @@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>>>> if (!meta_list)
>>>>>> return -ENOMEM;
>>>>>> - ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
>>>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>>>> + if (pblk->dma_shared) {
>>>>>> + ppa_list = (void *)(meta_list) + pblk->dma_meta_size;
>>>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>>>> + } else {
>>>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>>>> + &dma_ppa_list);
>>>>>> + if (!ppa_list) {
>>>>>> + ret = -ENOMEM;
>>>>>> + goto free_meta_list;
>>>>>> + }
>>>>>> + }
>>>>>> data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
>>>>>> if (!data) {
>>>>>> ret = -ENOMEM;
>>>>>> - goto free_meta_list;
>>>>>> + goto free_ppa_list;
>>>>>> }
>>>>>> rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL);
>>>>>> @@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>>>> out:
>>>>>> mempool_free(rqd, &pblk->r_rq_pool);
>>>>>> kfree(data);
>>>>>> +free_ppa_list:
>>>>>> + if (!pblk->dma_shared)
>>>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>>>> free_meta_list:
>>>>>> nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>>>> -
>>>>>> return ret;
>>>>>> }
>>>>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>>>>> index 22cc9bfbbb10..4526fee206d9 100644
>>>>>> --- a/drivers/lightnvm/pblk.h
>>>>>> +++ b/drivers/lightnvm/pblk.h
>>>>>> @@ -86,7 +86,6 @@ enum {
>>>>>> };
>>>>>> struct pblk_sec_meta {
>>>>>> - u64 reserved;
>>>>>> __le64 lba;
>>>>>> };
>>>>>> @@ -103,9 +102,6 @@ enum {
>>>>>> PBLK_RL_LOW = 4
>>>>>> };
>>>>>> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
>>>>>> -#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
>>>>>> -
>>>>>> /* write buffer completion context */
>>>>>> struct pblk_c_ctx {
>>>>>> struct list_head list; /* Head for out-of-order completion */
>>>>>> @@ -637,6 +633,10 @@ struct pblk {
>>>>>> int sec_per_write;
>>>>>> + int dma_meta_size;
>>>>>> + int dma_ppa_size;
>>>>>> + bool dma_shared;
>>>>>> +
>>>>>> unsigned char instance_uuid[16];
>>>>>> /* Persistent write amplification counters, 4kb sector I/Os */
>>>>>> @@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
>>>>>> return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]);
>>>>>> }
>>>>>> +static inline struct pblk_sec_meta *sec_meta_index(struct pblk *pblk,
>>>>>> + struct pblk_sec_meta *meta,
>>>>>> + int index)
>>>>>> +{
>>>>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>>>>> + struct nvm_geo *geo = &dev->geo;
>>>>>> +
>>>>>> + return ((void *)meta + index * geo->sos);
>>>>>> +}
>>>>>> +
>>>>>> static inline int pblk_line_vsc(struct pblk_line *line)
>>>>>> {
>>>>>> return le32_to_cpu(*line->vsc);
>>>>>
>>>>> It will be helpful to split this patch into two:
>>>>>
>>>>> - One that does the 8b conversion
>>>>> - One that makes the change to merge metadata and ppa list data buffers
>>>> pblk has always shared the dma buffer for the ppa list and the metadata
>>>> buffer. This patch adds the possibility of not merging them if the metadata
>>>> size does not fit. I can separate it into 2 patches, but it seems to me
>>>> like a natural extension when relaxing the 16byte metadata size.
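As a rough illustration of the shared-vs-split decision described above, the init-time check the patch adds boils down to the following sketch. All `EX_`/`ex_` names are hypothetical; the constants mirror the patch context (64-sector vector commands, 4KB DMA pages), not an exact kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NVM_MAX_VLBA 64   /* max sectors per vector command */
#define EX_PAGE_SIZE    4096 /* one DMA pool page */

/* Can the per-sector OOB metadata (sos bytes each) still share a single
 * DMA page with the PPA list (one u64 per sector)? */
static bool ex_dma_shared(int sos)
{
	size_t ppa_size = sizeof(uint64_t) * EX_NVM_MAX_VLBA;  /* 512 B */
	size_t meta_size = (size_t)sos * EX_NVM_MAX_VLBA;

	return ppa_size + meta_size <= EX_PAGE_SIZE;
}
```

With these assumptions, a 16B OOB (first-generation devices) shares the page, while a 64B OOB forces the separate `nvm_dev_dma_alloc()` path for the PPA list.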
>>>>> - How about making it a simplification to only allow up to 32b
>>>>> metadata buffers, then one doesn't have to think about it crossing
>>>>> multiple pages.
>>>> You mean max. 56 bytes of metadata per 4KB right? That is what is left
>>>> on a 4KB page after taking out the 512B needed by the ppa list. It's ok
>>>> by me, but I'd like to hear Igor's opinion, as this was Intel's use case
>>>> to start with.
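For concreteness, the 56-byte figure follows from the page budget. A back-of-the-envelope sketch, with hypothetical `EX_`/`ex_` names and the same assumed constants as in the patch discussion:

```c
#include <assert.h>
#include <stdint.h>

#define EX_NVM_MAX_VLBA 64   /* max sectors per vector command */
#define EX_PAGE_SIZE    4096 /* shared DMA page */

/* Largest per-sector OOB size that still lets the metadata share a 4KB
 * DMA page with the 512B PPA list (one u64 per sector). */
static int ex_max_shared_sos(void)
{
	int ppa_bytes = (int)(sizeof(uint64_t) * EX_NVM_MAX_VLBA); /* 512 */

	return (EX_PAGE_SIZE - ppa_bytes) / EX_NVM_MAX_VLBA;       /* 56 */
}
```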
>>>
>>> So I think that if we want to do this properly, we should support
>>> 'all' the metadata sizes, including >64K - which is not the case after
>>> Javier's changes. In the past there were NVMe drives available with 128
>>> bytes of metadata per 4K sector, so it could be similar in the future too
>>> for OC drives. I have a patch somewhere which should work for any size
>>> of metadata, with a slightly different approach. I can send it (just need
>>> a moment for rebase) and you can decide which approach you would like
>>> to take.
>>>
>>>>> - You can also make a structure for it, use the first 3.5KB for meta,
>>>>> and the next 512B for PPAs.
>>>> Looks like a huge structure for just managing a couple of pointers,
>>>> don't you think?
>>>> Javier
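If one did go the struct route suggested above, a fixed 4KB layout could look like this sketch (hypothetical names; sizes assume a 4KB page and a 64-entry PPA list, so the metadata area is the 3.5KB remainder):

```c
#include <assert.h>
#include <stdint.h>

#define EX_NVM_MAX_VLBA 64
#define EX_PAGE_SIZE    4096

/* One DMA page: metadata first, PPA list in the trailing 512B.
 * The uint64_t array starts at offset 3584, which is 8-byte aligned,
 * so no padding is introduced and the struct fills the page exactly. */
struct ex_pblk_dma_page {
	uint8_t  meta[EX_PAGE_SIZE - sizeof(uint64_t) * EX_NVM_MAX_VLBA];
	uint64_t ppa_list[EX_NVM_MAX_VLBA];
};
```

It does spell out the split explicitly, though as noted it is a fair amount of machinery for what is otherwise two pointer offsets.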
>> Matias,
>> Can you pick up this patch as is for this window and then Igor can extend
>> it to an arbitrary size in the future? We have a use case for 64B OOB /
>> 4KB, so 56B / 4KB is not enough in this case. Since pblk currently fixes
>> the metadata buffer, the code breaks under this configuration.
>> If you have any comments that can respect this requirement, I can apply
>> them and resubmit.
>
> Igor, you said you only needed a moment. It would be better to fix this up the right way, and still make it for this window.
>
If we can make 4.20 then it is fine by me.
Thanks,
Javier
On 09/11/2018 01:30 PM, Javier Gonzalez wrote:
>> On 11 Sep 2018, at 10.50, Javier Gonzalez <[email protected]> wrote:
>>
>>> On 7 Sep 2018, at 12.28, Matias Bjørling <[email protected]> wrote:
>>>
>>> On 08/29/2018 12:09 PM, Javier González wrote:
>>>> In the OCSSD 2.0 spec., the sector and metadata sizes are reported through
>>>> the standard nvme identify command. Thus, the lightnvm subsystem needs
>>>> to update this information on the geometry structure on bootup.
>>>> Since 1.2 devices report these values on the OCSSD geometry identify,
>>>> avoid this update as it is unnecessary and can also corrupt the geometry
>>>> if the device does not report the nvme sizes correctly (which is not
>>>> required by the OCSSD 1.2 spec either).
>>>> Signed-off-by: Javier González <[email protected]>
>>>> ---
>>>> drivers/nvme/host/lightnvm.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>> index 5bfa354c5dd5..33ed09f8410e 100644
>>>> --- a/drivers/nvme/host/lightnvm.c
>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>> @@ -980,6 +980,9 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
>>>> struct nvm_dev *ndev = ns->ndev;
>>>> struct nvm_geo *geo = &ndev->geo;
>>>> + if (geo->version == NVM_OCSSD_SPEC_12)
>>>> + return;
>>>
>>> The lba format sizes are not explicit in 2.0 either. For a conforming drive, it should implement the NVM Command Set and expose the LBA format correctly. Although, I do get your incentive, and if it is okay with you, I'll reword the commit message to this and apply it?:
>>>
>>> "1.2 devices expose their data and metadata size through the separate
>>> identify command. Make sure that the NVMe LBA format does not override
>>> these values."
>>
>> Sounds good. I'll resend either way as I rebased on top of for-4.20/core
>> - you can change the commit as you think it suits.
>
> Since Igor will send a patch for the OOB, can you pick this up
> separately? No changes needed after rebase.
>
> Thanks,
> Javier
>
Applied for 4.20.
> On 18 Sep 2018, at 10.32, Javier Gonzalez <[email protected]> wrote:
>
>> On 18 Sep 2018, at 10.31, Matias Bjørling <[email protected]> wrote:
>>
>> On 09/18/2018 10:09 AM, Javier Gonzalez wrote:
>>>> On 11 Sep 2018, at 12.22, Igor Konopko <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 11.09.2018 11:14, Javier Gonzalez wrote:
>>>>>> On 10 Sep 2018, at 12.21, Matias Bjørling <[email protected]> wrote:
>>>>>>
>>>>>> On 08/29/2018 12:09 PM, Javier González wrote:
>>>>>>> pblk uses 8 bytes in the metadata region area exposed by the device
>>>>>>> through the out of band area to store the lba mapped to the given
>>>>>>> physical sector. This is used for recovery purposes. Given that the
>>>>>>> first generation OCSSD devices exposed 16 bytes, pblk used a hard-coded
>>>>>>> structure for this purpose.
>>>>>>> This patch relaxes the 16-byte assumption and uses the metadata size
>>>>>>> reported by the device to layout metadata appropriately for the vector
>>>>>>> commands. This adds support for arbitrary metadata sizes, as long as
>>>>>>> these are larger than 8 bytes. Note that this patch does not address the
>>>>>>> case in which the device does not expose an out of band area; pblk
>>>>>>> creation will fail in this case.
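The indexing change at the heart of this patch — stepping through the OOB buffer by the device-reported `sos` instead of `sizeof(struct pblk_sec_meta)`, as the quoted `sec_meta_index()` helper does — reduces to this arithmetic (a sketch with hypothetical `ex_` names):

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of the i-th sector's OOB metadata when each sector
 * carries sos bytes of metadata in a packed per-command buffer. */
static size_t ex_sec_meta_offset(int sos, int index)
{
	return (size_t)index * (size_t)sos;
}
```

So a 16B-OOB device places sector 3's lba at offset 48, while a 64B-OOB device places it at offset 192 — the old fixed-stride `meta_list[i]` indexing was only correct for the former.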
>>>>>>> Signed-off-by: Javier González <[email protected]>
>>>>>>> ---
>>>>>>> drivers/lightnvm/pblk-core.c | 56 ++++++++++++++++++++++++++++++----------
>>>>>>> drivers/lightnvm/pblk-init.c | 14 ++++++++++
>>>>>>> drivers/lightnvm/pblk-map.c | 19 +++++++++-----
>>>>>>> drivers/lightnvm/pblk-read.c | 55 +++++++++++++++++++++++++--------------
>>>>>>> drivers/lightnvm/pblk-recovery.c | 34 +++++++++++++++++-------
>>>>>>> drivers/lightnvm/pblk.h | 18 ++++++++++---
>>>>>>> 6 files changed, 143 insertions(+), 53 deletions(-)
>>>>>>> for (i = 0; i < rqd->nr_ppas; i += min) {
>>>>>>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>> +
>>>>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>>>>> - lun_bitmap, &meta_list[i], map_secs)) {
>>>>>>> + lun_bitmap, meta_list, map_secs)) {
>>>>>>> bio_put(rqd->bio);
>>>>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>>>>> pblk_pipeline_stop(pblk);
>>>>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>>>>> index 57d3155ef9a5..12b690e2abd9 100644
>>>>>>> --- a/drivers/lightnvm/pblk-read.c
>>>>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>>>>> @@ -42,7 +42,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>> struct bio *bio, sector_t blba,
>>>>>>> unsigned long *read_bitmap)
>>>>>>> {
>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>> struct ppa_addr ppas[NVM_MAX_VLBA];
>>>>>>> int nr_secs = rqd->nr_ppas;
>>>>>>> bool advanced_bio = false;
>>>>>>> @@ -51,13 +50,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>> pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
>>>>>>> for (i = 0; i < nr_secs; i++) {
>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>> struct ppa_addr p = ppas[i];
>>>>>>> sector_t lba = blba + i;
>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>> retry:
>>>>>>> if (pblk_ppa_empty(p)) {
>>>>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>>>>> - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>> +
>>>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>> if (unlikely(!advanced_bio)) {
>>>>>>> bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
>>>>>>> @@ -77,7 +79,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>> goto retry;
>>>>>>> }
>>>>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>>>>> - meta_list[i].lba = cpu_to_le64(lba);
>>>>>>> + meta->lba = cpu_to_le64(lba);
>>>>>>> advanced_bio = true;
>>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>> atomic_long_inc(&pblk->cache_reads);
>>>>>>> @@ -104,12 +106,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>> sector_t blba)
>>>>>>> {
>>>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>>>>> int nr_lbas = rqd->nr_ppas;
>>>>>>> int i;
>>>>>>> for (i = 0; i < nr_lbas; i++) {
>>>>>>> - u64 lba = le64_to_cpu(meta_lba_list[i].lba);
>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>> + u64 lba;
>>>>>>> +
>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>> + lba = le64_to_cpu(meta->lba);
>>>>>>> if (lba == ADDR_EMPTY)
>>>>>>> continue;
>>>>>>> @@ -133,17 +138,18 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>> u64 *lba_list, int nr_lbas)
>>>>>>> {
>>>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>>>>> int i, j;
>>>>>>> for (i = 0, j = 0; i < nr_lbas; i++) {
>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>> u64 lba = lba_list[i];
>>>>>>> u64 meta_lba;
>>>>>>> if (lba == ADDR_EMPTY)
>>>>>>> continue;
>>>>>>> - meta_lba = le64_to_cpu(meta_lba_list[j].lba);
>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, j);
>>>>>>> + meta_lba = le64_to_cpu(meta->lba);
>>>>>>> if (lba != meta_lba) {
>>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>> @@ -218,7 +224,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>> struct bio *new_bio = rqd->bio;
>>>>>>> struct bio *bio = pr_ctx->orig_bio;
>>>>>>> struct bio_vec src_bv, dst_bv;
>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>> int bio_init_idx = pr_ctx->bio_init_idx;
>>>>>>> unsigned long *read_bitmap = pr_ctx->bitmap;
>>>>>>> int nr_secs = pr_ctx->orig_nr_secs;
>>>>>>> @@ -237,12 +243,13 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>> }
>>>>>>> /* Re-use allocated memory for intermediate lbas */
>>>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>>>>> - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
>>>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>>>>> + lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk->dma_ppa_size);
>>>>>>> for (i = 0; i < nr_secs; i++) {
>>>>>>> - lba_list_media[i] = meta_list[i].lba;
>>>>>>> - meta_list[i].lba = lba_list_mem[i];
>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>> + lba_list_media[i] = meta->lba;
>>>>>>> + meta->lba = lba_list_mem[i];
>>>>>>> }
>>>>>>> /* Fill the holes in the original bio */
>>>>>>> @@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>> line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>>>>>>> kref_put(&line->ref, pblk_line_put);
>>>>>>> - meta_list[hole].lba = lba_list_media[i];
>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, hole);
>>>>>>> + meta->lba = lba_list_media[i];
>>>>>>> src_bv = new_bio->bi_io_vec[i++];
>>>>>>> dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>>>>>>> @@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>> unsigned long *read_bitmap,
>>>>>>> int nr_holes)
>>>>>>> {
>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>> struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>> struct pblk_pr_ctx *pr_ctx;
>>>>>>> struct bio *new_bio, *bio = r_ctx->private;
>>>>>>> __le64 *lba_list_mem;
>>>>>>> @@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>> int i;
>>>>>>> /* Re-use allocated memory for intermediate lbas */
>>>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>>>>> new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>>>>>>> @@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>> if (!pr_ctx)
>>>>>>> goto fail_free_pages;
>>>>>>> - for (i = 0; i < nr_secs; i++)
>>>>>>> - lba_list_mem[i] = meta_list[i].lba;
>>>>>>> + for (i = 0; i < nr_secs; i++) {
>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>> + lba_list_mem[i] = meta->lba;
>>>>>>> + }
>>>>>>> new_bio->bi_iter.bi_sector = 0; /* internal bio */
>>>>>>> bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>>>>>>> @@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>>> sector_t lba, unsigned long *read_bitmap)
>>>>>>> {
>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>> struct ppa_addr ppa;
>>>>>>> pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>>>>>>> @@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>>> retry:
>>>>>>> if (pblk_ppa_empty(ppa)) {
>>>>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>>>>> - meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>> +
>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>> +
>>>>>>> return;
>>>>>>> }
>>>>>>> @@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>>> }
>>>>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>>>>> - meta_list[0].lba = cpu_to_le64(lba);
>>>>>>> +
>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>>>>> + meta->lba = cpu_to_le64(lba);
>>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>> atomic_long_inc(&pblk->cache_reads);
>>>>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>>>>>> index 8114013c37b8..1ce92562603d 100644
>>>>>>> --- a/drivers/lightnvm/pblk-recovery.c
>>>>>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>>>>>> @@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>>> {
>>>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>>> - struct pblk_sec_meta *meta_list;
>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>> struct pblk_pad_rq *pad_rq;
>>>>>>> struct nvm_rq *rqd;
>>>>>>> struct bio *bio;
>>>>>>> @@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>>> rqd->end_io = pblk_end_io_recov;
>>>>>>> rqd->private = pad_rq;
>>>>>>> - meta_list = rqd->meta_list;
>>>>>>> -
>>>>>>> for (i = 0; i < rqd->nr_ppas; ) {
>>>>>>> struct ppa_addr ppa;
>>>>>>> int pos;
>>>>>>> @@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>>> dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>>>>>>> pblk_map_invalidate(pblk, dev_ppa);
>>>>>>> - lba_list[w_ptr] = meta_list[i].lba = addr_empty;
>>>>>>> rqd->ppa_list[i] = dev_ppa;
>>>>>>> +
>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>> + lba_list[w_ptr] = meta->lba = addr_empty;
>>>>>>> }
>>>>>>> }
>>>>>>> @@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>>> struct ppa_addr *ppa_list;
>>>>>>> - struct pblk_sec_meta *meta_list;
>>>>>>> + struct pblk_sec_meta *meta_list, *meta;
>>>>>>> struct nvm_rq *rqd;
>>>>>>> struct bio *bio;
>>>>>>> void *data;
>>>>>>> @@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>>> }
>>>>>>> for (i = 0; i < rqd->nr_ppas; i++) {
>>>>>>> - u64 lba = le64_to_cpu(meta_list[i].lba);
>>>>>>> + u64 lba;
>>>>>>> +
>>>>>>> + meta = sec_meta_index(pblk, meta_list, i);
>>>>>>> + lba = le64_to_cpu(meta->lba);
>>>>>>> lba_list[paddr++] = cpu_to_le64(lba);
>>>>>>> @@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>>>>> if (!meta_list)
>>>>>>> return -ENOMEM;
>>>>>>> - ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
>>>>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>>>>> + if (pblk->dma_shared) {
>>>>>>> + ppa_list = (void *)(meta_list) + pblk->dma_meta_size;
>>>>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>>>>> + } else {
>>>>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>>>>> + &dma_ppa_list);
>>>>>>> + if (!ppa_list) {
>>>>>>> + ret = -ENOMEM;
>>>>>>> + goto free_meta_list;
>>>>>>> + }
>>>>>>> + }
>>>>>>> data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
>>>>>>> if (!data) {
>>>>>>> ret = -ENOMEM;
>>>>>>> - goto free_meta_list;
>>>>>>> + goto free_ppa_list;
>>>>>>> }
>>>>>>> rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL);
>>>>>>> @@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>>>>> out:
>>>>>>> mempool_free(rqd, &pblk->r_rq_pool);
>>>>>>> kfree(data);
>>>>>>> +free_ppa_list:
>>>>>>> + if (!pblk->dma_shared)
>>>>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>>>>> free_meta_list:
>>>>>>> nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>>>>> -
>>>>>>> return ret;
>>>>>>> }
>>>>>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>>>>>> index 22cc9bfbbb10..4526fee206d9 100644
>>>>>>> --- a/drivers/lightnvm/pblk.h
>>>>>>> +++ b/drivers/lightnvm/pblk.h
>>>>>>> @@ -86,7 +86,6 @@ enum {
>>>>>>> };
>>>>>>> struct pblk_sec_meta {
>>>>>>> - u64 reserved;
>>>>>>> __le64 lba;
>>>>>>> };
>>>>>>> @@ -103,9 +102,6 @@ enum {
>>>>>>> PBLK_RL_LOW = 4
>>>>>>> };
>>>>>>> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
>>>>>>> -#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
>>>>>>> -
>>>>>>> /* write buffer completion context */
>>>>>>> struct pblk_c_ctx {
>>>>>>> struct list_head list; /* Head for out-of-order completion */
>>>>>>> @@ -637,6 +633,10 @@ struct pblk {
>>>>>>> int sec_per_write;
>>>>>>> + int dma_meta_size;
>>>>>>> + int dma_ppa_size;
>>>>>>> + bool dma_shared;
>>>>>>> +
>>>>>>> unsigned char instance_uuid[16];
>>>>>>> /* Persistent write amplification counters, 4kb sector I/Os */
>>>>>>> @@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
>>>>>>> return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]);
>>>>>>> }
>>>>>>> +static inline struct pblk_sec_meta *sec_meta_index(struct pblk *pblk,
>>>>>>> + struct pblk_sec_meta *meta,
>>>>>>> + int index)
>>>>>>> +{
>>>>>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>>>>>> + struct nvm_geo *geo = &dev->geo;
>>>>>>> +
>>>>>>> + return ((void *)meta + index * geo->sos);
>>>>>>> +}
>>>>>>> +
>>>>>>> static inline int pblk_line_vsc(struct pblk_line *line)
>>>>>>> {
>>>>>>> return le32_to_cpu(*line->vsc);
>>>>>>
>>>>>> It will be helpful to split this patch into two:
>>>>>>
>>>>>> - One that does the 8b conversion
>>>>>> - One that makes the change to merge metadata and ppa list data buffers
>>>>> pblk has always shared the dma buffer for the ppa list and the metadata
>>>>> buffer. This patch adds the possibility of not sharing them when the
>>>>> metadata size does not fit. I can separate it into 2 patches, but it
>>>>> seems to me like a natural extension when relaxing the 16-byte metadata
>>>>> size.
>>>>>> - How about making it a simplification to only allow up to 32b
>>>>>> metadata buffers, then one doesn't have to think about it crossing
>>>>>> multiple pages.
>>>>> You mean max. 56 bytes of metadata per 4KB sector, right? That is
>>>>> what is left on a 4KB page after taking out the 512B needed by the
>>>>> ppa list. It's OK by me, but I'd like to hear Igor's opinion, as
>>>>> this was Intel's use case to start with.
>>>>
>>>> So I think that if we want to do this properly, we should support
>>>> 'all' metadata sizes, including >64K - which is not the case after
>>>> Javier's changes. In the past there were NVMe drives available with
>>>> 128 bytes of metadata per 4K sector, so it could be similar in the
>>>> future for OC drives too. I have a patch somewhere which should work
>>>> for any size of metadata, with a slightly different approach. I can
>>>> send it (I just need a moment to rebase) and you can decide which
>>>> approach you would like to take.
>>>>
>>>>>> - You can also make a structure for it, use the first 3.5KB for meta,
>>>>>> and the next 512B for PPAs.
>>>>> Looks like a huge structure for just managing a couple of pointers,
>>>>> don't you think?
>>>>> Javier
>>> Matias,
>>> Can you pick up this patch as is for this window and then Igor can extend
>>> it to an arbitrary size in the future? We have a use case for 64B OOB /
>>> 4KB, so 56B / 4KB is not enough in this case. Since pblk currently fixes
>>> the metadata buffer size, the code breaks under this configuration.
>>> If you have any comments that can respect this requirement, I can apply
>>> them and resubmit.
>>
>> Igor, you said you only needed a moment. It would be better to fix this up the right way, and still make it for this window.
>
> If we can make 4.20 then it is fine by me.
>
> Thanks,
> Javier
Since not much has happened and we are closing in on the 4.20 PRs, can we take
this patch for the 4.20 window and extend in the future if necessary?
Thanks,
Javier
On 10/05/2018 02:18 PM, Javier Gonzalez wrote:
>> On 18 Sep 2018, at 10.32, Javier Gonzalez <[email protected]> wrote:
>>
>>> On 18 Sep 2018, at 10.31, Matias Bjørling <[email protected]> wrote:
>>>
>>> On 09/18/2018 10:09 AM, Javier Gonzalez wrote:
>>>>> On 11 Sep 2018, at 12.22, Igor Konopko <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 11.09.2018 11:14, Javier Gonzalez wrote:
>>>>>>> On 10 Sep 2018, at 12.21, Matias Bjørling <[email protected]> wrote:
>>>>>>>
>>>>>>> On 08/29/2018 12:09 PM, Javier González wrote:
>>>>>>>> pblk uses 8 bytes in the metadata region exposed by the device
>>>>>>>> through the out of band area to store the lba mapped to the given
>>>>>>>> physical sector. This is used for recovery purposes. Given that the
>>>>>>>> first generation OCSSD devices exposed 16 bytes, pblk used a hard-coded
>>>>>>>> structure for this purpose.
>>>>>>>> This patch relaxes the 16-byte assumption and uses the metadata size
>>>>>>>> reported by the device to lay out metadata appropriately for the vector
>>>>>>>> commands. This adds support for arbitrary metadata sizes, as long as
>>>>>>>> these are larger than 8 bytes. Note that this patch does not address the
>>>>>>>> case in which the device does not expose an out of band area, and that
>>>>>>>> pblk creation will fail in this case.
>>>>>>>> Signed-off-by: Javier González <[email protected]>
>>>>>>>> ---
>>>>>>>> drivers/lightnvm/pblk-core.c | 56 ++++++++++++++++++++++++++++++----------
>>>>>>>> drivers/lightnvm/pblk-init.c | 14 ++++++++++
>>>>>>>> drivers/lightnvm/pblk-map.c | 19 +++++++++-----
>>>>>>>> drivers/lightnvm/pblk-read.c | 55 +++++++++++++++++++++++++--------------
>>>>>>>> drivers/lightnvm/pblk-recovery.c | 34 +++++++++++++++++-------
>>>>>>>> drivers/lightnvm/pblk.h | 18 ++++++++++---
>>>>>>>> 6 files changed, 143 insertions(+), 53 deletions(-)
>>>>>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>>>>>> index a311cc29afd8..d52e0047ae9d 100644
>>>>>>>> --- a/drivers/lightnvm/pblk-core.c
>>>>>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>>>>>> @@ -250,8 +250,20 @@ int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
>>>>>>>> if (!is_vector)
>>>>>>>> return 0;
>>>>>>>> - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
>>>>>>>> - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
>>>>>>>> + if (pblk->dma_shared) {
>>>>>>>> + rqd->ppa_list = rqd->meta_list + pblk->dma_meta_size;
>>>>>>>> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk->dma_meta_size;
>>>>>>>> +
>>>>>>>> + return 0;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + rqd->ppa_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
>>>>>>>> + &rqd->dma_ppa_list);
>>>>>>>> + if (!rqd->ppa_list) {
>>>>>>>> + nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>>>>>> + rqd->dma_meta_list);
>>>>>>>> + return -ENOMEM;
>>>>>>>> + }
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>> @@ -262,7 +274,11 @@ void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
>>>>>>>> if (rqd->meta_list)
>>>>>>>> nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>>>>>> - rqd->dma_meta_list);
>>>>>>>> + rqd->dma_meta_list);
>>>>>>>> +
>>>>>>>> + if (!pblk->dma_shared && rqd->ppa_list)
>>>>>>>> + nvm_dev_dma_free(dev->parent, rqd->ppa_list,
>>>>>>>> + rqd->dma_ppa_list);
>>>>>>>> }
>>>>>>>> /* Caller must guarantee that the request is a valid type */
>>>>>>>> @@ -796,10 +812,12 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> rqd.is_seq = 1;
>>>>>>>> for (i = 0; i < lm->smeta_sec; i++, paddr++) {
>>>>>>>> - struct pblk_sec_meta *meta_list = rqd.meta_list;
>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>> rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>>>>>> - meta_list[i].lba = lba_list[paddr] = addr_empty;
>>>>>>>> +
>>>>>>>> + meta = sec_meta_index(pblk, rqd.meta_list, i);
>>>>>>>> + meta->lba = lba_list[paddr] = addr_empty;
>>>>>>>> }
>>>>>>>> ret = pblk_submit_io_sync_sem(pblk, &rqd);
>>>>>>>> @@ -845,8 +863,17 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> if (!meta_list)
>>>>>>>> return -ENOMEM;
>>>>>>>> - ppa_list = meta_list + pblk_dma_meta_size;
>>>>>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>>>>>> + if (pblk->dma_shared) {
>>>>>>>> + ppa_list = meta_list + pblk->dma_meta_size;
>>>>>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>>>>>> + } else {
>>>>>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>>>>>> + &dma_ppa_list);
>>>>>>>> + if (!ppa_list) {
>>>>>>>> + ret = -ENOMEM;
>>>>>>>> + goto free_meta_list;
>>>>>>>> + }
>>>>>>>> + }
>>>>>>>> next_rq:
>>>>>>>> memset(&rqd, 0, sizeof(struct nvm_rq));
>>>>>>>> @@ -858,7 +885,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> l_mg->emeta_alloc_type, GFP_KERNEL);
>>>>>>>> if (IS_ERR(bio)) {
>>>>>>>> ret = PTR_ERR(bio);
>>>>>>>> - goto free_rqd_dma;
>>>>>>>> + goto free_ppa_list;
>>>>>>>> }
>>>>>>>> bio->bi_iter.bi_sector = 0; /* internal bio */
>>>>>>>> @@ -884,7 +911,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> if (pblk_boundary_paddr_checks(pblk, paddr)) {
>>>>>>>> bio_put(bio);
>>>>>>>> ret = -EINTR;
>>>>>>>> - goto free_rqd_dma;
>>>>>>>> + goto free_ppa_list;
>>>>>>>> }
>>>>>>>> ppa = addr_to_gen_ppa(pblk, paddr, line_id);
>>>>>>>> @@ -894,7 +921,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
>>>>>>>> bio_put(bio);
>>>>>>>> ret = -EINTR;
>>>>>>>> - goto free_rqd_dma;
>>>>>>>> + goto free_ppa_list;
>>>>>>>> }
>>>>>>>> for (j = 0; j < min; j++, i++, paddr++)
>>>>>>>> @@ -905,7 +932,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> if (ret) {
>>>>>>>> pblk_err(pblk, "emeta I/O submission failed: %d\n", ret);
>>>>>>>> bio_put(bio);
>>>>>>>> - goto free_rqd_dma;
>>>>>>>> + goto free_ppa_list;
>>>>>>>> }
>>>>>>>> atomic_dec(&pblk->inflight_io);
>>>>>>>> @@ -918,8 +945,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> if (left_ppas)
>>>>>>>> goto next_rq;
>>>>>>>> -free_rqd_dma:
>>>>>>>> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
>>>>>>>> +free_ppa_list:
>>>>>>>> + if (!pblk->dma_shared)
>>>>>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>>>>>> +free_meta_list:
>>>>>>>> + nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>>>>>> return ret;
>>>>>>>> }
>>>>>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>>>>>> index a99854439224..57972156c318 100644
>>>>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>>>>> @@ -354,6 +354,20 @@ static int pblk_core_init(struct pblk *pblk)
>>>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>>>> int ret, max_write_ppas;
>>>>>>>> + if (sizeof(struct pblk_sec_meta) > geo->sos) {
>>>>>>>> + pblk_err(pblk, "OOB area too small. Min %lu bytes (%d)\n",
>>>>>>>> + (unsigned long)sizeof(struct pblk_sec_meta), geo->sos);
>>>>>>>> + return -EINTR;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + pblk->dma_ppa_size = (sizeof(u64) * NVM_MAX_VLBA);
>>>>>>>> + pblk->dma_meta_size = geo->sos * NVM_MAX_VLBA;
>>>>>>>> +
>>>>>>>> + if (pblk->dma_ppa_size + pblk->dma_meta_size > PAGE_SIZE)
>>>>>>>> + pblk->dma_shared = false;
>>>>>>>> + else
>>>>>>>> + pblk->dma_shared = true;
>>>>>>>> +
>>>>>>>> atomic64_set(&pblk->user_wa, 0);
>>>>>>>> atomic64_set(&pblk->pad_wa, 0);
>>>>>>>> atomic64_set(&pblk->gc_wa, 0);
>>>>>>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>>>>>>> index dc0efb852475..55fca16d18e4 100644
>>>>>>>> --- a/drivers/lightnvm/pblk-map.c
>>>>>>>> +++ b/drivers/lightnvm/pblk-map.c
>>>>>>>> @@ -25,6 +25,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>>>>> unsigned int valid_secs)
>>>>>>>> {
>>>>>>>> struct pblk_line *line = pblk_line_get_data(pblk);
>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>> struct pblk_emeta *emeta;
>>>>>>>> struct pblk_w_ctx *w_ctx;
>>>>>>>> __le64 *lba_list;
>>>>>>>> @@ -56,6 +57,8 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>>>>> /* ppa to be sent to the device */
>>>>>>>> ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>>>>>> + meta = sec_meta_index(pblk, meta_list, i);
>>>>>>>> +
>>>>>>>> /* Write context for target bio completion on write buffer. Note
>>>>>>>> * that the write buffer is protected by the sync backpointer,
>>>>>>>> * and a single writer thread have access to each specific entry
>>>>>>>> @@ -67,14 +70,14 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>>>>> kref_get(&line->ref);
>>>>>>>> w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
>>>>>>>> w_ctx->ppa = ppa_list[i];
>>>>>>>> - meta_list[i].lba = cpu_to_le64(w_ctx->lba);
>>>>>>>> + meta->lba = cpu_to_le64(w_ctx->lba);
>>>>>>>> lba_list[paddr] = cpu_to_le64(w_ctx->lba);
>>>>>>>> if (lba_list[paddr] != addr_empty)
>>>>>>>> line->nr_valid_lbas++;
>>>>>>>> else
>>>>>>>> atomic64_inc(&pblk->pad_wa);
>>>>>>>> } else {
>>>>>>>> - lba_list[paddr] = meta_list[i].lba = addr_empty;
>>>>>>>> + lba_list[paddr] = meta->lba = addr_empty;
>>>>>>>> __pblk_map_invalidate(pblk, line, paddr);
>>>>>>>> }
>>>>>>>> }
>>>>>>>> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>>>>>> unsigned long *lun_bitmap, unsigned int valid_secs,
>>>>>>>> unsigned int off)
>>>>>>>> {
>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>> + struct pblk_sec_meta *meta_list;
>>>>>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>>>>>> unsigned int map_secs;
>>>>>>>> int min = pblk->min_write_pgs;
>>>>>>>> @@ -95,8 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>>>>>> for (i = off; i < rqd->nr_ppas; i += min) {
>>>>>>>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>>>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>> +
>>>>>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>>>>>> - lun_bitmap, &meta_list[i], map_secs)) {
>>>>>>>> + lun_bitmap, meta_list, map_secs)) {
>>>>>>>> bio_put(rqd->bio);
>>>>>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>>>>>> pblk_pipeline_stop(pblk);
>>>>>>>> @@ -112,8 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>>>> struct pblk_line_meta *lm = &pblk->lm;
>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>>>>>> + struct pblk_sec_meta *meta_list;
>>>>>>>> struct pblk_line *e_line, *d_line;
>>>>>>>> unsigned int map_secs;
>>>>>>>> int min = pblk->min_write_pgs;
>>>>>>>> @@ -121,8 +126,10 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> for (i = 0; i < rqd->nr_ppas; i += min) {
>>>>>>>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>>>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>> +
>>>>>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>>>>>> - lun_bitmap, &meta_list[i], map_secs)) {
>>>>>>>> + lun_bitmap, meta_list, map_secs)) {
>>>>>>>> bio_put(rqd->bio);
>>>>>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>>>>>> pblk_pipeline_stop(pblk);
>>>>>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>>>>>> index 57d3155ef9a5..12b690e2abd9 100644
>>>>>>>> --- a/drivers/lightnvm/pblk-read.c
>>>>>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>>>>>> @@ -42,7 +42,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> struct bio *bio, sector_t blba,
>>>>>>>> unsigned long *read_bitmap)
>>>>>>>> {
>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>> struct ppa_addr ppas[NVM_MAX_VLBA];
>>>>>>>> int nr_secs = rqd->nr_ppas;
>>>>>>>> bool advanced_bio = false;
>>>>>>>> @@ -51,13 +50,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
>>>>>>>> for (i = 0; i < nr_secs; i++) {
>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>> struct ppa_addr p = ppas[i];
>>>>>>>> sector_t lba = blba + i;
>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>> retry:
>>>>>>>> if (pblk_ppa_empty(p)) {
>>>>>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>>>>>> - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>>> +
>>>>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>>> if (unlikely(!advanced_bio)) {
>>>>>>>> bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
>>>>>>>> @@ -77,7 +79,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> goto retry;
>>>>>>>> }
>>>>>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>>>>>> - meta_list[i].lba = cpu_to_le64(lba);
>>>>>>>> + meta->lba = cpu_to_le64(lba);
>>>>>>>> advanced_bio = true;
>>>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>>> atomic_long_inc(&pblk->cache_reads);
>>>>>>>> @@ -104,12 +106,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> sector_t blba)
>>>>>>>> {
>>>>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>>>>>> int nr_lbas = rqd->nr_ppas;
>>>>>>>> int i;
>>>>>>>> for (i = 0; i < nr_lbas; i++) {
>>>>>>>> - u64 lba = le64_to_cpu(meta_lba_list[i].lba);
>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>> + u64 lba;
>>>>>>>> +
>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>> + lba = le64_to_cpu(meta->lba);
>>>>>>>> if (lba == ADDR_EMPTY)
>>>>>>>> continue;
>>>>>>>> @@ -133,17 +138,18 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> u64 *lba_list, int nr_lbas)
>>>>>>>> {
>>>>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>>>>>> int i, j;
>>>>>>>> for (i = 0, j = 0; i < nr_lbas; i++) {
>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>> u64 lba = lba_list[i];
>>>>>>>> u64 meta_lba;
>>>>>>>> if (lba == ADDR_EMPTY)
>>>>>>>> continue;
>>>>>>>> - meta_lba = le64_to_cpu(meta_lba_list[j].lba);
>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, j);
>>>>>>>> + meta_lba = le64_to_cpu(meta->lba);
>>>>>>>> if (lba != meta_lba) {
>>>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>>> @@ -218,7 +224,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>>> struct bio *new_bio = rqd->bio;
>>>>>>>> struct bio *bio = pr_ctx->orig_bio;
>>>>>>>> struct bio_vec src_bv, dst_bv;
>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>> int bio_init_idx = pr_ctx->bio_init_idx;
>>>>>>>> unsigned long *read_bitmap = pr_ctx->bitmap;
>>>>>>>> int nr_secs = pr_ctx->orig_nr_secs;
>>>>>>>> @@ -237,12 +243,13 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>>> }
>>>>>>>> /* Re-use allocated memory for intermediate lbas */
>>>>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>>>>>> - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
>>>>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>>>>>> + lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk->dma_ppa_size);
>>>>>>>> for (i = 0; i < nr_secs; i++) {
>>>>>>>> - lba_list_media[i] = meta_list[i].lba;
>>>>>>>> - meta_list[i].lba = lba_list_mem[i];
>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>> + lba_list_media[i] = meta->lba;
>>>>>>>> + meta->lba = lba_list_mem[i];
>>>>>>>> }
>>>>>>>> /* Fill the holes in the original bio */
>>>>>>>> @@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>>> line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>>>>>>>> kref_put(&line->ref, pblk_line_put);
>>>>>>>> - meta_list[hole].lba = lba_list_media[i];
>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, hole);
>>>>>>>> + meta->lba = lba_list_media[i];
>>>>>>>> src_bv = new_bio->bi_io_vec[i++];
>>>>>>>> dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>>>>>>>> @@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> unsigned long *read_bitmap,
>>>>>>>> int nr_holes)
>>>>>>>> {
>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>> struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>> struct pblk_pr_ctx *pr_ctx;
>>>>>>>> struct bio *new_bio, *bio = r_ctx->private;
>>>>>>>> __le64 *lba_list_mem;
>>>>>>>> @@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> int i;
>>>>>>>> /* Re-use allocated memory for intermediate lbas */
>>>>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>>>>>> new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>>>>>>>> @@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> if (!pr_ctx)
>>>>>>>> goto fail_free_pages;
>>>>>>>> - for (i = 0; i < nr_secs; i++)
>>>>>>>> - lba_list_mem[i] = meta_list[i].lba;
>>>>>>>> + for (i = 0; i < nr_secs; i++) {
>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>> + lba_list_mem[i] = meta->lba;
>>>>>>>> + }
>>>>>>>> new_bio->bi_iter.bi_sector = 0; /* internal bio */
>>>>>>>> bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>>>>>>>> @@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>>>> sector_t lba, unsigned long *read_bitmap)
>>>>>>>> {
>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>> struct ppa_addr ppa;
>>>>>>>> pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>>>>>>>> @@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>>>> retry:
>>>>>>>> if (pblk_ppa_empty(ppa)) {
>>>>>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>>>>>> - meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>>> +
>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>>> +
>>>>>>>> return;
>>>>>>>> }
>>>>>>>> @@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>>>> }
>>>>>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>>>>>> - meta_list[0].lba = cpu_to_le64(lba);
>>>>>>>> +
>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>>>>>> + meta->lba = cpu_to_le64(lba);
>>>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>>> atomic_long_inc(&pblk->cache_reads);
>>>>>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>>>>>>> index 8114013c37b8..1ce92562603d 100644
>>>>>>>> --- a/drivers/lightnvm/pblk-recovery.c
>>>>>>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>>>>>>> @@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> {
>>>>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>>>> - struct pblk_sec_meta *meta_list;
>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>> struct pblk_pad_rq *pad_rq;
>>>>>>>> struct nvm_rq *rqd;
>>>>>>>> struct bio *bio;
>>>>>>>> @@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> rqd->end_io = pblk_end_io_recov;
>>>>>>>> rqd->private = pad_rq;
>>>>>>>> - meta_list = rqd->meta_list;
>>>>>>>> -
>>>>>>>> for (i = 0; i < rqd->nr_ppas; ) {
>>>>>>>> struct ppa_addr ppa;
>>>>>>>> int pos;
>>>>>>>> @@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>>>>>>>> pblk_map_invalidate(pblk, dev_ppa);
>>>>>>>> - lba_list[w_ptr] = meta_list[i].lba = addr_empty;
>>>>>>>> rqd->ppa_list[i] = dev_ppa;
>>>>>>>> +
>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>> + lba_list[w_ptr] = meta->lba = addr_empty;
>>>>>>>> }
>>>>>>>> }
>>>>>>>> @@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>>>> struct ppa_addr *ppa_list;
>>>>>>>> - struct pblk_sec_meta *meta_list;
>>>>>>>> + struct pblk_sec_meta *meta_list, *meta;
>>>>>>>> struct nvm_rq *rqd;
>>>>>>>> struct bio *bio;
>>>>>>>> void *data;
>>>>>>>> @@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>>>> }
>>>>>>>> for (i = 0; i < rqd->nr_ppas; i++) {
>>>>>>>> - u64 lba = le64_to_cpu(meta_list[i].lba);
>>>>>>>> + u64 lba;
>>>>>>>> +
>>>>>>>> + meta = sec_meta_index(pblk, meta_list, i);
>>>>>>>> + lba = le64_to_cpu(meta->lba);
>>>>>>>> lba_list[paddr++] = cpu_to_le64(lba);
>>>>>>>> @@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>>>>>> if (!meta_list)
>>>>>>>> return -ENOMEM;
>>>>>>>> - ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
>>>>>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>>>>>> + if (pblk->dma_shared) {
>>>>>>>> + ppa_list = (void *)(meta_list) + pblk->dma_meta_size;
>>>>>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>>>>>> + } else {
>>>>>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>>>>>> + &dma_ppa_list);
>>>>>>>> + if (!ppa_list) {
>>>>>>>> + ret = -ENOMEM;
>>>>>>>> + goto free_meta_list;
>>>>>>>> + }
>>>>>>>> + }
>>>>>>>> data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
>>>>>>>> if (!data) {
>>>>>>>> ret = -ENOMEM;
>>>>>>>> - goto free_meta_list;
>>>>>>>> + goto free_ppa_list;
>>>>>>>> }
>>>>>>>> rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL);
>>>>>>>> @@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>>>>>> out:
>>>>>>>> mempool_free(rqd, &pblk->r_rq_pool);
>>>>>>>> kfree(data);
>>>>>>>> +free_ppa_list:
>>>>>>>> + if (!pblk->dma_shared)
>>>>>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>>>>>> free_meta_list:
>>>>>>>> nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>>>>>> -
>>>>>>>> return ret;
>>>>>>>> }
>>>>>>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>>>>>>> index 22cc9bfbbb10..4526fee206d9 100644
>>>>>>>> --- a/drivers/lightnvm/pblk.h
>>>>>>>> +++ b/drivers/lightnvm/pblk.h
>>>>>>>> @@ -86,7 +86,6 @@ enum {
>>>>>>>> };
>>>>>>>> struct pblk_sec_meta {
>>>>>>>> - u64 reserved;
>>>>>>>> __le64 lba;
>>>>>>>> };
>>>>>>>> @@ -103,9 +102,6 @@ enum {
>>>>>>>> PBLK_RL_LOW = 4
>>>>>>>> };
>>>>>>>> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
>>>>>>>> -#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
>>>>>>>> -
>>>>>>>> /* write buffer completion context */
>>>>>>>> struct pblk_c_ctx {
>>>>>>>> struct list_head list; /* Head for out-of-order completion */
>>>>>>>> @@ -637,6 +633,10 @@ struct pblk {
>>>>>>>> int sec_per_write;
>>>>>>>> + int dma_meta_size;
>>>>>>>> + int dma_ppa_size;
>>>>>>>> + bool dma_shared;
>>>>>>>> +
>>>>>>>> unsigned char instance_uuid[16];
>>>>>>>> /* Persistent write amplification counters, 4kb sector I/Os */
>>>>>>>> @@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
>>>>>>>> return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]);
>>>>>>>> }
>>>>>>>> +static inline struct pblk_sec_meta *sec_meta_index(struct pblk *pblk,
>>>>>>>> + struct pblk_sec_meta *meta,
>>>>>>>> + int index)
>>>>>>>> +{
>>>>>>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>>>>>>> + struct nvm_geo *geo = &dev->geo;
>>>>>>>> +
>>>>>>>> + return ((void *)meta + index * geo->sos);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> static inline int pblk_line_vsc(struct pblk_line *line)
>>>>>>>> {
>>>>>>>> return le32_to_cpu(*line->vsc);
>>>>>>>
>>>>>>> It will be helpful to split this patch into two:
>>>>>>>
>>>>>>> - One that does the 8b conversion
>>>>>>> - One that makes the change to merge metadata and ppa list data buffers
>>>>>> pblk has always shared the dma buffer for the ppa list and the metadata
>>>>>> buffer. This patch adds the possibility to not merge if the metadata
>>>>>> size does not fit. I can separate it into 2 patches, but it seems to me
>>>>>> like a natural extension when relaxing the 16-byte metadata size.
>>>>>>> - How about making it a simplification to only allow up to 32b
>>>>>>> metadata buffers, then one doesn't have to think about it crossing
>>>>>>> multiple pages.
>>>>>> You mean max. 56 bytes of metadata per 4KB right? That is what is left
>>>>>> on a 4KB page after taking out the 512B needed by the ppa list. It's ok
>>>>>> by me, but I'd like to hear Igor's opinion, as this was Intel's use case
>>>>>> to start with.
>>>>>
>>>>> So I think that if we want to do this properly, we should support
>>>>> 'all' the metadata sizes, including >64B - which is not the case after
>>>>> Javier's changes. In the past there were NVMe drives available with 128
>>>>> bytes of metadata per 4K sector, so it could be similar in the future
>>>>> for OC drives too. I have a patch somewhere which should work for any
>>>>> size of metadata, with a slightly different approach. I can send it (I
>>>>> just need a moment to rebase) and you can decide which approach you
>>>>> would like to take.
>>>>>
>>>>>>> - You can also make a structure for it, use the first 3.5KB for meta,
>>>>>>> and the next 512B for PPAs.
>>>>>> Looks like a huge structure for just managing a couple of pointers,
>>>>>> don't you think?
>>>>>> Javier
>>>> Matias,
>>>> Can you pick up this patch as is for this window and then Igor can extend
>>>> it to an arbitrary size in the future? We have a use case for 64B OOB /
>>>> 4KB, so 56B / 4KB is not enough in this case. Since pblk currently fixes
>>>> the metadata buffer size, the code breaks under this configuration.
>>>> If you have any comments that can respect this requirement, I can apply
>>>> them and resubmit.
>>>
>>> Igor, you said you only needed a moment. It would be better to fix this up the right way, and still make it for this window.
>>
>> If we can make 4.20 then it is fine by me.
>>
>> Thanks,
>> Javier
>
> Since not much has happened and we are closing in on the 4.20 PRs, can we
> take this patch for the 4.20 window and extend it in the future if necessary?
>
> Thanks,
> Javier
>
Hans' comments on breaking the disk format are valid (and apply to
this patch as well). Let's push this out until Igor's patches are in and
the disk format updates are in place.
> On 9 Oct 2018, at 18.16, Matias Bjørling <[email protected]> wrote:
>
> On 10/05/2018 02:18 PM, Javier Gonzalez wrote:
>>> On 18 Sep 2018, at 10.32, Javier Gonzalez <[email protected]> wrote:
>>>
>>>> On 18 Sep 2018, at 10.31, Matias Bjørling <[email protected]> wrote:
>>>>
>>>> On 09/18/2018 10:09 AM, Javier Gonzalez wrote:
>>>>>> On 11 Sep 2018, at 12.22, Igor Konopko <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11.09.2018 11:14, Javier Gonzalez wrote:
>>>>>>>> On 10 Sep 2018, at 12.21, Matias Bjørling <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On 08/29/2018 12:09 PM, Javier González wrote:
>>>>>>>>> pblk uses 8 bytes in the metadata region area exposed by the device
>>>>>>>>> through the out of band area to store the lba mapped to the given
>>>>>>>>> physical sector. This is used for recovery purposes. Given that the
>>>>>>>>> first generation OCSSD devices exposed 16 bytes, pblk used a hard-coded
>>>>>>>>> structure for this purpose.
>>>>>>>>> This patch relaxes the 16 bytes assumption and uses the metadata size
>>>>>>>>> reported by the device to layout metadata appropriately for the vector
>>>>>>>>> commands. This adds support for arbitrary metadata sizes, as long as
>>>>>>>>> these are larger than 8 bytes. Note that this patch does not address the
>>>>>>>>> case in which the device does not expose an out of band area and that
>>>>>>>>> pblk creation will fail in this case.
>>>>>>>>> Signed-off-by: Javier González <[email protected]>
>>>>>>>>> ---
>>>>>>>>> drivers/lightnvm/pblk-core.c | 56 ++++++++++++++++++++++++++++++----------
>>>>>>>>> drivers/lightnvm/pblk-init.c | 14 ++++++++++
>>>>>>>>> drivers/lightnvm/pblk-map.c | 19 +++++++++-----
>>>>>>>>> drivers/lightnvm/pblk-read.c | 55 +++++++++++++++++++++++++--------------
>>>>>>>>> drivers/lightnvm/pblk-recovery.c | 34 +++++++++++++++++-------
>>>>>>>>> drivers/lightnvm/pblk.h | 18 ++++++++++---
>>>>>>>>> 6 files changed, 143 insertions(+), 53 deletions(-)
>>>>>>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>>>>>>> index a311cc29afd8..d52e0047ae9d 100644
>>>>>>>>> --- a/drivers/lightnvm/pblk-core.c
>>>>>>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>>>>>>> @@ -250,8 +250,20 @@ int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
>>>>>>>>> if (!is_vector)
>>>>>>>>> return 0;
>>>>>>>>> - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
>>>>>>>>> - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
>>>>>>>>> + if (pblk->dma_shared) {
>>>>>>>>> + rqd->ppa_list = rqd->meta_list + pblk->dma_meta_size;
>>>>>>>>> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk->dma_meta_size;
>>>>>>>>> +
>>>>>>>>> + return 0;
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> + rqd->ppa_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
>>>>>>>>> + &rqd->dma_ppa_list);
>>>>>>>>> + if (!rqd->ppa_list) {
>>>>>>>>> + nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>>>>>>> + rqd->dma_meta_list);
>>>>>>>>> + return -ENOMEM;
>>>>>>>>> + }
>>>>>>>>> return 0;
>>>>>>>>> }
>>>>>>>>> @@ -262,7 +274,11 @@ void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
>>>>>>>>> if (rqd->meta_list)
>>>>>>>>> nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>>>>>>> - rqd->dma_meta_list);
>>>>>>>>> + rqd->dma_meta_list);
>>>>>>>>> +
>>>>>>>>> + if (!pblk->dma_shared && rqd->ppa_list)
>>>>>>>>> + nvm_dev_dma_free(dev->parent, rqd->ppa_list,
>>>>>>>>> + rqd->dma_ppa_list);
>>>>>>>>> }
>>>>>>>>> /* Caller must guarantee that the request is a valid type */
>>>>>>>>> @@ -796,10 +812,12 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> rqd.is_seq = 1;
>>>>>>>>> for (i = 0; i < lm->smeta_sec; i++, paddr++) {
>>>>>>>>> - struct pblk_sec_meta *meta_list = rqd.meta_list;
>>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>>> rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>>>>>>> - meta_list[i].lba = lba_list[paddr] = addr_empty;
>>>>>>>>> +
>>>>>>>>> + meta = sec_meta_index(pblk, rqd.meta_list, i);
>>>>>>>>> + meta->lba = lba_list[paddr] = addr_empty;
>>>>>>>>> }
>>>>>>>>> ret = pblk_submit_io_sync_sem(pblk, &rqd);
>>>>>>>>> @@ -845,8 +863,17 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> if (!meta_list)
>>>>>>>>> return -ENOMEM;
>>>>>>>>> - ppa_list = meta_list + pblk_dma_meta_size;
>>>>>>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>>>>>>> + if (pblk->dma_shared) {
>>>>>>>>> + ppa_list = meta_list + pblk->dma_meta_size;
>>>>>>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>>>>>>> + } else {
>>>>>>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>>>>>>> + &dma_ppa_list);
>>>>>>>>> + if (!ppa_list) {
>>>>>>>>> + ret = -ENOMEM;
>>>>>>>>> + goto free_meta_list;
>>>>>>>>> + }
>>>>>>>>> + }
>>>>>>>>> next_rq:
>>>>>>>>> memset(&rqd, 0, sizeof(struct nvm_rq));
>>>>>>>>> @@ -858,7 +885,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> l_mg->emeta_alloc_type, GFP_KERNEL);
>>>>>>>>> if (IS_ERR(bio)) {
>>>>>>>>> ret = PTR_ERR(bio);
>>>>>>>>> - goto free_rqd_dma;
>>>>>>>>> + goto free_ppa_list;
>>>>>>>>> }
>>>>>>>>> bio->bi_iter.bi_sector = 0; /* internal bio */
>>>>>>>>> @@ -884,7 +911,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> if (pblk_boundary_paddr_checks(pblk, paddr)) {
>>>>>>>>> bio_put(bio);
>>>>>>>>> ret = -EINTR;
>>>>>>>>> - goto free_rqd_dma;
>>>>>>>>> + goto free_ppa_list;
>>>>>>>>> }
>>>>>>>>> ppa = addr_to_gen_ppa(pblk, paddr, line_id);
>>>>>>>>> @@ -894,7 +921,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
>>>>>>>>> bio_put(bio);
>>>>>>>>> ret = -EINTR;
>>>>>>>>> - goto free_rqd_dma;
>>>>>>>>> + goto free_ppa_list;
>>>>>>>>> }
>>>>>>>>> for (j = 0; j < min; j++, i++, paddr++)
>>>>>>>>> @@ -905,7 +932,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> if (ret) {
>>>>>>>>> pblk_err(pblk, "emeta I/O submission failed: %d\n", ret);
>>>>>>>>> bio_put(bio);
>>>>>>>>> - goto free_rqd_dma;
>>>>>>>>> + goto free_ppa_list;
>>>>>>>>> }
>>>>>>>>> atomic_dec(&pblk->inflight_io);
>>>>>>>>> @@ -918,8 +945,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> if (left_ppas)
>>>>>>>>> goto next_rq;
>>>>>>>>> -free_rqd_dma:
>>>>>>>>> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
>>>>>>>>> +free_ppa_list:
>>>>>>>>> + if (!pblk->dma_shared)
>>>>>>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>>>>>>> +free_meta_list:
>>>>>>>>> + nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>>>>>>> return ret;
>>>>>>>>> }
>>>>>>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>>>>>>> index a99854439224..57972156c318 100644
>>>>>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>>>>>> @@ -354,6 +354,20 @@ static int pblk_core_init(struct pblk *pblk)
>>>>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>>>>> int ret, max_write_ppas;
>>>>>>>>> + if (sizeof(struct pblk_sec_meta) > geo->sos) {
>>>>>>>>> + pblk_err(pblk, "OOB area too small. Min %lu bytes (%d)\n",
>>>>>>>>> + (unsigned long)sizeof(struct pblk_sec_meta), geo->sos);
>>>>>>>>> + return -EINTR;
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> + pblk->dma_ppa_size = (sizeof(u64) * NVM_MAX_VLBA);
>>>>>>>>> + pblk->dma_meta_size = geo->sos * NVM_MAX_VLBA;
>>>>>>>>> +
>>>>>>>>> + if (pblk->dma_ppa_size + pblk->dma_meta_size > PAGE_SIZE)
>>>>>>>>> + pblk->dma_shared = false;
>>>>>>>>> + else
>>>>>>>>> + pblk->dma_shared = true;
>>>>>>>>> +
>>>>>>>>> atomic64_set(&pblk->user_wa, 0);
>>>>>>>>> atomic64_set(&pblk->pad_wa, 0);
>>>>>>>>> atomic64_set(&pblk->gc_wa, 0);
>>>>>>>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>>>>>>>> index dc0efb852475..55fca16d18e4 100644
>>>>>>>>> --- a/drivers/lightnvm/pblk-map.c
>>>>>>>>> +++ b/drivers/lightnvm/pblk-map.c
>>>>>>>>> @@ -25,6 +25,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>>>>>> unsigned int valid_secs)
>>>>>>>>> {
>>>>>>>>> struct pblk_line *line = pblk_line_get_data(pblk);
>>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>>> struct pblk_emeta *emeta;
>>>>>>>>> struct pblk_w_ctx *w_ctx;
>>>>>>>>> __le64 *lba_list;
>>>>>>>>> @@ -56,6 +57,8 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>>>>>> /* ppa to be sent to the device */
>>>>>>>>> ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>>>>>>> + meta = sec_meta_index(pblk, meta_list, i);
>>>>>>>>> +
>>>>>>>>> /* Write context for target bio completion on write buffer. Note
>>>>>>>>> * that the write buffer is protected by the sync backpointer,
>>>>>>>>> * and a single writer thread have access to each specific entry
>>>>>>>>> @@ -67,14 +70,14 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>>>>>>> kref_get(&line->ref);
>>>>>>>>> w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
>>>>>>>>> w_ctx->ppa = ppa_list[i];
>>>>>>>>> - meta_list[i].lba = cpu_to_le64(w_ctx->lba);
>>>>>>>>> + meta->lba = cpu_to_le64(w_ctx->lba);
>>>>>>>>> lba_list[paddr] = cpu_to_le64(w_ctx->lba);
>>>>>>>>> if (lba_list[paddr] != addr_empty)
>>>>>>>>> line->nr_valid_lbas++;
>>>>>>>>> else
>>>>>>>>> atomic64_inc(&pblk->pad_wa);
>>>>>>>>> } else {
>>>>>>>>> - lba_list[paddr] = meta_list[i].lba = addr_empty;
>>>>>>>>> + lba_list[paddr] = meta->lba = addr_empty;
>>>>>>>>> __pblk_map_invalidate(pblk, line, paddr);
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>>>>>>> unsigned long *lun_bitmap, unsigned int valid_secs,
>>>>>>>>> unsigned int off)
>>>>>>>>> {
>>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>>> + struct pblk_sec_meta *meta_list;
>>>>>>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>>>>>>> unsigned int map_secs;
>>>>>>>>> int min = pblk->min_write_pgs;
>>>>>>>>> @@ -95,8 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>>>>>>> for (i = off; i < rqd->nr_ppas; i += min) {
>>>>>>>>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>>>>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>>> +
>>>>>>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>>>>>>> - lun_bitmap, &meta_list[i], map_secs)) {
>>>>>>>>> + lun_bitmap, meta_list, map_secs)) {
>>>>>>>>> bio_put(rqd->bio);
>>>>>>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>>>>>>> pblk_pipeline_stop(pblk);
>>>>>>>>> @@ -112,8 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>>>>> struct pblk_line_meta *lm = &pblk->lm;
>>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>>>>>>> + struct pblk_sec_meta *meta_list;
>>>>>>>>> struct pblk_line *e_line, *d_line;
>>>>>>>>> unsigned int map_secs;
>>>>>>>>> int min = pblk->min_write_pgs;
>>>>>>>>> @@ -121,8 +126,10 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> for (i = 0; i < rqd->nr_ppas; i += min) {
>>>>>>>>> map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>>>>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>>> +
>>>>>>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>>>>>>> - lun_bitmap, &meta_list[i], map_secs)) {
>>>>>>>>> + lun_bitmap, meta_list, map_secs)) {
>>>>>>>>> bio_put(rqd->bio);
>>>>>>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>>>>>>> pblk_pipeline_stop(pblk);
>>>>>>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>>>>>>> index 57d3155ef9a5..12b690e2abd9 100644
>>>>>>>>> --- a/drivers/lightnvm/pblk-read.c
>>>>>>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>>>>>>> @@ -42,7 +42,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> struct bio *bio, sector_t blba,
>>>>>>>>> unsigned long *read_bitmap)
>>>>>>>>> {
>>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>>> struct ppa_addr ppas[NVM_MAX_VLBA];
>>>>>>>>> int nr_secs = rqd->nr_ppas;
>>>>>>>>> bool advanced_bio = false;
>>>>>>>>> @@ -51,13 +50,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
>>>>>>>>> for (i = 0; i < nr_secs; i++) {
>>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>>> struct ppa_addr p = ppas[i];
>>>>>>>>> sector_t lba = blba + i;
>>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>>> retry:
>>>>>>>>> if (pblk_ppa_empty(p)) {
>>>>>>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>>>>>>> - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>>>> +
>>>>>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>>>> if (unlikely(!advanced_bio)) {
>>>>>>>>> bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
>>>>>>>>> @@ -77,7 +79,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> goto retry;
>>>>>>>>> }
>>>>>>>>> WARN_ON(test_and_set_bit(i, read_bitmap));
>>>>>>>>> - meta_list[i].lba = cpu_to_le64(lba);
>>>>>>>>> + meta->lba = cpu_to_le64(lba);
>>>>>>>>> advanced_bio = true;
>>>>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>>>> atomic_long_inc(&pblk->cache_reads);
>>>>>>>>> @@ -104,12 +106,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> sector_t blba)
>>>>>>>>> {
>>>>>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>>>>>>> int nr_lbas = rqd->nr_ppas;
>>>>>>>>> int i;
>>>>>>>>> for (i = 0; i < nr_lbas; i++) {
>>>>>>>>> - u64 lba = le64_to_cpu(meta_lba_list[i].lba);
>>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>>> + u64 lba;
>>>>>>>>> +
>>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>>> + lba = le64_to_cpu(meta->lba);
>>>>>>>>> if (lba == ADDR_EMPTY)
>>>>>>>>> continue;
>>>>>>>>> @@ -133,17 +138,18 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> u64 *lba_list, int nr_lbas)
>>>>>>>>> {
>>>>>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>>>>>>> int i, j;
>>>>>>>>> for (i = 0, j = 0; i < nr_lbas; i++) {
>>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>>> u64 lba = lba_list[i];
>>>>>>>>> u64 meta_lba;
>>>>>>>>> if (lba == ADDR_EMPTY)
>>>>>>>>> continue;
>>>>>>>>> - meta_lba = le64_to_cpu(meta_lba_list[j].lba);
>>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, j);
>>>>>>>>> + meta_lba = le64_to_cpu(meta->lba);
>>>>>>>>> if (lba != meta_lba) {
>>>>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>>>> @@ -218,7 +224,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>>>> struct bio *new_bio = rqd->bio;
>>>>>>>>> struct bio *bio = pr_ctx->orig_bio;
>>>>>>>>> struct bio_vec src_bv, dst_bv;
>>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>>> int bio_init_idx = pr_ctx->bio_init_idx;
>>>>>>>>> unsigned long *read_bitmap = pr_ctx->bitmap;
>>>>>>>>> int nr_secs = pr_ctx->orig_nr_secs;
>>>>>>>>> @@ -237,12 +243,13 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>>>> }
>>>>>>>>> /* Re-use allocated memory for intermediate lbas */
>>>>>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>>>>>>> - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
>>>>>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>>>>>>> + lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk->dma_ppa_size);
>>>>>>>>> for (i = 0; i < nr_secs; i++) {
>>>>>>>>> - lba_list_media[i] = meta_list[i].lba;
>>>>>>>>> - meta_list[i].lba = lba_list_mem[i];
>>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>>> + lba_list_media[i] = meta->lba;
>>>>>>>>> + meta->lba = lba_list_mem[i];
>>>>>>>>> }
>>>>>>>>> /* Fill the holes in the original bio */
>>>>>>>>> @@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>>>>>>> line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>>>>>>>>> kref_put(&line->ref, pblk_line_put);
>>>>>>>>> - meta_list[hole].lba = lba_list_media[i];
>>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, hole);
>>>>>>>>> + meta->lba = lba_list_media[i];
>>>>>>>>> src_bv = new_bio->bi_io_vec[i++];
>>>>>>>>> dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>>>>>>>>> @@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> unsigned long *read_bitmap,
>>>>>>>>> int nr_holes)
>>>>>>>>> {
>>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>>> struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>>> struct pblk_pr_ctx *pr_ctx;
>>>>>>>>> struct bio *new_bio, *bio = r_ctx->private;
>>>>>>>>> __le64 *lba_list_mem;
>>>>>>>>> @@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> int i;
>>>>>>>>> /* Re-use allocated memory for intermediate lbas */
>>>>>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>>>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>>>>>>> new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>>>>>>>>> @@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> if (!pr_ctx)
>>>>>>>>> goto fail_free_pages;
>>>>>>>>> - for (i = 0; i < nr_secs; i++)
>>>>>>>>> - lba_list_mem[i] = meta_list[i].lba;
>>>>>>>>> + for (i = 0; i < nr_secs; i++) {
>>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>>> + lba_list_mem[i] = meta->lba;
>>>>>>>>> + }
>>>>>>>>> new_bio->bi_iter.bi_sector = 0; /* internal bio */
>>>>>>>>> bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
>>>>>>>>> @@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>>>>>>>>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>>>>> sector_t lba, unsigned long *read_bitmap)
>>>>>>>>> {
>>>>>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>>> struct ppa_addr ppa;
>>>>>>>>> pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>>>>>>>>> @@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>>>>> retry:
>>>>>>>>> if (pblk_ppa_empty(ppa)) {
>>>>>>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>>>>>>> - meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>>>> +
>>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>>>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>>>>>>> +
>>>>>>>>> return;
>>>>>>>>> }
>>>>>>>>> @@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>>>>>>> }
>>>>>>>>> WARN_ON(test_and_set_bit(0, read_bitmap));
>>>>>>>>> - meta_list[0].lba = cpu_to_le64(lba);
>>>>>>>>> +
>>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>>>>>>> + meta->lba = cpu_to_le64(lba);
>>>>>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>>>>>>> atomic_long_inc(&pblk->cache_reads);
>>>>>>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>>>>>>>> index 8114013c37b8..1ce92562603d 100644
>>>>>>>>> --- a/drivers/lightnvm/pblk-recovery.c
>>>>>>>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>>>>>>>> @@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> {
>>>>>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>>>>> - struct pblk_sec_meta *meta_list;
>>>>>>>>> + struct pblk_sec_meta *meta;
>>>>>>>>> struct pblk_pad_rq *pad_rq;
>>>>>>>>> struct nvm_rq *rqd;
>>>>>>>>> struct bio *bio;
>>>>>>>>> @@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> rqd->end_io = pblk_end_io_recov;
>>>>>>>>> rqd->private = pad_rq;
>>>>>>>>> - meta_list = rqd->meta_list;
>>>>>>>>> -
>>>>>>>>> for (i = 0; i < rqd->nr_ppas; ) {
>>>>>>>>> struct ppa_addr ppa;
>>>>>>>>> int pos;
>>>>>>>>> @@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>>>>>>>>> pblk_map_invalidate(pblk, dev_ppa);
>>>>>>>>> - lba_list[w_ptr] = meta_list[i].lba = addr_empty;
>>>>>>>>> rqd->ppa_list[i] = dev_ppa;
>>>>>>>>> +
>>>>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>>>>>>> + lba_list[w_ptr] = meta->lba = addr_empty;
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> @@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> struct nvm_tgt_dev *dev = pblk->dev;
>>>>>>>>> struct nvm_geo *geo = &dev->geo;
>>>>>>>>> struct ppa_addr *ppa_list;
>>>>>>>>> - struct pblk_sec_meta *meta_list;
>>>>>>>>> + struct pblk_sec_meta *meta_list, *meta;
>>>>>>>>> struct nvm_rq *rqd;
>>>>>>>>> struct bio *bio;
>>>>>>>>> void *data;
>>>>>>>>> @@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>>>>> }
>>>>>>>>> for (i = 0; i < rqd->nr_ppas; i++) {
>>>>>>>>> - u64 lba = le64_to_cpu(meta_list[i].lba);
>>>>>>>>> + u64 lba;
>>>>>>>>> +
>>>>>>>>> + meta = sec_meta_index(pblk, meta_list, i);
>>>>>>>>> + lba = le64_to_cpu(meta->lba);
>>>>>>>>> lba_list[paddr++] = cpu_to_le64(lba);
>>>>>>>>> @@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>>>>>>> if (!meta_list)
>>>>>>>>> return -ENOMEM;
>>>>>>>>> - ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
>>>>>>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>>>>>>> + if (pblk->dma_shared) {
>>>>>>>>> + ppa_list = (void *)(meta_list) + pblk->dma_meta_size;
>>>>>>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>>>>>>> + } else {
>>>>>>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>>>>>>> + &dma_ppa_list);
>>>>>>>>> + if (!ppa_list) {
>>>>>>>>> + ret = -ENOMEM;
>>>>>>>>> + goto free_meta_list;
>>>>>>>>> + }
>>>>>>>>> + }
>>>>>>>>> data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
>>>>>>>>> if (!data) {
>>>>>>>>> ret = -ENOMEM;
>>>>>>>>> - goto free_meta_list;
>>>>>>>>> + goto free_ppa_list;
>>>>>>>>> }
>>>>>>>>> rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL);
>>>>>>>>> @@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>>>>>>> out:
>>>>>>>>> mempool_free(rqd, &pblk->r_rq_pool);
>>>>>>>>> kfree(data);
>>>>>>>>> +free_ppa_list:
>>>>>>>>> + if (!pblk->dma_shared)
>>>>>>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>>>>>>> free_meta_list:
>>>>>>>>> nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>>>>>>> -
>>>>>>>>> return ret;
>>>>>>>>> }
>>>>>>>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>>>>>>>> index 22cc9bfbbb10..4526fee206d9 100644
>>>>>>>>> --- a/drivers/lightnvm/pblk.h
>>>>>>>>> +++ b/drivers/lightnvm/pblk.h
>>>>>>>>> @@ -86,7 +86,6 @@ enum {
>>>>>>>>> };
>>>>>>>>> struct pblk_sec_meta {
>>>>>>>>> - u64 reserved;
>>>>>>>>> __le64 lba;
>>>>>>>>> };
>>>>>>>>> @@ -103,9 +102,6 @@ enum {
>>>>>>>>> PBLK_RL_LOW = 4
>>>>>>>>> };
>>>>>>>>> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
>>>>>>>>> -#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
>>>>>>>>> -
>>>>>>>>> /* write buffer completion context */
>>>>>>>>> struct pblk_c_ctx {
>>>>>>>>> struct list_head list; /* Head for out-of-order completion */
>>>>>>>>> @@ -637,6 +633,10 @@ struct pblk {
>>>>>>>>> int sec_per_write;
>>>>>>>>> + int dma_meta_size;
>>>>>>>>> + int dma_ppa_size;
>>>>>>>>> + bool dma_shared;
>>>>>>>>> +
>>>>>>>>> unsigned char instance_uuid[16];
>>>>>>>>> /* Persistent write amplification counters, 4kb sector I/Os */
>>>>>>>>> @@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
>>>>>>>>> return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]);
>>>>>>>>> }
>>>>>>>>> +static inline struct pblk_sec_meta *sec_meta_index(struct pblk *pblk,
>>>>>>>>> + struct pblk_sec_meta *meta,
>>>>>>>>> + int index)
>>>>>>>>> +{
>>>>>>>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>>>>>>>> + struct nvm_geo *geo = &dev->geo;
>>>>>>>>> +
>>>>>>>>> + return ((void *)meta + index * geo->sos);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> static inline int pblk_line_vsc(struct pblk_line *line)
>>>>>>>>> {
>>>>>>>>> return le32_to_cpu(*line->vsc);
>>>>>>>>
>>>>>>>> It will be helpful to split this patch into two:
>>>>>>>>
>>>>>>>> - One that does the 8b conversion
>>>>>>>> - One that makes the change to merge metadata and ppa list data buffers
>>>>>>> pblk has always shared the dma buffer for the ppa list and the metadata
>>>>>>> buffer. This patch adds the possibility of not merging them if the
>>>>>>> metadata size does not fit. I can separate it into two patches, but it
>>>>>>> seems to me like a natural extension when relaxing the 16-byte metadata
>>>>>>> size.
>>>>>>>> - How about simplifying it to only allow up to 32b metadata
>>>>>>>> buffers? Then one doesn't have to think about it crossing
>>>>>>>> multiple pages.
>>>>>>> You mean max. 56 bytes of metadata per 4KB, right? That is what is
>>>>>>> left on a 4KB page after taking out the 512B needed by the ppa list.
>>>>>>> It's ok by me, but I'd like to hear Igor's opinion, as this was
>>>>>>> Intel's use case to start with.
>>>>>>
>>>>>> So I think that if we want to do this properly, we should support
>>>>>> 'all' metadata sizes, including >64B - which is not the case after
>>>>>> Javier's changes. In the past there were NVMe drives available with
>>>>>> 128 bytes of metadata per 4K sector, so it could be similar in the
>>>>>> future for OC drives too. I have a patch somewhere that should work
>>>>>> for any metadata size with a slightly different approach. I can send
>>>>>> it (I just need a moment to rebase) and you can decide which
>>>>>> approach you would like to take.
>>>>>>
>>>>>>>> - You can also make a structure for it, use the first 3.5KB for meta,
>>>>>>>> and the next 512B for PPAs.
>>>>>>> Looks like a huge structure for just managing a couple of pointers,
>>>>>>> don't you think?
>>>>>>> Javier
>>>>> Matias,
>>>>> Can you pick up this patch as is for this window and then Igor can extend
>>>>> it to an arbitrary size in the future? We have a use case for 64B OOB /
>>>>> 4KB, so 56B / 4KB is not enough in this case. Since pblk currently fixes
>>>>> the metadata buffer size, the code breaks under this configuration.
>>>>> If you have any comments that can respect this requirement, I can apply
>>>>> them and resubmit.
>>>>
>>>> Igor, you said you only needed a moment. It would be better to fix this up the right way, and still make it for this window.
>>>
>>> If we can make 4.20 then it is fine by me.
>>>
>>> Thanks,
>>> Javier
>> Since not much has happened and we are closing in on the 4.20 PRs, can
>> we take this patch for the 4.20 window and extend it in the future if
>> necessary?
>> Thanks,
>> Javier
>
> Hans' comments on breaking the disk format are valid (and apply to
> this patch as well). Let's push this out until Igor's patches are in
> and the disk format updates are in place.
Sounds good. Let's target the next release instead and get Igor's patches
in place.
Thanks,
Javier