2018-08-29 08:58:25

by Javier González

Subject: [V4 PATCH 0/3] lightnvm: pblk: take write semaphore on metadata

# Changes since V3:
- Encapsulate rqd dma allocations to reduce code replication (by Matias)

# Changes since V2:
- Split the original patch between the metadata refactoring and the
semaphore logic. This simplifies the write path, where the semaphore
is taken.

# Changes since V1:
- Fix double I/O on the read path (by Matias)
- Improve commit message (by Jens)

This patchset refactors the metadata code into separate write and read
paths, which simplifies how the semaphore is taken for writes.

Thanks,
Javier

Javier González (3):
lightnvm: pblk: refactor metadata paths
lightnvm: encapsulate rqd dma allocations
lightnvm: pblk: take write semaphore on metadata

drivers/lightnvm/pblk-core.c | 377 ++++++++++++++++++++-------------------
drivers/lightnvm/pblk-gc.c | 2 +-
drivers/lightnvm/pblk-read.c | 35 ++--
drivers/lightnvm/pblk-recovery.c | 33 ++--
drivers/lightnvm/pblk-write.c | 15 +-
drivers/lightnvm/pblk.h | 8 +-
6 files changed, 230 insertions(+), 240 deletions(-)

--
2.7.4



2018-08-29 08:57:57

by Javier González

Subject: [PATCH 1/3] lightnvm: pblk: refactor metadata paths

pblk maintains two different metadata paths for smeta and emeta, which
store metadata at the start of the line and at the end of the line,
respectively. Until now, these paths have been common for writing and
retrieving metadata; however, as the paths diverge, the common code
becomes less clear and unnecessarily complicated.

In preparation for further changes to the metadata write path, this
patch separates the write and read paths for smeta and emeta and removes
the synchronous emeta path, as it is no longer used (emeta is scheduled
asynchronously to prevent jitter due to internal I/Os).
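
After the split, each direction gets its own dedicated entry point. As a
sketch, the resulting prototypes look like this (taken from the diff
below):

u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line);
int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line);
static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
                                 u64 paddr);
int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
                         void *emeta_buf);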

Signed-off-by: Javier González <[email protected]>
---
drivers/lightnvm/pblk-core.c | 338 ++++++++++++++++++---------------------
drivers/lightnvm/pblk-gc.c | 2 +-
drivers/lightnvm/pblk-recovery.c | 4 +-
drivers/lightnvm/pblk.h | 4 +-
4 files changed, 163 insertions(+), 185 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index dbf037b2b32f..09160ec02c5f 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -661,12 +661,137 @@ u64 pblk_lookup_page(struct pblk *pblk, struct pblk_line *line)
return paddr;
}

-/*
- * Submit emeta to one LUN in the raid line at the time to avoid a deadlock when
- * taking the per LUN semaphore.
- */
-static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
- void *emeta_buf, u64 paddr, int dir)
+u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line)
+{
+ struct nvm_tgt_dev *dev = pblk->dev;
+ struct nvm_geo *geo = &dev->geo;
+ struct pblk_line_meta *lm = &pblk->lm;
+ int bit;
+
+ /* This usually only happens on bad lines */
+ bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
+ if (bit >= lm->blk_per_line)
+ return -1;
+
+ return bit * geo->ws_opt;
+}
+
+int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
+{
+ struct nvm_tgt_dev *dev = pblk->dev;
+ struct pblk_line_meta *lm = &pblk->lm;
+ struct bio *bio;
+ struct nvm_rq rqd;
+ u64 paddr = pblk_line_smeta_start(pblk, line);
+ int i, ret;
+
+ memset(&rqd, 0, sizeof(struct nvm_rq));
+
+ rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
+ &rqd.dma_meta_list);
+ if (!rqd.meta_list)
+ return -ENOMEM;
+
+ rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
+ rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
+
+ bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
+ if (IS_ERR(bio)) {
+ ret = PTR_ERR(bio);
+ goto free_ppa_list;
+ }
+
+ bio->bi_iter.bi_sector = 0; /* internal bio */
+ bio_set_op_attrs(bio, REQ_OP_READ, 0);
+
+ rqd.bio = bio;
+ rqd.opcode = NVM_OP_PREAD;
+ rqd.nr_ppas = lm->smeta_sec;
+ rqd.is_seq = 1;
+
+ for (i = 0; i < lm->smeta_sec; i++, paddr++)
+ rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
+
+ ret = pblk_submit_io_sync(pblk, &rqd);
+ if (ret) {
+ pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
+ bio_put(bio);
+ goto free_ppa_list;
+ }
+
+ atomic_dec(&pblk->inflight_io);
+
+ if (rqd.error)
+ pblk_log_read_err(pblk, &rqd);
+
+free_ppa_list:
+ nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
+ return ret;
+}
+
+static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
+ u64 paddr)
+{
+ struct nvm_tgt_dev *dev = pblk->dev;
+ struct pblk_line_meta *lm = &pblk->lm;
+ struct bio *bio;
+ struct nvm_rq rqd;
+ __le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
+ __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
+ int i, ret;
+
+ memset(&rqd, 0, sizeof(struct nvm_rq));
+
+ rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
+ &rqd.dma_meta_list);
+ if (!rqd.meta_list)
+ return -ENOMEM;
+
+ rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
+ rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
+
+ bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
+ if (IS_ERR(bio)) {
+ ret = PTR_ERR(bio);
+ goto free_ppa_list;
+ }
+
+ bio->bi_iter.bi_sector = 0; /* internal bio */
+ bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+
+ rqd.bio = bio;
+ rqd.opcode = NVM_OP_PWRITE;
+ rqd.nr_ppas = lm->smeta_sec;
+ rqd.is_seq = 1;
+
+ for (i = 0; i < lm->smeta_sec; i++, paddr++) {
+ struct pblk_sec_meta *meta_list = rqd.meta_list;
+
+ rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
+ meta_list[i].lba = lba_list[paddr] = addr_empty;
+ }
+
+ ret = pblk_submit_io_sync(pblk, &rqd);
+ if (ret) {
+ pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
+ bio_put(bio);
+ goto free_ppa_list;
+ }
+
+ atomic_dec(&pblk->inflight_io);
+
+ if (rqd.error) {
+ pblk_log_write_err(pblk, &rqd);
+ ret = -EIO;
+ }
+
+free_ppa_list:
+ nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
+ return ret;
+}
+
+int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
+ void *emeta_buf)
{
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
@@ -675,24 +800,15 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
void *ppa_list, *meta_list;
struct bio *bio;
struct nvm_rq rqd;
+ u64 paddr = line->emeta_ssec;
dma_addr_t dma_ppa_list, dma_meta_list;
int min = pblk->min_write_pgs;
int left_ppas = lm->emeta_sec[0];
- int id = line->id;
+ int line_id = line->id;
int rq_ppas, rq_len;
- int cmd_op, bio_op;
int i, j;
int ret;

- if (dir == PBLK_WRITE) {
- bio_op = REQ_OP_WRITE;
- cmd_op = NVM_OP_PWRITE;
- } else if (dir == PBLK_READ) {
- bio_op = REQ_OP_READ;
- cmd_op = NVM_OP_PREAD;
- } else
- return -EINVAL;
-
meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
&dma_meta_list);
if (!meta_list)
@@ -715,64 +831,43 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
}

bio->bi_iter.bi_sector = 0; /* internal bio */
- bio_set_op_attrs(bio, bio_op, 0);
+ bio_set_op_attrs(bio, REQ_OP_READ, 0);

rqd.bio = bio;
rqd.meta_list = meta_list;
rqd.ppa_list = ppa_list;
rqd.dma_meta_list = dma_meta_list;
rqd.dma_ppa_list = dma_ppa_list;
- rqd.opcode = cmd_op;
+ rqd.opcode = NVM_OP_PREAD;
rqd.nr_ppas = rq_ppas;

- if (dir == PBLK_WRITE) {
- struct pblk_sec_meta *meta_list = rqd.meta_list;
+ for (i = 0; i < rqd.nr_ppas; ) {
+ struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line_id);
+ int pos = pblk_ppa_to_pos(geo, ppa);

- rqd.is_seq = 1;
- for (i = 0; i < rqd.nr_ppas; ) {
- spin_lock(&line->lock);
- paddr = __pblk_alloc_page(pblk, line, min);
- spin_unlock(&line->lock);
- for (j = 0; j < min; j++, i++, paddr++) {
- meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
- rqd.ppa_list[i] =
- addr_to_gen_ppa(pblk, paddr, id);
- }
- }
- } else {
- for (i = 0; i < rqd.nr_ppas; ) {
- struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, id);
- int pos = pblk_ppa_to_pos(geo, ppa);
+ if (pblk_io_aligned(pblk, rq_ppas))
+ rqd.is_seq = 1;

- if (pblk_io_aligned(pblk, rq_ppas))
- rqd.is_seq = 1;
-
- while (test_bit(pos, line->blk_bitmap)) {
- paddr += min;
- if (pblk_boundary_paddr_checks(pblk, paddr)) {
- pblk_err(pblk, "corrupt emeta line:%d\n",
- line->id);
- bio_put(bio);
- ret = -EINTR;
- goto free_rqd_dma;
- }
-
- ppa = addr_to_gen_ppa(pblk, paddr, id);
- pos = pblk_ppa_to_pos(geo, ppa);
- }
-
- if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
- pblk_err(pblk, "corrupt emeta line:%d\n",
- line->id);
+ while (test_bit(pos, line->blk_bitmap)) {
+ paddr += min;
+ if (pblk_boundary_paddr_checks(pblk, paddr)) {
bio_put(bio);
ret = -EINTR;
goto free_rqd_dma;
}

- for (j = 0; j < min; j++, i++, paddr++)
- rqd.ppa_list[i] =
- addr_to_gen_ppa(pblk, paddr, line->id);
+ ppa = addr_to_gen_ppa(pblk, paddr, line_id);
+ pos = pblk_ppa_to_pos(geo, ppa);
}
+
+ if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
+ bio_put(bio);
+ ret = -EINTR;
+ goto free_rqd_dma;
+ }
+
+ for (j = 0; j < min; j++, i++, paddr++)
+ rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line_id);
}

ret = pblk_submit_io_sync(pblk, &rqd);
@@ -784,136 +879,19 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,

atomic_dec(&pblk->inflight_io);

- if (rqd.error) {
- if (dir == PBLK_WRITE)
- pblk_log_write_err(pblk, &rqd);
- else
- pblk_log_read_err(pblk, &rqd);
- }
+ if (rqd.error)
+ pblk_log_read_err(pblk, &rqd);

emeta_buf += rq_len;
left_ppas -= rq_ppas;
if (left_ppas)
goto next_rq;
+
free_rqd_dma:
nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
return ret;
}

-u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line)
-{
- struct nvm_tgt_dev *dev = pblk->dev;
- struct nvm_geo *geo = &dev->geo;
- struct pblk_line_meta *lm = &pblk->lm;
- int bit;
-
- /* This usually only happens on bad lines */
- bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
- if (bit >= lm->blk_per_line)
- return -1;
-
- return bit * geo->ws_opt;
-}
-
-static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
- u64 paddr, int dir)
-{
- struct nvm_tgt_dev *dev = pblk->dev;
- struct pblk_line_meta *lm = &pblk->lm;
- struct bio *bio;
- struct nvm_rq rqd;
- __le64 *lba_list = NULL;
- int i, ret;
- int cmd_op, bio_op;
-
- if (dir == PBLK_WRITE) {
- bio_op = REQ_OP_WRITE;
- cmd_op = NVM_OP_PWRITE;
- lba_list = emeta_to_lbas(pblk, line->emeta->buf);
- } else if (dir == PBLK_READ_RECOV || dir == PBLK_READ) {
- bio_op = REQ_OP_READ;
- cmd_op = NVM_OP_PREAD;
- } else
- return -EINVAL;
-
- memset(&rqd, 0, sizeof(struct nvm_rq));
-
- rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
- &rqd.dma_meta_list);
- if (!rqd.meta_list)
- return -ENOMEM;
-
- rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
- rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
-
- bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
- if (IS_ERR(bio)) {
- ret = PTR_ERR(bio);
- goto free_ppa_list;
- }
-
- bio->bi_iter.bi_sector = 0; /* internal bio */
- bio_set_op_attrs(bio, bio_op, 0);
-
- rqd.bio = bio;
- rqd.opcode = cmd_op;
- rqd.is_seq = 1;
- rqd.nr_ppas = lm->smeta_sec;
-
- for (i = 0; i < lm->smeta_sec; i++, paddr++) {
- struct pblk_sec_meta *meta_list = rqd.meta_list;
-
- rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
-
- if (dir == PBLK_WRITE) {
- __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
-
- meta_list[i].lba = lba_list[paddr] = addr_empty;
- }
- }
-
- /*
- * This I/O is sent by the write thread when a line is replace. Since
- * the write thread is the only one sending write and erase commands,
- * there is no need to take the LUN semaphore.
- */
- ret = pblk_submit_io_sync(pblk, &rqd);
- if (ret) {
- pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
- bio_put(bio);
- goto free_ppa_list;
- }
-
- atomic_dec(&pblk->inflight_io);
-
- if (rqd.error) {
- if (dir == PBLK_WRITE) {
- pblk_log_write_err(pblk, &rqd);
- ret = 1;
- } else if (dir == PBLK_READ)
- pblk_log_read_err(pblk, &rqd);
- }
-
-free_ppa_list:
- nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
-
- return ret;
-}
-
-int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line)
-{
- u64 bpaddr = pblk_line_smeta_start(pblk, line);
-
- return pblk_line_submit_smeta_io(pblk, line, bpaddr, PBLK_READ_RECOV);
-}
-
-int pblk_line_read_emeta(struct pblk *pblk, struct pblk_line *line,
- void *emeta_buf)
-{
- return pblk_line_submit_emeta_io(pblk, line, emeta_buf,
- line->emeta_ssec, PBLK_READ);
-}
-
static void pblk_setup_e_rq(struct pblk *pblk, struct nvm_rq *rqd,
struct ppa_addr ppa)
{
@@ -1150,7 +1128,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
line->smeta_ssec = off;
line->cur_sec = off + lm->smeta_sec;

- if (init && pblk_line_submit_smeta_io(pblk, line, off, PBLK_WRITE)) {
+ if (init && pblk_line_smeta_write(pblk, line, off)) {
pblk_debug(pblk, "line smeta I/O failed. Retry\n");
return 0;
}
diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
index b0f0c637df3a..73e64c1b55ef 100644
--- a/drivers/lightnvm/pblk-gc.c
+++ b/drivers/lightnvm/pblk-gc.c
@@ -148,7 +148,7 @@ static __le64 *get_lba_list_from_emeta(struct pblk *pblk,
if (!emeta_buf)
return NULL;

- ret = pblk_line_read_emeta(pblk, line, emeta_buf);
+ ret = pblk_line_emeta_read(pblk, line, emeta_buf);
if (ret) {
pblk_err(pblk, "line %d read emeta failed (%d)\n",
line->id, ret);
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index c641a9ed5f39..95ea5072b27e 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -839,7 +839,7 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
continue;

/* Lines that cannot be read are assumed as not written here */
- if (pblk_line_read_smeta(pblk, line))
+ if (pblk_line_smeta_read(pblk, line))
continue;

crc = pblk_calc_smeta_crc(pblk, smeta_buf);
@@ -909,7 +909,7 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
line->emeta = emeta;
memset(line->emeta->buf, 0, lm->emeta_len[0]);

- if (pblk_line_read_emeta(pblk, line, line->emeta->buf)) {
+ if (pblk_line_emeta_read(pblk, line, line->emeta->buf)) {
pblk_recov_l2p_from_oob(pblk, line);
goto next;
}
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 6195e3f5d2e6..e29fd35d2991 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -817,8 +817,8 @@ void pblk_gen_run_ws(struct pblk *pblk, struct pblk_line *line, void *priv,
void (*work)(struct work_struct *), gfp_t gfp_mask,
struct workqueue_struct *wq);
u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line);
-int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line);
-int pblk_line_read_emeta(struct pblk *pblk, struct pblk_line *line,
+int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line);
+int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
void *emeta_buf);
int pblk_blk_erase_async(struct pblk *pblk, struct ppa_addr erase_ppa);
void pblk_line_put(struct kref *ref);
--
2.7.4


2018-08-29 08:58:28

by Javier González

Subject: [PATCH 2/3] lightnvm: encapsulate rqd dma allocations

DMA allocations for the ppa_list and meta_list in struct nvm_rq are
replicated in several places across the pblk codebase. Add helpers that
encapsulate their creation and deletion to simplify the code.
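
With the helpers in place, a typical caller goes from open-coding the
dma handling to the following pattern (a sketch distilled from the hunks
below):

        ret = pblk_setup_rqd(pblk, &rqd, GFP_KERNEL, true);
        if (ret)
                return ret;

        /* ... build the bio and ppa list, submit the I/O ... */

        pblk_clear_rqd(pblk, &rqd);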

Signed-off-by: Javier González <[email protected]>
---
drivers/lightnvm/pblk-core.c | 69 +++++++++++++++++++++++++---------------
drivers/lightnvm/pblk-read.c | 35 ++++++++++----------
drivers/lightnvm/pblk-recovery.c | 29 ++++++-----------
drivers/lightnvm/pblk-write.c | 15 ++-------
drivers/lightnvm/pblk.h | 3 ++
5 files changed, 74 insertions(+), 77 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 09160ec02c5f..767178185f19 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -237,6 +237,34 @@ static void pblk_invalidate_range(struct pblk *pblk, sector_t slba,
spin_unlock(&pblk->trans_lock);
}

+int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
+ bool is_vector)
+{
+ struct nvm_tgt_dev *dev = pblk->dev;
+
+ rqd->meta_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
+ &rqd->dma_meta_list);
+ if (!rqd->meta_list)
+ return -ENOMEM;
+
+ if (!is_vector)
+ return 0;
+
+ rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
+ rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
+
+ return 0;
+}
+
+void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
+{
+ struct nvm_tgt_dev *dev = pblk->dev;
+
+ if (rqd->meta_list)
+ nvm_dev_dma_free(dev->parent, rqd->meta_list,
+ rqd->dma_meta_list);
+}
+
/* Caller must guarantee that the request is a valid type */
struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int type)
{
@@ -268,7 +296,6 @@ struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int type)
/* Typically used on completion path. Cannot guarantee request consistency */
void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
{
- struct nvm_tgt_dev *dev = pblk->dev;
mempool_t *pool;

switch (type) {
@@ -289,9 +316,7 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
return;
}

- if (rqd->meta_list)
- nvm_dev_dma_free(dev->parent, rqd->meta_list,
- rqd->dma_meta_list);
+ pblk_clear_rqd(pblk, rqd);
mempool_free(rqd, pool);
}

@@ -687,18 +712,14 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)

memset(&rqd, 0, sizeof(struct nvm_rq));

- rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
- &rqd.dma_meta_list);
- if (!rqd.meta_list)
- return -ENOMEM;
-
- rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
- rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
+ ret = pblk_setup_rqd(pblk, &rqd, GFP_KERNEL, true);
+ if (ret)
+ return ret;

bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
if (IS_ERR(bio)) {
ret = PTR_ERR(bio);
- goto free_ppa_list;
+ goto clear_rqd;
}

bio->bi_iter.bi_sector = 0; /* internal bio */
@@ -716,7 +737,7 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
if (ret) {
pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
bio_put(bio);
- goto free_ppa_list;
+ goto clear_rqd;
}

atomic_dec(&pblk->inflight_io);
@@ -724,8 +745,8 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
if (rqd.error)
pblk_log_read_err(pblk, &rqd);

-free_ppa_list:
- nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
+clear_rqd:
+ pblk_clear_rqd(pblk, &rqd);
return ret;
}

@@ -742,18 +763,14 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,

memset(&rqd, 0, sizeof(struct nvm_rq));

- rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
- &rqd.dma_meta_list);
- if (!rqd.meta_list)
- return -ENOMEM;
-
- rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
- rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
+ ret = pblk_setup_rqd(pblk, &rqd, GFP_KERNEL, true);
+ if (ret)
+ return ret;

bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
if (IS_ERR(bio)) {
ret = PTR_ERR(bio);
- goto free_ppa_list;
+ goto clear_rqd;
}

bio->bi_iter.bi_sector = 0; /* internal bio */
@@ -775,7 +792,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
if (ret) {
pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
bio_put(bio);
- goto free_ppa_list;
+ goto clear_rqd;
}

atomic_dec(&pblk->inflight_io);
@@ -785,8 +802,8 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
ret = -EIO;
}

-free_ppa_list:
- nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
+clear_rqd:
+ pblk_clear_rqd(pblk, &rqd);
return ret;
}

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 6d13763f2f6a..57d3155ef9a5 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -452,19 +452,15 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
*/
bio_init_idx = pblk_get_bi_idx(bio);

- rqd->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
- &rqd->dma_meta_list);
- if (!rqd->meta_list) {
- pblk_err(pblk, "not able to allocate ppa list\n");
- goto fail_rqd_free;
- }
-
if (nr_secs > 1) {
- rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
- rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
+ if (pblk_setup_rqd(pblk, rqd, GFP_KERNEL, true))
+ goto fail_rqd_free;

pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
} else {
+ if (pblk_setup_rqd(pblk, rqd, GFP_KERNEL, false))
+ goto fail_rqd_free;
+
pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
}

@@ -593,14 +589,10 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)

memset(&rqd, 0, sizeof(struct nvm_rq));

- rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
- &rqd.dma_meta_list);
- if (!rqd.meta_list)
- return -ENOMEM;
-
if (gc_rq->nr_secs > 1) {
- rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
- rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
+ ret = pblk_setup_rqd(pblk, &rqd, GFP_KERNEL, true);
+ if (ret)
+ return ret;

gc_rq->secs_to_gc = read_ppalist_rq_gc(pblk, &rqd, gc_rq->line,
gc_rq->lba_list,
@@ -609,6 +601,10 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
if (gc_rq->secs_to_gc == 1)
rqd.ppa_addr = rqd.ppa_list[0];
} else {
+ ret = pblk_setup_rqd(pblk, &rqd, GFP_KERNEL, false);
+ if (ret)
+ return ret;
+
gc_rq->secs_to_gc = read_rq_gc(pblk, &rqd, gc_rq->line,
gc_rq->lba_list[0],
gc_rq->paddr_list[0]);
@@ -622,7 +618,8 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
PBLK_VMALLOC_META, GFP_KERNEL);
if (IS_ERR(bio)) {
pblk_err(pblk, "could not allocate GC bio (%lu)\n",
- PTR_ERR(bio));
+ PTR_ERR(bio));
+ ret = PTR_ERR(bio);
goto err_free_dma;
}

@@ -657,12 +654,12 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
#endif

out:
- nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
+ pblk_clear_rqd(pblk, &rqd);
return ret;

err_free_bio:
bio_put(bio);
err_free_dma:
- nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
+ pblk_clear_rqd(pblk, &rqd);
return ret;
}
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 95ea5072b27e..fdc770f2545f 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -241,13 +241,11 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
{
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
- struct ppa_addr *ppa_list;
struct pblk_sec_meta *meta_list;
struct pblk_pad_rq *pad_rq;
struct nvm_rq *rqd;
struct bio *bio;
void *data;
- dma_addr_t dma_ppa_list, dma_meta_list;
__le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
u64 w_ptr = line->cur_sec;
int left_line_ppas, rq_ppas, rq_len;
@@ -281,20 +279,11 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,

rq_len = rq_ppas * geo->csecs;

- meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
- if (!meta_list) {
- ret = -ENOMEM;
- goto fail_free_pad;
- }
-
- ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
- dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
-
bio = pblk_bio_map_addr(pblk, data, rq_ppas, rq_len,
PBLK_VMALLOC_META, GFP_KERNEL);
if (IS_ERR(bio)) {
ret = PTR_ERR(bio);
- goto fail_free_meta;
+ goto fail_free_pad;
}

bio->bi_iter.bi_sector = 0; /* internal bio */
@@ -302,17 +291,19 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,

rqd = pblk_alloc_rqd(pblk, PBLK_WRITE_INT);

+ ret = pblk_setup_rqd(pblk, rqd, GFP_KERNEL, true);
+ if (ret)
+ goto fail_free_bio;
+
rqd->bio = bio;
rqd->opcode = NVM_OP_PWRITE;
rqd->is_seq = 1;
- rqd->meta_list = meta_list;
rqd->nr_ppas = rq_ppas;
- rqd->ppa_list = ppa_list;
- rqd->dma_ppa_list = dma_ppa_list;
- rqd->dma_meta_list = dma_meta_list;
rqd->end_io = pblk_end_io_recov;
rqd->private = pad_rq;

+ meta_list = rqd->meta_list;
+
for (i = 0; i < rqd->nr_ppas; ) {
struct ppa_addr ppa;
int pos;
@@ -346,7 +337,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
if (ret) {
pblk_err(pblk, "I/O submission failed: %d\n", ret);
pblk_up_page(pblk, rqd->ppa_list, rqd->nr_ppas);
- goto fail_free_bio;
+ goto fail_free_rqd;
}

left_line_ppas -= rq_ppas;
@@ -370,10 +361,10 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
kfree(pad_rq);
return ret;

+fail_free_rqd:
+ pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
fail_free_bio:
bio_put(bio);
-fail_free_meta:
- nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
fail_free_pad:
kfree(pad_rq);
vfree(data);
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 8ea66bb83c29..767618eba4c2 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -322,11 +322,8 @@ static void pblk_end_io_write_meta(struct nvm_rq *rqd)
}

static int pblk_alloc_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
- unsigned int nr_secs,
- nvm_end_io_fn(*end_io))
+ unsigned int nr_secs, nvm_end_io_fn(*end_io))
{
- struct nvm_tgt_dev *dev = pblk->dev;
-
/* Setup write request */
rqd->opcode = NVM_OP_PWRITE;
rqd->nr_ppas = nr_secs;
@@ -334,15 +331,7 @@ static int pblk_alloc_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
rqd->private = pblk;
rqd->end_io = end_io;

- rqd->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
- &rqd->dma_meta_list);
- if (!rqd->meta_list)
- return -ENOMEM;
-
- rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
- rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
-
- return 0;
+ return pblk_setup_rqd(pblk, rqd, GFP_KERNEL, true);
}

static int pblk_setup_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index e29fd35d2991..c0d9eddd344b 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -778,6 +778,9 @@ ssize_t pblk_rb_sysfs(struct pblk_rb *rb, char *buf);
*/
struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int type);
void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type);
+int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
+ bool is_vector);
+void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd);
void pblk_set_sec_per_write(struct pblk *pblk, int sec_per_write);
int pblk_setup_w_rec_rq(struct pblk *pblk, struct nvm_rq *rqd,
struct pblk_c_ctx *c_ctx);
--
2.7.4


2018-08-29 08:58:43

by Javier González

Subject: [PATCH 3/3] lightnvm: pblk: take write semaphore on metadata

pblk guarantees write ordering at a chunk level through a per-open-chunk
semaphore. At this point, since there is only a single open I/O stream
for both user and GC data, the semaphore is effectively per parallel unit.

For metadata I/O that is synchronous, the semaphore is not needed, as
ordering is already guaranteed. However, if the metadata scheme changes
or multiple streams are opened, this guarantee might no longer hold.

This patch makes sure that all writes go through the semaphore, even for
synchronous I/O. This is consistent with pblk's write I/O model and
simplifies maintenance, since changes to the metadata scheme could
otherwise introduce ordering issues.
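
The mechanism is a thin wrapper that brackets the synchronous submission
with the per-LUN semaphore (a sketch; the full patch is below):

        ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;

        pblk_down_page(pblk, ppa_list, rqd->nr_ppas);
        ret = pblk_submit_io_sync(pblk, rqd);
        pblk_up_page(pblk, ppa_list, rqd->nr_ppas);

The smeta write path then calls pblk_submit_io_sync_sem() instead of
pblk_submit_io_sync().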

Signed-off-by: Javier González <[email protected]>
---
drivers/lightnvm/pblk-core.c | 16 +++++++++++++++-
drivers/lightnvm/pblk.h | 1 +
2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 767178185f19..1e4dc0c1ed88 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -558,6 +558,20 @@ int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
return ret;
}

+int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd)
+{
+ struct ppa_addr *ppa_list;
+ int ret;
+
+ ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
+
+ pblk_down_page(pblk, ppa_list, rqd->nr_ppas);
+ ret = pblk_submit_io_sync(pblk, rqd);
+ pblk_up_page(pblk, ppa_list, rqd->nr_ppas);
+
+ return ret;
+}
+
static void pblk_bio_map_addr_endio(struct bio *bio)
{
bio_put(bio);
@@ -788,7 +802,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
meta_list[i].lba = lba_list[paddr] = addr_empty;
}

- ret = pblk_submit_io_sync(pblk, &rqd);
+ ret = pblk_submit_io_sync_sem(pblk, &rqd);
if (ret) {
pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
bio_put(bio);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index c0d9eddd344b..54f937c1fb62 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -793,6 +793,7 @@ void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd);
void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd);
int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);
int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd);
+int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd);
int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line);
void pblk_check_chunk_state_update(struct pblk *pblk, struct nvm_rq *rqd);
struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
--
2.7.4


2018-08-29 13:02:31

by Matias Bjørling

Subject: Re: [PATCH 2/3] lightnvm: encapsulate rqd dma allocations

On 08/29/2018 10:56 AM, Javier González wrote:
> DMA allocations for the ppa_list and meta_list in struct nvm_rq are
> replicated in several places across the pblk codebase. Add helpers that
> encapsulate their creation and deletion to simplify the code.
>
> Signed-off-by: Javier González <[email protected]>
> ---
> drivers/lightnvm/pblk-core.c | 69 +++++++++++++++++++++++++---------------
> drivers/lightnvm/pblk-read.c | 35 ++++++++++----------
> drivers/lightnvm/pblk-recovery.c | 29 ++++++-----------
> drivers/lightnvm/pblk-write.c | 15 ++-------
> drivers/lightnvm/pblk.h | 3 ++
> 5 files changed, 74 insertions(+), 77 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 09160ec02c5f..767178185f19 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -237,6 +237,34 @@ static void pblk_invalidate_range(struct pblk *pblk, sector_t slba,
> spin_unlock(&pblk->trans_lock);
> }
>
> +int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
> + bool is_vector)


The mem_flags argument can be removed; it is GFP_KERNEL at all the call
sites.

For is_vector, it would be better to pass nr_ppas (or a similar name).
Then pblk_submit_read/pblk_submit_read_gc become a bit cleaner; see the
sketch below.
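
For instance, something like this (a hypothetical sketch of the
suggested interface, with GFP_KERNEL hardcoded and nr_ppas replacing
is_vector; not a final patch):

int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, int nr_ppas)
{
        struct nvm_tgt_dev *dev = pblk->dev;

        /* the metadata list is always needed */
        rqd->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
                                           &rqd->dma_meta_list);
        if (!rqd->meta_list)
                return -ENOMEM;

        /* single-sector requests address through rqd->ppa_addr */
        if (nr_ppas == 1)
                return 0;

        rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
        rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;

        return 0;
}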

> +{
> + struct nvm_tgt_dev *dev = pblk->dev;
> +
> + rqd->meta_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
> + &rqd->dma_meta_list);
> + if (!rqd->meta_list)
> + return -ENOMEM;
> +
> + if (!is_vector)
> + return 0;
> +
> + rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;

Wrt is_vector, does it matter if we just always set ppa_list and
dma_ppa_list? If we have them, we use them; otherwise we leave them
alone.
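
That is, drop the branch and do the assignment unconditionally
(hypothetical):

        rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
        rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;

Single-sector requests would keep addressing through rqd->ppa_addr and
simply never look at the list.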

> +
> + return 0;
> +}
> +
> +void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
> +{
> + struct nvm_tgt_dev *dev = pblk->dev;
> +
> + if (rqd->meta_list)
> + nvm_dev_dma_free(dev->parent, rqd->meta_list,
> + rqd->dma_meta_list);
> +}

Looks like setup/clear is mainly about managing the metadata. Would
pblk_alloc_rqd_meta()/pblk_free_rqd_meta() be better names? Unless we
can fold it all into pblk_alloc_rqd()/pblk_free_rqd().
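
That would give prototypes along these lines (hypothetical names):

int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq *rqd);
void pblk_free_rqd_meta(struct pblk *pblk, struct nvm_rq *rqd);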

> +
> /* Caller must guarantee that the request is a valid type */
> struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int type)
> {
> @@ -268,7 +296,6 @@ struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int type)
> /* Typically used on completion path. Cannot guarantee request consistency */
> void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
> {
> - struct nvm_tgt_dev *dev = pblk->dev;
> mempool_t *pool;
>
> switch (type) {
> @@ -289,9 +316,7 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
> return;
> }
>
> - if (rqd->meta_list)
> - nvm_dev_dma_free(dev->parent, rqd->meta_list,
> - rqd->dma_meta_list);
> + pblk_clear_rqd(pblk, rqd);
> mempool_free(rqd, pool);
> }
>
> @@ -687,18 +712,14 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
>
> memset(&rqd, 0, sizeof(struct nvm_rq));
>
> - rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> - &rqd.dma_meta_list);
> - if (!rqd.meta_list)
> - return -ENOMEM;
> -
> - rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
> - rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
> + ret = pblk_setup_rqd(pblk, &rqd, GFP_KERNEL, true);
> + if (ret)
> + return ret;
>
> bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
> if (IS_ERR(bio)) {
> ret = PTR_ERR(bio);
> - goto free_ppa_list;
> + goto clear_rqd;
> }
>
> bio->bi_iter.bi_sector = 0; /* internal bio */
> @@ -716,7 +737,7 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
> if (ret) {
> pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
> bio_put(bio);
> - goto free_ppa_list;
> + goto clear_rqd;
> }
>
> atomic_dec(&pblk->inflight_io);
> @@ -724,8 +745,8 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
> if (rqd.error)
> pblk_log_read_err(pblk, &rqd);
>
> -free_ppa_list:
> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> +clear_rqd:
> + pblk_clear_rqd(pblk, &rqd);
> return ret;
> }
>
> @@ -742,18 +763,14 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>
> memset(&rqd, 0, sizeof(struct nvm_rq));
>
> - rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> - &rqd.dma_meta_list);
> - if (!rqd.meta_list)
> - return -ENOMEM;
> -
> - rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
> - rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
> + ret = pblk_setup_rqd(pblk, &rqd, GFP_KERNEL, true);
> + if (ret)
> + return ret;
>
> bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
> if (IS_ERR(bio)) {
> ret = PTR_ERR(bio);
> - goto free_ppa_list;
> + goto clear_rqd;
> }
>
> bio->bi_iter.bi_sector = 0; /* internal bio */
> @@ -775,7 +792,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
> if (ret) {
> pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
> bio_put(bio);
> - goto free_ppa_list;
> + goto clear_rqd;
> }
>
> atomic_dec(&pblk->inflight_io);
> @@ -785,8 +802,8 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
> ret = -EIO;
> }
>
> -free_ppa_list:
> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> +clear_rqd:
> + pblk_clear_rqd(pblk, &rqd);
> return ret;
> }
>
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 6d13763f2f6a..57d3155ef9a5 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -452,19 +452,15 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
> */
> bio_init_idx = pblk_get_bi_idx(bio);
>
> - rqd->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> - &rqd->dma_meta_list);
> - if (!rqd->meta_list) {
> - pblk_err(pblk, "not able to allocate ppa list\n");
> - goto fail_rqd_free;
> - }
> -
> if (nr_secs > 1) {
> - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
> - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
> + if (pblk_setup_rqd(pblk, rqd, GFP_KERNEL, true))
> + goto fail_rqd_free;
>
> pblk_read_ppalist_rq(pblk, rqd, bio, blba, read_bitmap);
> } else {
> + if (pblk_setup_rqd(pblk, rqd, GFP_KERNEL, false))
> + goto fail_rqd_free;
> +
> pblk_read_rq(pblk, rqd, bio, blba, read_bitmap);
> }
>
> @@ -593,14 +589,10 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
>
> memset(&rqd, 0, sizeof(struct nvm_rq));
>
> - rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> - &rqd.dma_meta_list);
> - if (!rqd.meta_list)
> - return -ENOMEM;
> -
> if (gc_rq->nr_secs > 1) {
> - rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
> - rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
> + ret = pblk_setup_rqd(pblk, &rqd, GFP_KERNEL, true);
> + if (ret)
> + return ret;
>
> gc_rq->secs_to_gc = read_ppalist_rq_gc(pblk, &rqd, gc_rq->line,
> gc_rq->lba_list,
> @@ -609,6 +601,10 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
> if (gc_rq->secs_to_gc == 1)
> rqd.ppa_addr = rqd.ppa_list[0];
> } else {
> + ret = pblk_setup_rqd(pblk, &rqd, GFP_KERNEL, false);
> + if (ret)
> + return ret;
> +
> gc_rq->secs_to_gc = read_rq_gc(pblk, &rqd, gc_rq->line,
> gc_rq->lba_list[0],
> gc_rq->paddr_list[0]);
> @@ -622,7 +618,8 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
> PBLK_VMALLOC_META, GFP_KERNEL);
> if (IS_ERR(bio)) {
> pblk_err(pblk, "could not allocate GC bio (%lu)\n",
> - PTR_ERR(bio));
> + PTR_ERR(bio));
> + ret = PTR_ERR(bio);
> goto err_free_dma;
> }
>
> @@ -657,12 +654,12 @@ int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
> #endif
>
> out:
> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> + pblk_clear_rqd(pblk, &rqd);
> return ret;
>
> err_free_bio:
> bio_put(bio);
> err_free_dma:
> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> + pblk_clear_rqd(pblk, &rqd);
> return ret;
> }
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index 95ea5072b27e..fdc770f2545f 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -241,13 +241,11 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
> {
> struct nvm_tgt_dev *dev = pblk->dev;
> struct nvm_geo *geo = &dev->geo;
> - struct ppa_addr *ppa_list;
> struct pblk_sec_meta *meta_list;
> struct pblk_pad_rq *pad_rq;
> struct nvm_rq *rqd;
> struct bio *bio;
> void *data;
> - dma_addr_t dma_ppa_list, dma_meta_list;
> __le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
> u64 w_ptr = line->cur_sec;
> int left_line_ppas, rq_ppas, rq_len;
> @@ -281,20 +279,11 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>
> rq_len = rq_ppas * geo->csecs;
>
> - meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
> - if (!meta_list) {
> - ret = -ENOMEM;
> - goto fail_free_pad;
> - }
> -
> - ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
> -
> bio = pblk_bio_map_addr(pblk, data, rq_ppas, rq_len,
> PBLK_VMALLOC_META, GFP_KERNEL);
> if (IS_ERR(bio)) {
> ret = PTR_ERR(bio);
> - goto fail_free_meta;
> + goto fail_free_pad;
> }
>
> bio->bi_iter.bi_sector = 0; /* internal bio */
> @@ -302,17 +291,19 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
>
> rqd = pblk_alloc_rqd(pblk, PBLK_WRITE_INT);
>
> + ret = pblk_setup_rqd(pblk, rqd, GFP_KERNEL, true);
> + if (ret)
> + goto fail_free_bio;
> +
> rqd->bio = bio;
> rqd->opcode = NVM_OP_PWRITE;
> rqd->is_seq = 1;
> - rqd->meta_list = meta_list;
> rqd->nr_ppas = rq_ppas;
> - rqd->ppa_list = ppa_list;
> - rqd->dma_ppa_list = dma_ppa_list;
> - rqd->dma_meta_list = dma_meta_list;
> rqd->end_io = pblk_end_io_recov;
> rqd->private = pad_rq;
>
> + meta_list = rqd->meta_list;
> +
> for (i = 0; i < rqd->nr_ppas; ) {
> struct ppa_addr ppa;
> int pos;
> @@ -346,7 +337,7 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
> if (ret) {
> pblk_err(pblk, "I/O submission failed: %d\n", ret);
> pblk_up_page(pblk, rqd->ppa_list, rqd->nr_ppas);
> - goto fail_free_bio;
> + goto fail_free_rqd;
> }
>
> left_line_ppas -= rq_ppas;
> @@ -370,10 +361,10 @@ static int pblk_recov_pad_oob(struct pblk *pblk, struct pblk_line *line,
> kfree(pad_rq);
> return ret;
>
> +fail_free_rqd:
> + pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
> fail_free_bio:
> bio_put(bio);
> -fail_free_meta:
> - nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
> fail_free_pad:
> kfree(pad_rq);
> vfree(data);
> diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
> index 8ea66bb83c29..767618eba4c2 100644
> --- a/drivers/lightnvm/pblk-write.c
> +++ b/drivers/lightnvm/pblk-write.c
> @@ -322,11 +322,8 @@ static void pblk_end_io_write_meta(struct nvm_rq *rqd)
> }
>
> static int pblk_alloc_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
> - unsigned int nr_secs,
> - nvm_end_io_fn(*end_io))
> + unsigned int nr_secs, nvm_end_io_fn(*end_io))
> {
> - struct nvm_tgt_dev *dev = pblk->dev;
> -
> /* Setup write request */
> rqd->opcode = NVM_OP_PWRITE;
> rqd->nr_ppas = nr_secs;
> @@ -334,15 +331,7 @@ static int pblk_alloc_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
> rqd->private = pblk;
> rqd->end_io = end_io;
>
> - rqd->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> - &rqd->dma_meta_list);
> - if (!rqd->meta_list)
> - return -ENOMEM;
> -
> - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
> - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
> -
> - return 0;
> + return pblk_setup_rqd(pblk, rqd, GFP_KERNEL, true);
> }
>
> static int pblk_setup_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index e29fd35d2991..c0d9eddd344b 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -778,6 +778,9 @@ ssize_t pblk_rb_sysfs(struct pblk_rb *rb, char *buf);
> */
> struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int type);
> void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type);
> +int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
> + bool is_vector);
> +void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd);
> void pblk_set_sec_per_write(struct pblk *pblk, int sec_per_write);
> int pblk_setup_w_rec_rq(struct pblk *pblk, struct nvm_rq *rqd,
> struct pblk_c_ctx *c_ctx);
>


2018-08-29 13:04:45

by Matias Bjørling

Subject: Re: [PATCH 1/3] lightnvm: pblk: refactor metadata paths

On 08/29/2018 10:56 AM, Javier González wrote:
> pblk maintains two different metadata paths for smeta and emeta, which
> store metadata at the start of the line and at the end of the line,
> respectively. Until now, these paths have been common for writing and
> retrieving metadata; however, as the paths diverge, the common code
> becomes less clear and unnecessarily complicated.
>
> In preparation for further changes to the metadata write path, this
> patch separates the write and read paths for smeta and emeta and removes
> the synchronous emeta path, as it is no longer used (emeta is scheduled
> asynchronously to prevent jitter due to internal I/Os).
>
> Signed-off-by: Javier González <[email protected]>
> ---
> drivers/lightnvm/pblk-core.c | 338 ++++++++++++++++++---------------------
> drivers/lightnvm/pblk-gc.c | 2 +-
> drivers/lightnvm/pblk-recovery.c | 4 +-
> drivers/lightnvm/pblk.h | 4 +-
> 4 files changed, 163 insertions(+), 185 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index dbf037b2b32f..09160ec02c5f 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -661,12 +661,137 @@ u64 pblk_lookup_page(struct pblk *pblk, struct pblk_line *line)
> return paddr;
> }
>
> -/*
> - * Submit emeta to one LUN in the raid line at the time to avoid a deadlock when
> - * taking the per LUN semaphore.
> - */
> -static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
> - void *emeta_buf, u64 paddr, int dir)
> +u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line)
> +{
> + struct nvm_tgt_dev *dev = pblk->dev;
> + struct nvm_geo *geo = &dev->geo;
> + struct pblk_line_meta *lm = &pblk->lm;
> + int bit;
> +
> + /* This usually only happens on bad lines */
> + bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
> + if (bit >= lm->blk_per_line)
> + return -1;
> +
> + return bit * geo->ws_opt;
> +}
> +
> +int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
> +{
> + struct nvm_tgt_dev *dev = pblk->dev;
> + struct pblk_line_meta *lm = &pblk->lm;
> + struct bio *bio;
> + struct nvm_rq rqd;
> + u64 paddr = pblk_line_smeta_start(pblk, line);
> + int i, ret;
> +
> + memset(&rqd, 0, sizeof(struct nvm_rq));
> +
> + rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> + &rqd.dma_meta_list);
> + if (!rqd.meta_list)
> + return -ENOMEM;
> +
> + rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
> + rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;

If patch 2 is put first, then this is not needed.

> +
> + bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
> + if (IS_ERR(bio)) {
> + ret = PTR_ERR(bio);
> + goto free_ppa_list;
> + }
> +
> + bio->bi_iter.bi_sector = 0; /* internal bio */
> + bio_set_op_attrs(bio, REQ_OP_READ, 0);
> +
> + rqd.bio = bio;
> + rqd.opcode = NVM_OP_PREAD;
> + rqd.nr_ppas = lm->smeta_sec;
> + rqd.is_seq = 1;
> +
> + for (i = 0; i < lm->smeta_sec; i++, paddr++)
> + rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
> +
> + ret = pblk_submit_io_sync(pblk, &rqd);
> + if (ret) {
> + pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
> + bio_put(bio);
> + goto free_ppa_list;
> + }
> +
> + atomic_dec(&pblk->inflight_io);
> +
> + if (rqd.error)
> + pblk_log_read_err(pblk, &rqd);
> +
> +free_ppa_list:
> + nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> + return ret;
> +}
> +
> +static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
> + u64 paddr)
> +{
> + struct nvm_tgt_dev *dev = pblk->dev;
> + struct pblk_line_meta *lm = &pblk->lm;
> + struct bio *bio;
> + struct nvm_rq rqd;
> + __le64 *lba_list = emeta_to_lbas(pblk, line->emeta->buf);
> + __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
> + int i, ret;
> +
> + memset(&rqd, 0, sizeof(struct nvm_rq));
> +
> + rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> + &rqd.dma_meta_list);
> + if (!rqd.meta_list)
> + return -ENOMEM;
> +
> + rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
> + rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
> +
> + bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
> + if (IS_ERR(bio)) {
> + ret = PTR_ERR(bio);
> + goto free_ppa_list;
> + }
> +
> + bio->bi_iter.bi_sector = 0; /* internal bio */
> + bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> +
> + rqd.bio = bio;
> + rqd.opcode = NVM_OP_PWRITE;
> + rqd.nr_ppas = lm->smeta_sec;
> + rqd.is_seq = 1;
> +
> + for (i = 0; i < lm->smeta_sec; i++, paddr++) {
> + struct pblk_sec_meta *meta_list = rqd.meta_list;
> +
> + rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
> + meta_list[i].lba = lba_list[paddr] = addr_empty;
> + }
> +
> + ret = pblk_submit_io_sync(pblk, &rqd);
> + if (ret) {
> + pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
> + bio_put(bio);
> + goto free_ppa_list;
> + }
> +
> + atomic_dec(&pblk->inflight_io);
> +
> + if (rqd.error) {
> + pblk_log_write_err(pblk, &rqd);
> + ret = -EIO;
> + }
> +
> +free_ppa_list:
> + nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> + return ret;
> +}
> +
> +int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> + void *emeta_buf)
> {
> struct nvm_tgt_dev *dev = pblk->dev;
> struct nvm_geo *geo = &dev->geo;
> @@ -675,24 +800,15 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
> void *ppa_list, *meta_list;
> struct bio *bio;
> struct nvm_rq rqd;
> + u64 paddr = line->emeta_ssec;
> dma_addr_t dma_ppa_list, dma_meta_list;
> int min = pblk->min_write_pgs;
> int left_ppas = lm->emeta_sec[0];
> - int id = line->id;
> + int line_id = line->id;
> int rq_ppas, rq_len;
> - int cmd_op, bio_op;
> int i, j;
> int ret;
>
> - if (dir == PBLK_WRITE) {
> - bio_op = REQ_OP_WRITE;
> - cmd_op = NVM_OP_PWRITE;
> - } else if (dir == PBLK_READ) {
> - bio_op = REQ_OP_READ;
> - cmd_op = NVM_OP_PREAD;
> - } else
> - return -EINVAL;
> -
> meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> &dma_meta_list);
> if (!meta_list)
> @@ -715,64 +831,43 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
> }
>
> bio->bi_iter.bi_sector = 0; /* internal bio */
> - bio_set_op_attrs(bio, bio_op, 0);
> + bio_set_op_attrs(bio, REQ_OP_READ, 0);
>
> rqd.bio = bio;
> rqd.meta_list = meta_list;
> rqd.ppa_list = ppa_list;
> rqd.dma_meta_list = dma_meta_list;
> rqd.dma_ppa_list = dma_ppa_list;
> - rqd.opcode = cmd_op;
> + rqd.opcode = NVM_OP_PREAD;
> rqd.nr_ppas = rq_ppas;
>
> - if (dir == PBLK_WRITE) {
> - struct pblk_sec_meta *meta_list = rqd.meta_list;
> + for (i = 0; i < rqd.nr_ppas; ) {
> + struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, line_id);
> + int pos = pblk_ppa_to_pos(geo, ppa);
>
> - rqd.is_seq = 1;
> - for (i = 0; i < rqd.nr_ppas; ) {
> - spin_lock(&line->lock);
> - paddr = __pblk_alloc_page(pblk, line, min);
> - spin_unlock(&line->lock);
> - for (j = 0; j < min; j++, i++, paddr++) {
> - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
> - rqd.ppa_list[i] =
> - addr_to_gen_ppa(pblk, paddr, id);
> - }
> - }
> - } else {
> - for (i = 0; i < rqd.nr_ppas; ) {
> - struct ppa_addr ppa = addr_to_gen_ppa(pblk, paddr, id);
> - int pos = pblk_ppa_to_pos(geo, ppa);
> + if (pblk_io_aligned(pblk, rq_ppas))
> + rqd.is_seq = 1;
>
> - if (pblk_io_aligned(pblk, rq_ppas))
> - rqd.is_seq = 1;
> -
> - while (test_bit(pos, line->blk_bitmap)) {
> - paddr += min;
> - if (pblk_boundary_paddr_checks(pblk, paddr)) {
> - pblk_err(pblk, "corrupt emeta line:%d\n",
> - line->id);
> - bio_put(bio);
> - ret = -EINTR;
> - goto free_rqd_dma;
> - }
> -
> - ppa = addr_to_gen_ppa(pblk, paddr, id);
> - pos = pblk_ppa_to_pos(geo, ppa);
> - }
> -
> - if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
> - pblk_err(pblk, "corrupt emeta line:%d\n",
> - line->id);
> + while (test_bit(pos, line->blk_bitmap)) {
> + paddr += min;
> + if (pblk_boundary_paddr_checks(pblk, paddr)) {
> bio_put(bio);
> ret = -EINTR;
> goto free_rqd_dma;
> }
>
> - for (j = 0; j < min; j++, i++, paddr++)
> - rqd.ppa_list[i] =
> - addr_to_gen_ppa(pblk, paddr, line->id);
> + ppa = addr_to_gen_ppa(pblk, paddr, line_id);
> + pos = pblk_ppa_to_pos(geo, ppa);
> }
> +
> + if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
> + bio_put(bio);
> + ret = -EINTR;
> + goto free_rqd_dma;
> + }
> +
> + for (j = 0; j < min; j++, i++, paddr++)
> + rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line_id);
> }
>
> ret = pblk_submit_io_sync(pblk, &rqd);
> @@ -784,136 +879,19 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>
> atomic_dec(&pblk->inflight_io);
>
> - if (rqd.error) {
> - if (dir == PBLK_WRITE)
> - pblk_log_write_err(pblk, &rqd);
> - else
> - pblk_log_read_err(pblk, &rqd);
> - }
> + if (rqd.error)
> + pblk_log_read_err(pblk, &rqd);
>
> emeta_buf += rq_len;
> left_ppas -= rq_ppas;
> if (left_ppas)
> goto next_rq;
> +
> free_rqd_dma:
> nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> return ret;
> }
>
> -u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line)
> -{
> - struct nvm_tgt_dev *dev = pblk->dev;
> - struct nvm_geo *geo = &dev->geo;
> - struct pblk_line_meta *lm = &pblk->lm;
> - int bit;
> -
> - /* This usually only happens on bad lines */
> - bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
> - if (bit >= lm->blk_per_line)
> - return -1;
> -
> - return bit * geo->ws_opt;
> -}
> -
> -static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
> - u64 paddr, int dir)
> -{
> - struct nvm_tgt_dev *dev = pblk->dev;
> - struct pblk_line_meta *lm = &pblk->lm;
> - struct bio *bio;
> - struct nvm_rq rqd;
> - __le64 *lba_list = NULL;
> - int i, ret;
> - int cmd_op, bio_op;
> -
> - if (dir == PBLK_WRITE) {
> - bio_op = REQ_OP_WRITE;
> - cmd_op = NVM_OP_PWRITE;
> - lba_list = emeta_to_lbas(pblk, line->emeta->buf);
> - } else if (dir == PBLK_READ_RECOV || dir == PBLK_READ) {
> - bio_op = REQ_OP_READ;
> - cmd_op = NVM_OP_PREAD;
> - } else
> - return -EINVAL;
> -
> - memset(&rqd, 0, sizeof(struct nvm_rq));
> -
> - rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
> - &rqd.dma_meta_list);
> - if (!rqd.meta_list)
> - return -ENOMEM;
> -
> - rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
> - rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
> -
> - bio = bio_map_kern(dev->q, line->smeta, lm->smeta_len, GFP_KERNEL);
> - if (IS_ERR(bio)) {
> - ret = PTR_ERR(bio);
> - goto free_ppa_list;
> - }
> -
> - bio->bi_iter.bi_sector = 0; /* internal bio */
> - bio_set_op_attrs(bio, bio_op, 0);
> -
> - rqd.bio = bio;
> - rqd.opcode = cmd_op;
> - rqd.is_seq = 1;
> - rqd.nr_ppas = lm->smeta_sec;
> -
> - for (i = 0; i < lm->smeta_sec; i++, paddr++) {
> - struct pblk_sec_meta *meta_list = rqd.meta_list;
> -
> - rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
> -
> - if (dir == PBLK_WRITE) {
> - __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
> -
> - meta_list[i].lba = lba_list[paddr] = addr_empty;
> - }
> - }
> -
> - /*
> - * This I/O is sent by the write thread when a line is replace. Since
> - * the write thread is the only one sending write and erase commands,
> - * there is no need to take the LUN semaphore.
> - */
> - ret = pblk_submit_io_sync(pblk, &rqd);
> - if (ret) {
> - pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
> - bio_put(bio);
> - goto free_ppa_list;
> - }
> -
> - atomic_dec(&pblk->inflight_io);
> -
> - if (rqd.error) {
> - if (dir == PBLK_WRITE) {
> - pblk_log_write_err(pblk, &rqd);
> - ret = 1;
> - } else if (dir == PBLK_READ)
> - pblk_log_read_err(pblk, &rqd);
> - }
> -
> -free_ppa_list:
> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
> -
> - return ret;
> -}
> -
> -int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line)
> -{
> - u64 bpaddr = pblk_line_smeta_start(pblk, line);
> -
> - return pblk_line_submit_smeta_io(pblk, line, bpaddr, PBLK_READ_RECOV);
> -}
> -
> -int pblk_line_read_emeta(struct pblk *pblk, struct pblk_line *line,
> - void *emeta_buf)
> -{
> - return pblk_line_submit_emeta_io(pblk, line, emeta_buf,
> - line->emeta_ssec, PBLK_READ);
> -}
> -
> static void pblk_setup_e_rq(struct pblk *pblk, struct nvm_rq *rqd,
> struct ppa_addr ppa)
> {
> @@ -1150,7 +1128,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
> line->smeta_ssec = off;
> line->cur_sec = off + lm->smeta_sec;
>
> - if (init && pblk_line_submit_smeta_io(pblk, line, off, PBLK_WRITE)) {
> + if (init && pblk_line_smeta_write(pblk, line, off)) {
> pblk_debug(pblk, "line smeta I/O failed. Retry\n");
> return 0;
> }
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index b0f0c637df3a..73e64c1b55ef 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -148,7 +148,7 @@ static __le64 *get_lba_list_from_emeta(struct pblk *pblk,
> if (!emeta_buf)
> return NULL;
>
> - ret = pblk_line_read_emeta(pblk, line, emeta_buf);
> + ret = pblk_line_emeta_read(pblk, line, emeta_buf);
> if (ret) {
> pblk_err(pblk, "line %d read emeta failed (%d)\n",
> line->id, ret);
> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
> index c641a9ed5f39..95ea5072b27e 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -839,7 +839,7 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
> continue;
>
> /* Lines that cannot be read are assumed as not written here */
> - if (pblk_line_read_smeta(pblk, line))
> + if (pblk_line_smeta_read(pblk, line))
> continue;
>
> crc = pblk_calc_smeta_crc(pblk, smeta_buf);
> @@ -909,7 +909,7 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
> line->emeta = emeta;
> memset(line->emeta->buf, 0, lm->emeta_len[0]);
>
> - if (pblk_line_read_emeta(pblk, line, line->emeta->buf)) {
> + if (pblk_line_emeta_read(pblk, line, line->emeta->buf)) {
> pblk_recov_l2p_from_oob(pblk, line);
> goto next;
> }
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 6195e3f5d2e6..e29fd35d2991 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -817,8 +817,8 @@ void pblk_gen_run_ws(struct pblk *pblk, struct pblk_line *line, void *priv,
> void (*work)(struct work_struct *), gfp_t gfp_mask,
> struct workqueue_struct *wq);
> u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line);
> -int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line);
> -int pblk_line_read_emeta(struct pblk *pblk, struct pblk_line *line,
> +int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line);
> +int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
> void *emeta_buf);
> int pblk_blk_erase_async(struct pblk *pblk, struct ppa_addr erase_ppa);
> void pblk_line_put(struct kref *ref);
>


2018-08-29 13:09:38

by Matias Bjørling

Subject: Re: [PATCH 3/3] lightnvm: pblk: take write semaphore on metadata

On 08/29/2018 10:56 AM, Javier González wrote:
> pblk guarantees write ordering at a chunk level through a per-open-chunk
> semaphore. At this point, since there is only a single open I/O stream
> for both user and GC data, the semaphore is effectively per parallel unit.
>
> For metadata I/O that is synchronous, the semaphore is not needed, as
> ordering is already guaranteed. However, if the metadata scheme changes
> or multiple streams are opened, this guarantee might no longer hold.
>
> This patch makes sure that all writes go through the semaphore, even for
> synchronous I/O. This is consistent with pblk's write I/O model and
> simplifies maintenance, since changes to the metadata scheme could
> otherwise introduce ordering issues.
>
> Signed-off-by: Javier González <[email protected]>
> ---
> drivers/lightnvm/pblk-core.c | 16 +++++++++++++++-
> drivers/lightnvm/pblk.h | 1 +
> 2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 767178185f19..1e4dc0c1ed88 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -558,6 +558,20 @@ int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
> return ret;
> }
>
> +int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd)
> +{
> + struct ppa_addr *ppa_list;
> + int ret;
> +
> + ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
> +
> + pblk_down_page(pblk, ppa_list, rqd->nr_ppas);

If the debug stuff is killed inside __pblk_down_page, then ppa_list and
rqd->nr_ppas do not need to be passed, and this function can be
inlined in its caller. Can we kill it? I'll make the patch if you like.
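For illustration, a minimal sketch of what the simplified helper could
look like once the per-PPA debug checks are gone. The single-PPA
semaphore helpers (pblk_down_chunk/pblk_up_chunk) are assumed names,
not functions from the posted series:

        int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd)
        {
                /* Without the debug checks, only the first PPA is needed
                 * to locate the parallel unit's semaphore.
                 */
                struct ppa_addr ppa = (rqd->nr_ppas > 1) ? rqd->ppa_list[0]
                                                         : rqd->ppa_addr;
                int ret;

                pblk_down_chunk(pblk, ppa);     /* assumed helper */
                ret = pblk_submit_io_sync(pblk, rqd);
                pblk_up_chunk(pblk, ppa);

                return ret;
        }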

> + ret = pblk_submit_io_sync(pblk, rqd);
> + pblk_up_page(pblk, ppa_list, rqd->nr_ppas);
> +
> + return ret;
> +}
> +
> static void pblk_bio_map_addr_endio(struct bio *bio)
> {
> bio_put(bio);
> @@ -788,7 +802,7 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
> meta_list[i].lba = lba_list[paddr] = addr_empty;
> }
>
> - ret = pblk_submit_io_sync(pblk, &rqd);
> + ret = pblk_submit_io_sync_sem(pblk, &rqd);
> if (ret) {
> pblk_err(pblk, "smeta I/O submission failed: %d\n", ret);
> bio_put(bio);
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index c0d9eddd344b..54f937c1fb62 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -793,6 +793,7 @@ void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd);
> void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd);
> int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);
> int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd);
> +int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd);
> int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line);
> void pblk_check_chunk_state_update(struct pblk *pblk, struct nvm_rq *rqd);
> struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
>


2018-08-29 13:20:52

by Javier Gonzalez

[permalink] [raw]
Subject: Re: [PATCH 2/3] lightnvm: encapsulate rqd dma allocations

> On 29 Aug 2018, at 15.00, Matias Bjørling <[email protected]> wrote:
>
> On 08/29/2018 10:56 AM, Javier González wrote:
>> dma allocations for ppa_list and meta_list in rqd are replicated in
>> several places across the pblk codebase. Make helpers to encapsulate
>> creation and deletion to simplify the code.
>> Signed-off-by: Javier González <[email protected]>
>> ---
>> drivers/lightnvm/pblk-core.c | 69 +++++++++++++++++++++++++---------------
>> drivers/lightnvm/pblk-read.c | 35 ++++++++++----------
>> drivers/lightnvm/pblk-recovery.c | 29 ++++++-----------
>> drivers/lightnvm/pblk-write.c | 15 ++-------
>> drivers/lightnvm/pblk.h | 3 ++
>> 5 files changed, 74 insertions(+), 77 deletions(-)
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index 09160ec02c5f..767178185f19 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -237,6 +237,34 @@ static void pblk_invalidate_range(struct pblk *pblk, sector_t slba,
>> spin_unlock(&pblk->trans_lock);
>> }
>> +int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
>> + bool is_vector)
>
>
> The mem_flags argument can be removed. It is GFP_KERNEL from all the
> places it is called.
>

Thought it was better to have the flexibility in a helper function, but
we can always add it later on if needed...

> For is_vector, it will be better to use nr_ppas (or a similar name). Then
> pblk_submit_read/pblk_submit_read_gc are a bit cleaner.
>

We can do that too, yes.
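For concreteness, a sketch of the suggested shape with nr_ppas replacing
the flag. This exact signature is an assumption, not code from the posted
series; the body mirrors the helper quoted in this message:

        int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, int nr_ppas)
        {
                struct nvm_tgt_dev *dev = pblk->dev;

                rqd->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
                                                   &rqd->dma_meta_list);
                if (!rqd->meta_list)
                        return -ENOMEM;

                /* A single PPA lives in rqd->ppa_addr; no list needed. */
                if (nr_ppas == 1)
                        return 0;

                rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
                rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;

                return 0;
        }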


>> +{
>> + struct nvm_tgt_dev *dev = pblk->dev;
>> +
>> + rqd->meta_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
>> + &rqd->dma_meta_list);
>> + if (!rqd->meta_list)
>> + return -ENOMEM;
>> +
>> + if (!is_vector)
>> + return 0;
>> +
>> + rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
>> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
>
> Wrt is_vector, does it matter if we just set ppa_list and
> dma_ppa_list? If we have them, we use them, else leave them alone?
>

If we only have 1 address then ppa_addr is set, and the completion path
would attempt to free ppa_list, interpreting ppa_addr as the dma
address. So I don't think so - unless I'm missing something?
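The underlying reason is the nvm_rq layout, where ppa_addr shares
storage with the DMA handle of the list. A rough sketch, paraphrased
from include/linux/lightnvm.h of this era (abridged):

        struct nvm_rq {
                /* ... */
                union {
                        struct ppa_addr ppa_addr;  /* nr_ppas == 1: the address value itself */
                        dma_addr_t dma_ppa_list;   /* nr_ppas > 1: DMA handle of ppa_list */
                };
                struct ppa_addr *ppa_list;
                /* ... */
        };

So unconditionally populating dma_ppa_list on a single-PPA request would
clobber ppa_addr, and the completion path would then free a bogus DMA
handle.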

>> +
>> + return 0;
>> +}
>> +
>> +void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
>> +{
>> + struct nvm_tgt_dev *dev = pblk->dev;
>> +
>> + if (rqd->meta_list)
>> + nvm_dev_dma_free(dev->parent, rqd->meta_list,
>> + rqd->dma_meta_list);
>> +}
>
> Looks like setup/clear is mainly about managing the metadata. Would
> pblk_alloc_rqd_meta()/pblk_free/rqd_meta be better names? Unless we
> can fold it all into pblk_alloc_rqd/pblk_free_rqd.
>

It's not easy to fold them in there, as we use nvm_rq allocations without
extra space in the rqd for metadata. This is also a problem for rqds
allocated on the stack. But I can change the names to make the
functionality clearer.
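A sketch of the two allocation patterns that make the folding awkward
(illustrative only; pblk_alloc_rqd and pblk_setup_rqd are the functions
discussed in this series):

        /* Mempool-backed rqd: pblk_alloc_rqd() could own the meta setup. */
        struct nvm_rq *rqd = pblk_alloc_rqd(pblk, PBLK_READ);

        /* Stack rqd, e.g. in the smeta path: it never goes through
         * pblk_alloc_rqd(), so a standalone helper is still needed.
         */
        struct nvm_rq stack_rqd;

        memset(&stack_rqd, 0, sizeof(struct nvm_rq));
        ret = pblk_setup_rqd(pblk, &stack_rqd, GFP_KERNEL, true);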

Javier



2018-08-29 13:22:39

by Javier Gonzalez

[permalink] [raw]
Subject: Re: [PATCH 1/3] lightnvm: pblk: refactor metadata paths


> On 29 Aug 2018, at 15.02, Matias Bjørling <[email protected]> wrote:
>
> On 08/29/2018 10:56 AM, Javier González wrote:
>> pblk maintains two different metadata paths for smeta and emeta, which
>> store metadata at the start of the line and at the end of the line,
>> respectively. Until now, these paths have been common for writing and
>> retrieving metadata; however, as these paths diverge, the common code
>> becomes less clear and unnecessarily complicated.
>> In preparation for further changes to the metadata write path, this
>> patch separates the write and read paths for smeta and emeta and
>> removes the synchronous emeta path, as it is not used anymore (emeta is
>> scheduled asynchronously to prevent jittering due to internal I/Os).
>> Signed-off-by: Javier González <[email protected]>
>> ---
>> drivers/lightnvm/pblk-core.c | 338 ++++++++++++++++++---------------------
>> drivers/lightnvm/pblk-gc.c | 2 +-
>> drivers/lightnvm/pblk-recovery.c | 4 +-
>> drivers/lightnvm/pblk.h | 4 +-
>> 4 files changed, 163 insertions(+), 185 deletions(-)
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index dbf037b2b32f..09160ec02c5f 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -661,12 +661,137 @@ u64 pblk_lookup_page(struct pblk *pblk, struct pblk_line *line)
>> return paddr;
>> }
>> -/*
>> - * Submit emeta to one LUN in the raid line at the time to avoid a deadlock when
>> - * taking the per LUN semaphore.
>> - */
>> -static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
>> - void *emeta_buf, u64 paddr, int dir)
>> +u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line)
>> +{
>> + struct nvm_tgt_dev *dev = pblk->dev;
>> + struct nvm_geo *geo = &dev->geo;
>> + struct pblk_line_meta *lm = &pblk->lm;
>> + int bit;
>> +
>> + /* This usually only happens on bad lines */
>> + bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
>> + if (bit >= lm->blk_per_line)
>> + return -1;
>> +
>> + return bit * geo->ws_opt;
>> +}
>> +
>> +int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line)
>> +{
>> + struct nvm_tgt_dev *dev = pblk->dev;
>> + struct pblk_line_meta *lm = &pblk->lm;
>> + struct bio *bio;
>> + struct nvm_rq rqd;
>> + u64 paddr = pblk_line_smeta_start(pblk, line);
>> + int i, ret;
>> +
>> + memset(&rqd, 0, sizeof(struct nvm_rq));
>> +
>> + rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>> + &rqd.dma_meta_list);
>> + if (!rqd.meta_list)
>> + return -ENOMEM;
>> +
>> + rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
>> + rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
>
> If patch 2 is put first, then this is not needed.
>

True... I will reorder with the changes on 2/3.



2018-08-29 13:23:33

by Javier Gonzalez

[permalink] [raw]
Subject: Re: [PATCH 3/3] lightnvm: pblk: take write semaphore on metadata

> On 29 Aug 2018, at 15.08, Matias Bjørling <[email protected]> wrote:
>
> On 08/29/2018 10:56 AM, Javier González wrote:
>> pblk guarantees write ordering at a chunk level through a per open chunk
>> semaphore. At this point, since we only have an open I/O stream for both
>> user and GC data, the semaphore is per parallel unit.
>> For the metadata I/O that is synchronous, the semaphore is not needed as
>> ordering is guaranteed. However, if the metadata scheme changes or
>> multiple streams are open, this guarantee might not be preserved.
>> This patch makes sure that all writes go through the semaphore, even for
>> synchronous I/O. This is consistent with pblk's write I/O model. It also
>> simplifies maintenance since changes in the metadata scheme could cause
>> ordering issues.
>> Signed-off-by: Javier González <[email protected]>
>> ---
>> drivers/lightnvm/pblk-core.c | 16 +++++++++++++++-
>> drivers/lightnvm/pblk.h | 1 +
>> 2 files changed, 16 insertions(+), 1 deletion(-)
>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>> index 767178185f19..1e4dc0c1ed88 100644
>> --- a/drivers/lightnvm/pblk-core.c
>> +++ b/drivers/lightnvm/pblk-core.c
>> @@ -558,6 +558,20 @@ int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
>> return ret;
>> }
>> +int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd)
>> +{
>> + struct ppa_addr *ppa_list;
>> + int ret;
>> +
>> + ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
>> +
>> + pblk_down_page(pblk, ppa_list, rqd->nr_ppas);
>
> If the debug stuff is killed inside __pblk_down_page, then ppa_list
> and rqd->nr_ppas do not need to be passed, and this function can be
> inlined in its caller. Can we kill it? I'll make the patch if you
> like.

Sounds good. Sure, please send - should I wait to resend this series?



2018-08-29 13:38:04

by Matias Bjørling

[permalink] [raw]
Subject: Re: [PATCH 2/3] lightnvm: encapsulate rqd dma allocations

On 08/29/2018 03:18 PM, Javier Gonzalez wrote:
>> On 29 Aug 2018, at 15.00, Matias Bjørling <[email protected]> wrote:
>>
>> On 08/29/2018 10:56 AM, Javier González wrote:
>>> dma allocations for ppa_list and meta_list in rqd are replicated in
>>> several places across the pblk codebase. Make helpers to encapsulate
>>> creation and deletion to simplify the code.
>>> Signed-off-by: Javier González <[email protected]>
>>> ---
>>> drivers/lightnvm/pblk-core.c | 69 +++++++++++++++++++++++++---------------
>>> drivers/lightnvm/pblk-read.c | 35 ++++++++++----------
>>> drivers/lightnvm/pblk-recovery.c | 29 ++++++-----------
>>> drivers/lightnvm/pblk-write.c | 15 ++-------
>>> drivers/lightnvm/pblk.h | 3 ++
>>> 5 files changed, 74 insertions(+), 77 deletions(-)
>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>> index 09160ec02c5f..767178185f19 100644
>>> --- a/drivers/lightnvm/pblk-core.c
>>> +++ b/drivers/lightnvm/pblk-core.c
>>> @@ -237,6 +237,34 @@ static void pblk_invalidate_range(struct pblk *pblk, sector_t slba,
>>> spin_unlock(&pblk->trans_lock);
>>> }
>>> +int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
>>> + bool is_vector)
>>
>>
>> The mem_flags argument can be removed. It is GFP_KERNEL from all the
>> places it is called.
>>
>
> Thought it was better to have the flexibility in a helper function, but
> we can always add it later on if needed...
>
>> For is_vector, it will be better to use nr_ppas (or a similar name). Then
>> pblk_submit_read/pblk_submit_read_gc are a bit cleaner.
>>
>
> We can do that too, yes.
>
>
>>> +{
>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>> +
>>> + rqd->meta_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
>>> + &rqd->dma_meta_list);
>>> + if (!rqd->meta_list)
>>> + return -ENOMEM;
>>> +
>>> + if (!is_vector)
>>> + return 0;
>>> +
>>> + rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
>>> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
>>
>> Wrt is_vector, does it matter if we just set ppa_list and
>> dma_ppa_list? If we have them, we use them, else leave them alone?
>>
>
> If we only have 1 address then ppa_addr is set, and the completion path
> would attempt to free ppa_list, interpreting ppa_addr as the dma address.
> So I don't think so - unless I'm missing something?

In that case, we could drop is_vector/nr_ppas altogether? That would
be nice.

>
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
>>> +{
>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>> +
>>> + if (rqd->meta_list)
>>> + nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>> + rqd->dma_meta_list);
>>> +}
>>
>> Looks like setup/clear is mainly about managing the metadata. Would
>> pblk_alloc_rqd_meta()/pblk_free/rqd_meta be better names? Unless we
>> can fold it all into pblk_alloc_rqd/pblk_free_rqd.
>>
>
> It's not easy to fold them in there, as we use nvm_rq allocations without
> extra space in the rqd for metadata. This is also a problem for rqds
> allocated on the stack. But I can change the names to make the
> functionality clearer.

Yep, that was what I felt as well. Renaming will be good.

>
> Javier
>


2018-08-29 13:41:53

by Matias Bjørling

[permalink] [raw]
Subject: Re: [PATCH 3/3] lightnvm: pblk: take write semaphore on metadata

On 08/29/2018 03:21 PM, Javier Gonzalez wrote:
>> On 29 Aug 2018, at 15.08, Matias Bjørling <[email protected]> wrote:
>>
>> On 08/29/2018 10:56 AM, Javier González wrote:
>>> pblk guarantees write ordering at a chunk level through a per open chunk
>>> semaphore. At this point, since we only have an open I/O stream for both
>>> user and GC data, the semaphore is per parallel unit.
>>> For the metadata I/O that is synchronous, the semaphore is not needed as
>>> ordering is guaranteed. However, if the metadata scheme changes or
>>> multiple streams are open, this guarantee might not be preserved.
>>> This patch makes sure that all writes go through the semaphore, even for
>>> synchronous I/O. This is consistent with pblk's write I/O model. It also
>>> simplifies maintenance since changes in the metadata scheme could cause
>>> ordering issues.
>>> Signed-off-by: Javier González <[email protected]>
>>> ---
>>> drivers/lightnvm/pblk-core.c | 16 +++++++++++++++-
>>> drivers/lightnvm/pblk.h | 1 +
>>> 2 files changed, 16 insertions(+), 1 deletion(-)
>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>> index 767178185f19..1e4dc0c1ed88 100644
>>> --- a/drivers/lightnvm/pblk-core.c
>>> +++ b/drivers/lightnvm/pblk-core.c
>>> @@ -558,6 +558,20 @@ int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
>>> return ret;
>>> }
>>> +int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd)
>>> +{
>>> + struct ppa_addr *ppa_list;
>>> + int ret;
>>> +
>>> + ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
>>> +
>>> + pblk_down_page(pblk, ppa_list, rqd->nr_ppas);
>>
>> If the debug stuff is killed inside __pblk_down_page, then ppa_list
>> and rqd->nr_ppas do not need to be passed, and this function can be
>> inlined in its caller. Can we kill it? I'll make the patch if you
>> like.
>
> Sounds good. Sure, please send - should I wait to resend this series?
>

Will do. Yes, wait a bit. I'll post asap.

2018-08-29 13:43:59

by Javier Gonzalez

[permalink] [raw]
Subject: Re: [PATCH 3/3] lightnvm: pblk: take write semaphore on metadata


> On 29 Aug 2018, at 15.40, Matias Bjørling <[email protected]> wrote:
>
> On 08/29/2018 03:21 PM, Javier Gonzalez wrote:
>>> On 29 Aug 2018, at 15.08, Matias Bjørling <[email protected]> wrote:
>>>
>>> On 08/29/2018 10:56 AM, Javier González wrote:
>>>> pblk guarantees write ordering at a chunk level through a per open chunk
>>>> semaphore. At this point, since we only have an open I/O stream for both
>>>> user and GC data, the semaphore is per parallel unit.
>>>> For the metadata I/O that is synchronous, the semaphore is not needed as
>>>> ordering is guaranteed. However, if the metadata scheme changes or
>>>> multiple streams are open, this guarantee might not be preserved.
>>>> This patch makes sure that all writes go through the semaphore, even for
>>>> synchronous I/O. This is consistent with pblk's write I/O model. It also
>>>> simplifies maintenance since changes in the metadata scheme could cause
>>>> ordering issues.
>>>> Signed-off-by: Javier González <[email protected]>
>>>> ---
>>>> drivers/lightnvm/pblk-core.c | 16 +++++++++++++++-
>>>> drivers/lightnvm/pblk.h | 1 +
>>>> 2 files changed, 16 insertions(+), 1 deletion(-)
>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>> index 767178185f19..1e4dc0c1ed88 100644
>>>> --- a/drivers/lightnvm/pblk-core.c
>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>> @@ -558,6 +558,20 @@ int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
>>>> return ret;
>>>> }
>>>> +int pblk_submit_io_sync_sem(struct pblk *pblk, struct nvm_rq *rqd)
>>>> +{
>>>> + struct ppa_addr *ppa_list;
>>>> + int ret;
>>>> +
>>>> + ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
>>>> +
>>>> + pblk_down_page(pblk, ppa_list, rqd->nr_ppas);
>>>
>>> If the debug stuff is killed inside __pblk_down_page, then ppa_list
>>> and rqd->nr_ppas do not need to be passed, and this function can be
>>> inlined in its caller. Can we kill it? I'll make the patch if you
>>> like.
>> Sounds good. Sure, please send - should I wait to resend this series?
>
> Will do. Yes, wait a bit. I'll post asap.

No hurry. Thanks!



2018-08-29 13:44:41

by Matias Bjørling

[permalink] [raw]
Subject: Re: [PATCH 2/3] lightnvm: encapsulate rqd dma allocations

On 08/29/2018 03:41 PM, Javier Gonzalez wrote:
>
>> On 29 Aug 2018, at 15.36, Matias Bjørling <[email protected]> wrote:
>>
>> On 08/29/2018 03:18 PM, Javier Gonzalez wrote:
>>>> On 29 Aug 2018, at 15.00, Matias Bjørling <[email protected]> wrote:
>>>>
>>>> On 08/29/2018 10:56 AM, Javier González wrote:
>>>>> dma allocations for ppa_list and meta_list in rqd are replicated in
>>>>> several places across the pblk codebase. Make helpers to encapsulate
>>>>> creation and deletion to simplify the code.
>>>>> Signed-off-by: Javier González <[email protected]>
>>>>> ---
>>>>> drivers/lightnvm/pblk-core.c | 69 +++++++++++++++++++++++++---------------
>>>>> drivers/lightnvm/pblk-read.c | 35 ++++++++++----------
>>>>> drivers/lightnvm/pblk-recovery.c | 29 ++++++-----------
>>>>> drivers/lightnvm/pblk-write.c | 15 ++-------
>>>>> drivers/lightnvm/pblk.h | 3 ++
>>>>> 5 files changed, 74 insertions(+), 77 deletions(-)
>>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>>> index 09160ec02c5f..767178185f19 100644
>>>>> --- a/drivers/lightnvm/pblk-core.c
>>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>>> @@ -237,6 +237,34 @@ static void pblk_invalidate_range(struct pblk *pblk, sector_t slba,
>>>>> spin_unlock(&pblk->trans_lock);
>>>>> }
>>>>> +int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
>>>>> + bool is_vector)
>>>>
>>>>
>>>> The mem_flags argument can be removed. It is GFP_KERNEL from all the
>>>> places it is called.
>>> Thought it was better to have the flexibility in a helper function, but
>>> we can always add it later on if needed...
>>>> For is_vector, it will be better to use nr_ppas (or a similar name). Then
>>>> pblk_submit_read/pblk_submit_read_gc are a bit cleaner.
>>> We can do that too, yes.
>>>>> +{
>>>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>>>> +
>>>>> + rqd->meta_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
>>>>> + &rqd->dma_meta_list);
>>>>> + if (!rqd->meta_list)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + if (!is_vector)
>>>>> + return 0;
>>>>> +
>>>>> + rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
>>>>> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
>>>>
>>>> Wrt is_vector, does it matter if we just set ppa_list and
>>>> dma_ppa_list? If we have them, we use them, else leave them alone?
>>> If we only have 1 address then ppa_addr is set, and the completion path
>>> would attempt to free ppa_list, interpreting ppa_addr as the dma address.
>>> So I don't think so - unless I'm missing something?
>>
>> In that case, we could drop is_vector/nr_ppas altogether? That would be nice.
>>
>
> The problem is that the metadata region still needs to be used, even if
> the ppa_list is not set. The thing is that the oob area can be larger than
> 64 bits, so we cannot use the trick where the dma handle itself stores
> the actual value.
>
> So when encapsulating, we need to know whether to allocate the ppa_list
> or not.
>
> Does it make sense?

Cool. We'll just leave it as is.

2018-08-29 13:45:38

by Javier Gonzalez

[permalink] [raw]
Subject: Re: [PATCH 2/3] lightnvm: encapsulate rqd dma allocations


> On 29 Aug 2018, at 15.36, Matias Bjørling <[email protected]> wrote:
>
> On 08/29/2018 03:18 PM, Javier Gonzalez wrote:
>>> On 29 Aug 2018, at 15.00, Matias Bjørling <[email protected]> wrote:
>>>
>>> On 08/29/2018 10:56 AM, Javier González wrote:
>>>> dma allocations for ppa_list and meta_list in rqd are replicated in
>>>> several places across the pblk codebase. Make helpers to encapsulate
>>>> creation and deletion to simplify the code.
>>>> Signed-off-by: Javier González <[email protected]>
>>>> ---
>>>> drivers/lightnvm/pblk-core.c | 69 +++++++++++++++++++++++++---------------
>>>> drivers/lightnvm/pblk-read.c | 35 ++++++++++----------
>>>> drivers/lightnvm/pblk-recovery.c | 29 ++++++-----------
>>>> drivers/lightnvm/pblk-write.c | 15 ++-------
>>>> drivers/lightnvm/pblk.h | 3 ++
>>>> 5 files changed, 74 insertions(+), 77 deletions(-)
>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>> index 09160ec02c5f..767178185f19 100644
>>>> --- a/drivers/lightnvm/pblk-core.c
>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>> @@ -237,6 +237,34 @@ static void pblk_invalidate_range(struct pblk *pblk, sector_t slba,
>>>> spin_unlock(&pblk->trans_lock);
>>>> }
>>>> +int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
>>>> + bool is_vector)
>>>
>>>
>>> The mem_flags argument can be removed. It is GFP_KERNEL from all the
>>> places it is called.
>> Thought it was better to have the flexibility in a helper function, but
>> we can always add it later on if needed...
>>> For is_vector, it will be better to use nr_ppas (or a similar name). Then
>>> pblk_submit_read/pblk_submit_read_gc are a bit cleaner.
>> We can do that too, yes.
>>>> +{
>>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>>> +
>>>> + rqd->meta_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
>>>> + &rqd->dma_meta_list);
>>>> + if (!rqd->meta_list)
>>>> + return -ENOMEM;
>>>> +
>>>> + if (!is_vector)
>>>> + return 0;
>>>> +
>>>> + rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
>>>> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
>>>
>>> Wrt is_vector, does it matter if we just set ppa_list and
>>> dma_ppa_list? If we have them, we use them, else leave them alone?
>> If we only have 1 address then ppa_addr is set, and the completion path
>> would attempt to free ppa_list, interpreting ppa_addr as the dma address.
>> So I don't think so - unless I'm missing something?
>
> In that case, we could drop is_vector/nr_ppas altogether? That would be nice.
>

The problem is that the metadata region still needs to be used, even if
the ppa_list is not set. The thing is that the oob area can be larger than
64 bits, so we cannot use the trick where the dma handle itself stores
the actual value.

So when encapsulating, we need to know whether to allocate the ppa_list
or not.

Does it make sense?
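To spell out the trick being referenced, a sketch (the calls are from
this series; the comments are editorial):

        /* Single PPA: a struct ppa_addr is 64 bits and fits in the union
         * with dma_ppa_list, so the value itself is stored -- no DMA
         * buffer is needed.
         */
        rqd->ppa_addr = ppa;

        /* Per-sector OOB metadata can exceed 64 bits, so meta_list must
         * always be a real DMA allocation, even for a single sector.
         */
        rqd->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
                                           &rqd->dma_meta_list);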

>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
>>>> +{
>>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>>> +
>>>> + if (rqd->meta_list)
>>>> + nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>> + rqd->dma_meta_list);
>>>> +}
>>>
>>> Looks like setup/clear is mainly about managing the metadata. Would
>>> pblk_alloc_rqd_meta()/pblk_free/rqd_meta be better names? Unless we
>>> can fold it all into pblk_alloc_rqd/pblk_free_rqd.
>> It's not easy to fold them in there, as we use nvm_rq allocations without
>> extra space in the rqd for metadata. This is also a problem for rqds
>> allocated on the stack. But I can change the names to make the
>> functionality clearer.
>
> Yep, that was what I felt as well. Renaming will be good.

Cool. Will do.

