2018-01-31 02:07:37

by Javier González

[permalink] [raw]
Subject: [PATCH 1/5] lightnvm: pblk: handle bad sectors in the emeta area correctly

From: Hans Holmberg <[email protected]>

Unless we check if there are bad sectors in the entire emeta-area
we risk ending up with valid bitmap / available sector count inconsistency.
This results in lines with a bad chunk at the last LUN marked as bad,
so go through the whole emeta area and mark up the invalid sectors.

Signed-off-by: Hans Holmberg <[email protected]>
---
drivers/lightnvm/pblk-core.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 0487b9340c1d..9027cf2ed1d8 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1021,6 +1021,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
int nr_bb = 0;
u64 off;
int bit = -1;
+ int emeta_secs;

line->sec_in_line = lm->sec_per_line;

@@ -1055,18 +1056,18 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
/* Mark emeta metadata sectors as bad sectors. We need to consider bad
* blocks to make sure that there are enough sectors to store emeta
*/
- off = lm->sec_per_line - lm->emeta_sec[0];
- bitmap_set(line->invalid_bitmap, off, lm->emeta_sec[0]);
- while (nr_bb) {
+ emeta_secs = lm->emeta_sec[0];
+ off = lm->sec_per_line;
+ while (emeta_secs) {
off -= geo->sec_per_pl;
if (!test_bit(off, line->invalid_bitmap)) {
bitmap_set(line->invalid_bitmap, off, geo->sec_per_pl);
- nr_bb--;
+ emeta_secs -= geo->sec_per_pl;
}
}

- line->sec_in_line -= lm->emeta_sec[0];
line->emeta_ssec = off;
+ line->sec_in_line -= lm->emeta_sec[0];
line->nr_valid_lbas = 0;
line->left_msecs = line->sec_in_line;
*line->vsc = cpu_to_le32(line->sec_in_line);
--
2.7.4



2018-01-31 02:07:59

by Javier González

[permalink] [raw]
Subject: [PATCH 3/5] lightnvm: pblk: export write amplification counters to sysfs

From: Hans Holmberg <[email protected]>

In a SSD, write amplification, WA, is defined as the average
number of page writes per user page write. Write amplification
negatively affects write performance and decreases the lifetime
of the disk, so it's a useful metric to add to sysfs.

In plkb's case, the number of writes per user sector is the sum of:

(1) number of user writes
(2) number of sectors written by the garbage collector
(3) number of sectors padded (i.e. due to syncs)

This patch adds persistent counters for 1-3 and two sysfs attributes
to export these along with WA calculated with five decimals:

write_amp_mileage: the accumulated write amplification stats
for the lifetime of the pblk instance

write_amp_trip: resetable stats to facilitate delta measurements,
values reset at creation and if 0 is written
to the attribute.

64-bit counters are used as a 32 bit counter would wrap around
already after about 17 TB worth of user data. It will take a
long long time before the 64 bit sector counters wrap around.

The counters are stored after the bad block bitmap in the first
emeta sector of each written line. There is plenty of space in the
first emeta sector, so we don't need to bump the major version of
the line data format.

Signed-off-by: Hans Holmberg <[email protected]>
Signed-off-by: Javier González <[email protected]>
---
drivers/lightnvm/pblk-cache.c | 4 ++
drivers/lightnvm/pblk-core.c | 6 +++
drivers/lightnvm/pblk-init.c | 11 ++++-
drivers/lightnvm/pblk-map.c | 2 +
drivers/lightnvm/pblk-rb.c | 3 ++
drivers/lightnvm/pblk-recovery.c | 25 ++++++++++++
drivers/lightnvm/pblk-sysfs.c | 86 +++++++++++++++++++++++++++++++++++++++-
drivers/lightnvm/pblk.h | 42 ++++++++++++++++----
8 files changed, 169 insertions(+), 10 deletions(-)

diff --git a/drivers/lightnvm/pblk-cache.c b/drivers/lightnvm/pblk-cache.c
index 000fcad38136..29a23111b31c 100644
--- a/drivers/lightnvm/pblk-cache.c
+++ b/drivers/lightnvm/pblk-cache.c
@@ -63,6 +63,8 @@ int pblk_write_to_cache(struct pblk *pblk, struct bio *bio, unsigned long flags)
bio_advance(bio, PBLK_EXPOSED_PAGE_SIZE);
}

+ atomic64_add(nr_entries, &pblk->user_wa);
+
#ifdef CONFIG_NVM_DEBUG
atomic_long_add(nr_entries, &pblk->inflight_writes);
atomic_long_add(nr_entries, &pblk->req_writes);
@@ -117,6 +119,8 @@ int pblk_write_gc_to_cache(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
WARN_ONCE(gc_rq->secs_to_gc != valid_entries,
"pblk: inconsistent GC write\n");

+ atomic64_add(valid_entries, &pblk->gc_wa);
+
#ifdef CONFIG_NVM_DEBUG
atomic_long_add(valid_entries, &pblk->inflight_writes);
atomic_long_add(valid_entries, &pblk->recov_gc_writes);
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 155e42a26293..22e61cd4f801 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -1630,11 +1630,16 @@ void pblk_line_close_meta(struct pblk *pblk, struct pblk_line *line)
struct pblk_line_meta *lm = &pblk->lm;
struct pblk_emeta *emeta = line->emeta;
struct line_emeta *emeta_buf = emeta->buf;
+ struct wa_counters *wa = emeta_to_wa(lm, emeta_buf);

/* No need for exact vsc value; avoid a big line lock and take aprox. */
memcpy(emeta_to_vsc(pblk, emeta_buf), l_mg->vsc_list, lm->vsc_list_len);
memcpy(emeta_to_bb(emeta_buf), line->blk_bitmap, lm->blk_bitmap_len);

+ wa->user = cpu_to_le64(atomic64_read(&pblk->user_wa));
+ wa->pad = cpu_to_le64(atomic64_read(&pblk->pad_wa));
+ wa->gc = cpu_to_le64(atomic64_read(&pblk->gc_wa));
+
emeta_buf->nr_valid_lbas = cpu_to_le64(line->nr_valid_lbas);
emeta_buf->crc = cpu_to_le32(pblk_calc_emeta_crc(pblk, emeta_buf));

@@ -1837,6 +1842,7 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
#endif
/* Invalidate and discard padded entries */
if (lba == ADDR_EMPTY) {
+ atomic64_inc(&pblk->pad_wa);
#ifdef CONFIG_NVM_DEBUG
atomic_long_inc(&pblk->padded_wb);
#endif
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 93d671ca518e..7eedc5daebc8 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -559,8 +559,8 @@ static unsigned int calc_emeta_len(struct pblk *pblk)

/* Round to sector size so that lba_list starts on its own sector */
lm->emeta_sec[1] = DIV_ROUND_UP(
- sizeof(struct line_emeta) + lm->blk_bitmap_len,
- geo->sec_size);
+ sizeof(struct line_emeta) + lm->blk_bitmap_len +
+ sizeof(struct wa_counters), geo->sec_size);
lm->emeta_len[1] = lm->emeta_sec[1] * geo->sec_size;

/* Round to sector size so that vsc_list starts on its own sector */
@@ -991,6 +991,13 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
if (flags & NVM_TARGET_FACTORY)
pblk_setup_uuid(pblk);

+ atomic64_set(&pblk->user_wa, 0);
+ atomic64_set(&pblk->pad_wa, 0);
+ atomic64_set(&pblk->gc_wa, 0);
+ pblk->user_rst_wa = 0;
+ pblk->pad_rst_wa = 0;
+ pblk->gc_rst_wa = 0;
+
#ifdef CONFIG_NVM_DEBUG
atomic_long_set(&pblk->inflight_writes, 0);
atomic_long_set(&pblk->padded_writes, 0);
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 7445e6430c52..04e08d76ea5f 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -65,6 +65,8 @@ static void pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
lba_list[paddr] = cpu_to_le64(w_ctx->lba);
if (lba_list[paddr] != addr_empty)
line->nr_valid_lbas++;
+ else
+ atomic64_inc(&pblk->pad_wa);
} else {
lba_list[paddr] = meta_list[i].lba = addr_empty;
__pblk_map_invalidate(pblk, line, paddr);
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index ec8fc314646b..7044b5599cc4 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -622,6 +622,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
}
}

+ atomic64_add(pad, &((struct pblk *)
+ (container_of(rb, struct pblk, rwb)))->pad_wa);
+
#ifdef CONFIG_NVM_DEBUG
atomic_long_add(pad, &((struct pblk *)
(container_of(rb, struct pblk, rwb)))->padded_writes);
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index a30fe203d454..e75a1af2eebe 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -845,6 +845,29 @@ static int pblk_recov_check_line_version(struct pblk *pblk,
return 0;
}

+static void pblk_recov_wa_counters(struct pblk *pblk,
+ struct line_emeta *emeta)
+{
+ struct pblk_line_meta *lm = &pblk->lm;
+ struct line_header *header = &emeta->header;
+ struct wa_counters *wa = emeta_to_wa(lm, emeta);
+
+ /* WA counters were introduced in emeta version 0.2 */
+ if (header->version_major > 0 || header->version_minor >= 2) {
+ u64 user = le64_to_cpu(wa->user);
+ u64 pad = le64_to_cpu(wa->pad);
+ u64 gc = le64_to_cpu(wa->gc);
+
+ atomic64_set(&pblk->user_wa, user);
+ atomic64_set(&pblk->pad_wa, pad);
+ atomic64_set(&pblk->gc_wa, gc);
+
+ pblk->user_rst_wa = user;
+ pblk->pad_rst_wa = pad;
+ pblk->gc_rst_wa = gc;
+ }
+}
+
struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
{
struct pblk_line_meta *lm = &pblk->lm;
@@ -965,6 +988,8 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
if (pblk_recov_check_line_version(pblk, line->emeta->buf))
return ERR_PTR(-EINVAL);

+ pblk_recov_wa_counters(pblk, line->emeta->buf);
+
if (pblk_recov_l2p_from_emeta(pblk, line))
pblk_recov_l2p_from_oob(pblk, line);

diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
index 620bab853579..4804bbd32d5f 100644
--- a/drivers/lightnvm/pblk-sysfs.c
+++ b/drivers/lightnvm/pblk-sysfs.c
@@ -298,6 +298,49 @@ static ssize_t pblk_sysfs_get_sec_per_write(struct pblk *pblk, char *page)
return snprintf(page, PAGE_SIZE, "%d\n", pblk->sec_per_write);
}

+static ssize_t pblk_get_write_amp(u64 user, u64 gc, u64 pad,
+ char *page)
+{
+ int sz;
+
+ sz = snprintf(page, PAGE_SIZE,
+ "user:%lld gc:%lld pad:%lld WA:",
+ user, gc, pad);
+
+ if (!user) {
+ sz += snprintf(page + sz, PAGE_SIZE - sz, "NaN\n");
+ } else {
+ u64 wa_int, wa_frac, wa_frac_int;
+
+ wa_int = user + gc + pad;
+ sector_div(wa_int, user);
+
+ wa_frac_int = (user + gc + pad) * 100000;
+ sector_div(wa_frac_int, user);
+ div_u64_rem(wa_frac_int, 100000, (u32 *)&wa_frac);
+
+ sz += snprintf(page + sz, PAGE_SIZE - sz, "%lld.%05lld\n",
+ wa_int, wa_frac);
+ }
+
+ return sz;
+}
+
+static ssize_t pblk_sysfs_get_write_amp_mileage(struct pblk *pblk, char *page)
+{
+ return pblk_get_write_amp(atomic64_read(&pblk->user_wa),
+ atomic64_read(&pblk->gc_wa), atomic64_read(&pblk->pad_wa),
+ page);
+}
+
+static ssize_t pblk_sysfs_get_write_amp_trip(struct pblk *pblk, char *page)
+{
+ return pblk_get_write_amp(
+ atomic64_read(&pblk->user_wa) - pblk->user_rst_wa,
+ atomic64_read(&pblk->gc_wa) - pblk->gc_rst_wa,
+ atomic64_read(&pblk->pad_wa) - pblk->pad_rst_wa, page);
+}
+
#ifdef CONFIG_NVM_DEBUG
static ssize_t pblk_sysfs_stats_debug(struct pblk *pblk, char *page)
{
@@ -360,6 +403,30 @@ static ssize_t pblk_sysfs_set_sec_per_write(struct pblk *pblk,
return len;
}

+static ssize_t pblk_sysfs_set_write_amp_trip(struct pblk *pblk,
+ const char *page, size_t len)
+{
+ size_t c_len;
+ int reset_value;
+
+ c_len = strcspn(page, "\n");
+ if (c_len >= len)
+ return -EINVAL;
+
+ if (kstrtouint(page, 0, &reset_value))
+ return -EINVAL;
+
+ if (reset_value != 0)
+ return -EINVAL;
+
+ pblk->user_rst_wa = atomic64_read(&pblk->user_wa);
+ pblk->pad_rst_wa = atomic64_read(&pblk->pad_wa);
+ pblk->gc_rst_wa = atomic64_read(&pblk->gc_wa);
+
+ return len;
+}
+
+
static struct attribute sys_write_luns = {
.name = "write_luns",
.mode = 0444,
@@ -410,6 +477,16 @@ static struct attribute sys_max_sec_per_write = {
.mode = 0644,
};

+static struct attribute sys_write_amp_mileage = {
+ .name = "write_amp_mileage",
+ .mode = 0444,
+};
+
+static struct attribute sys_write_amp_trip = {
+ .name = "write_amp_trip",
+ .mode = 0644,
+};
+
#ifdef CONFIG_NVM_DEBUG
static struct attribute sys_stats_debug_attr = {
.name = "stats",
@@ -428,6 +505,8 @@ static struct attribute *pblk_attrs[] = {
&sys_stats_ppaf_attr,
&sys_lines_attr,
&sys_lines_info_attr,
+ &sys_write_amp_mileage,
+ &sys_write_amp_trip,
#ifdef CONFIG_NVM_DEBUG
&sys_stats_debug_attr,
#endif
@@ -457,6 +536,10 @@ static ssize_t pblk_sysfs_show(struct kobject *kobj, struct attribute *attr,
return pblk_sysfs_lines_info(pblk, buf);
else if (strcmp(attr->name, "max_sec_per_write") == 0)
return pblk_sysfs_get_sec_per_write(pblk, buf);
+ else if (strcmp(attr->name, "write_amp_mileage") == 0)
+ return pblk_sysfs_get_write_amp_mileage(pblk, buf);
+ else if (strcmp(attr->name, "write_amp_trip") == 0)
+ return pblk_sysfs_get_write_amp_trip(pblk, buf);
#ifdef CONFIG_NVM_DEBUG
else if (strcmp(attr->name, "stats") == 0)
return pblk_sysfs_stats_debug(pblk, buf);
@@ -473,7 +556,8 @@ static ssize_t pblk_sysfs_store(struct kobject *kobj, struct attribute *attr,
return pblk_sysfs_gc_force(pblk, buf, len);
else if (strcmp(attr->name, "max_sec_per_write") == 0)
return pblk_sysfs_set_sec_per_write(pblk, buf, len);
-
+ else if (strcmp(attr->name, "write_amp_trip") == 0)
+ return pblk_sysfs_set_write_amp_trip(pblk, buf, len);
return 0;
}

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index fae2526f80b2..4b7d8618631f 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -331,7 +331,7 @@ enum {
#define SMETA_VERSION_MINOR (1)

#define EMETA_VERSION_MAJOR (0)
-#define EMETA_VERSION_MINOR (1)
+#define EMETA_VERSION_MINOR (2)

struct line_header {
__le32 crc;
@@ -361,11 +361,13 @@ struct line_smeta {
__le64 lun_bitmap[];
};

+
/*
* Metadata layout in media:
* First sector:
* 1. struct line_emeta
* 2. bad block bitmap (u64 * window_wr_lun)
+ * 3. write amplification counters
* Mid sectors (start at lbas_sector):
* 3. nr_lbas (u64) forming lba list
* Last sectors (start at vsc_sector):
@@ -389,7 +391,15 @@ struct line_emeta {
__le32 next_id; /* Line id for next line */
__le64 nr_lbas; /* Number of lbas mapped in line */
__le64 nr_valid_lbas; /* Number of valid lbas mapped in line */
- __le64 bb_bitmap[]; /* Updated bad block bitmap for line */
+ __le64 bb_bitmap[]; /* Updated bad block bitmap for line */
+};
+
+
+/* Write amplification counters stored on media */
+struct wa_counters {
+ __le64 user; /* Number of user written sectors */
+ __le64 gc; /* Number of sectors written by GC*/
+ __le64 pad; /* Number of padded sectors */
};

struct pblk_emeta {
@@ -519,10 +529,11 @@ struct pblk_line_meta {
unsigned int smeta_sec; /* Sectors needed for smeta */

unsigned int emeta_len[4]; /* Lengths for emeta:
- * [0]: Total length
- * [1]: struct line_emeta length
- * [2]: L2P portion length
- * [3]: vsc list length
+ * [0]: Total
+ * [1]: struct line_emeta +
+ * bb_bitmap + struct wa_counters
+ * [2]: L2P portion
+ * [3]: vsc
*/
unsigned int emeta_sec[4]; /* Sectors needed for emeta. Same layout
* as emeta_len
@@ -604,8 +615,19 @@ struct pblk {
int sec_per_write;

unsigned char instance_uuid[16];
+
+ /* Persistent write amplification counters, 4kb sector I/Os */
+ atomic64_t user_wa; /* Sectors written by user */
+ atomic64_t gc_wa; /* Sectors written by GC */
+ atomic64_t pad_wa; /* Padded sectors written */
+
+ /* Reset values for delta write amplification measurements */
+ u64 user_rst_wa;
+ u64 gc_rst_wa;
+ u64 pad_rst_wa;
+
#ifdef CONFIG_NVM_DEBUG
- /* All debug counters apply to 4kb sector I/Os */
+ /* Non-persistent debug counters, 4kb sector I/Os */
atomic_long_t inflight_writes; /* Inflight writes (user and gc) */
atomic_long_t padded_writes; /* Sectors padded due to flush/fua */
atomic_long_t padded_wb; /* Sectors padded in write buffer */
@@ -900,6 +922,12 @@ static inline void *emeta_to_bb(struct line_emeta *emeta)
return emeta->bb_bitmap;
}

+static inline void *emeta_to_wa(struct pblk_line_meta *lm,
+ struct line_emeta *emeta)
+{
+ return emeta->bb_bitmap + lm->blk_bitmap_len;
+}
+
static inline void *emeta_to_lbas(struct pblk *pblk, struct line_emeta *emeta)
{
return ((void *)emeta + pblk->lm.emeta_len[1]);
--
2.7.4


2018-01-31 02:08:20

by Javier González

[permalink] [raw]
Subject: [PATCH 4/5] lightnvm: pblk: add padding distribution sysfs attribute

From: Hans Holmberg <[email protected]>

When pblk receives a sync, all data up to that point in the write buffer
must be comitted to persistent storage, and as flash memory comes with a
minimal write size there is a significant cost involved both in terms
of time for completing the sync and in terms of write amplification
padded sectors for filling up to the minimal write size.

In order to get a better understanding of the costs involved for syncs,
Add a sysfs attribute to pblk: padded_dist, showing a normalized
distribution of sectors padded. In order to facilitate measurements of
specific workloads during the lifetime of the pblk instance, the
distribution can be reset by writing 0 to the attribute.

Do this by introducing counters for each possible padding:
{0..(minimal write size - 1)} and calculate the normalized distribution
when showing the attribute.

Signed-off-by: Hans Holmberg <[email protected]>
Signed-off-by: Javier González <[email protected]>
---
drivers/lightnvm/pblk-init.c | 16 ++++++++--
drivers/lightnvm/pblk-rb.c | 15 +++++-----
drivers/lightnvm/pblk-sysfs.c | 68 +++++++++++++++++++++++++++++++++++++++++++
drivers/lightnvm/pblk.h | 6 +++-
4 files changed, 95 insertions(+), 10 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 7eedc5daebc8..a5e3510c0f38 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -921,6 +921,7 @@ static void pblk_free(struct pblk *pblk)
{
pblk_luns_free(pblk);
pblk_lines_free(pblk);
+ kfree(pblk->pad_dist);
pblk_line_meta_free(pblk);
pblk_core_free(pblk);
pblk_l2p_free(pblk);
@@ -998,11 +999,13 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
pblk->pad_rst_wa = 0;
pblk->gc_rst_wa = 0;

+ atomic_long_set(&pblk->nr_flush, 0);
+ pblk->nr_flush_rst = 0;
+
#ifdef CONFIG_NVM_DEBUG
atomic_long_set(&pblk->inflight_writes, 0);
atomic_long_set(&pblk->padded_writes, 0);
atomic_long_set(&pblk->padded_wb, 0);
- atomic_long_set(&pblk->nr_flush, 0);
atomic_long_set(&pblk->req_writes, 0);
atomic_long_set(&pblk->sub_writes, 0);
atomic_long_set(&pblk->sync_writes, 0);
@@ -1034,10 +1037,17 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
goto fail_free_luns;
}

+ pblk->pad_dist = kzalloc((pblk->min_write_pgs - 1) * sizeof(atomic64_t),
+ GFP_KERNEL);
+ if (!pblk->pad_dist) {
+ ret = -ENOMEM;
+ goto fail_free_line_meta;
+ }
+
ret = pblk_core_init(pblk);
if (ret) {
pr_err("pblk: could not initialize core\n");
- goto fail_free_line_meta;
+ goto fail_free_pad_dist;
}

ret = pblk_l2p_init(pblk);
@@ -1097,6 +1107,8 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
pblk_l2p_free(pblk);
fail_free_core:
pblk_core_free(pblk);
+fail_free_pad_dist:
+ kfree(pblk->pad_dist);
fail_free_line_meta:
pblk_line_meta_free(pblk);
fail_free_luns:
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 7044b5599cc4..264372d7b537 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -437,9 +437,7 @@ static int pblk_rb_may_write_flush(struct pblk_rb *rb, unsigned int nr_entries,
if (bio->bi_opf & REQ_PREFLUSH) {
struct pblk *pblk = container_of(rb, struct pblk, rwb);

-#ifdef CONFIG_NVM_DEBUG
atomic_long_inc(&pblk->nr_flush);
-#endif
if (pblk_rb_flush_point_set(&pblk->rwb, bio, mem))
*io_ret = NVM_IO_OK;
}
@@ -620,14 +618,17 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
pr_err("pblk: could not pad page in write bio\n");
return NVM_IO_ERR;
}
+
+ if (pad < pblk->min_write_pgs)
+ atomic64_inc(&pblk->pad_dist[pad - 1]);
+ else
+ pr_warn("pblk: padding more than min. sectors\n");
+
+ atomic64_add(pad, &pblk->pad_wa);
}

- atomic64_add(pad, &((struct pblk *)
- (container_of(rb, struct pblk, rwb)))->pad_wa);
-
#ifdef CONFIG_NVM_DEBUG
- atomic_long_add(pad, &((struct pblk *)
- (container_of(rb, struct pblk, rwb)))->padded_writes);
+ atomic_long_add(pad, &pblk->padded_writes);
#endif

return NVM_IO_OK;
diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
index 4804bbd32d5f..f902ac00d071 100644
--- a/drivers/lightnvm/pblk-sysfs.c
+++ b/drivers/lightnvm/pblk-sysfs.c
@@ -341,6 +341,38 @@ static ssize_t pblk_sysfs_get_write_amp_trip(struct pblk *pblk, char *page)
atomic64_read(&pblk->pad_wa) - pblk->pad_rst_wa, page);
}

+static ssize_t pblk_sysfs_get_padding_dist(struct pblk *pblk, char *page)
+{
+ int sz = 0;
+ unsigned long long total;
+ unsigned long long total_buckets = 0;
+ int buckets = pblk->min_write_pgs - 1;
+ int i;
+
+ total = atomic64_read(&pblk->nr_flush) - pblk->nr_flush_rst;
+
+ for (i = 0; i < buckets; i++)
+ total_buckets += atomic64_read(&pblk->pad_dist[i]);
+
+ if (!total) {
+ for (i = 0; i < (buckets + 1); i++)
+ sz += snprintf(page + sz, PAGE_SIZE - sz,
+ "%d:0 ", i);
+ sz += snprintf(page + sz, PAGE_SIZE - sz, "\n");
+
+ return sz;
+ }
+
+ sz += snprintf(page + sz, PAGE_SIZE - sz, "0:%lld%% ",
+ ((total - total_buckets) * 100) / total);
+ for (i = 0; i < buckets; i++)
+ sz += snprintf(page + sz, PAGE_SIZE - sz, "%d:%lld%% ", i + 1,
+ (atomic64_read(&pblk->pad_dist[i]) * 100) / total);
+ sz += snprintf(page + sz, PAGE_SIZE - sz, "\n");
+
+ return sz;
+}
+
#ifdef CONFIG_NVM_DEBUG
static ssize_t pblk_sysfs_stats_debug(struct pblk *pblk, char *page)
{
@@ -427,6 +459,32 @@ static ssize_t pblk_sysfs_set_write_amp_trip(struct pblk *pblk,
}


+static ssize_t pblk_sysfs_set_padding_dist(struct pblk *pblk,
+ const char *page, size_t len)
+{
+ size_t c_len;
+ int reset_value;
+ int buckets = pblk->min_write_pgs - 1;
+ int i;
+
+ c_len = strcspn(page, "\n");
+ if (c_len >= len)
+ return -EINVAL;
+
+ if (kstrtouint(page, 0, &reset_value))
+ return -EINVAL;
+
+ if (reset_value != 0)
+ return -EINVAL;
+
+ for (i = 0; i < buckets; i++)
+ atomic64_set(&pblk->pad_dist[i], 0);
+
+ pblk->nr_flush_rst = atomic64_read(&pblk->nr_flush);
+
+ return len;
+}
+
static struct attribute sys_write_luns = {
.name = "write_luns",
.mode = 0444,
@@ -487,6 +545,11 @@ static struct attribute sys_write_amp_trip = {
.mode = 0644,
};

+static struct attribute sys_padding_dist = {
+ .name = "padding_dist",
+ .mode = 0644,
+};
+
#ifdef CONFIG_NVM_DEBUG
static struct attribute sys_stats_debug_attr = {
.name = "stats",
@@ -507,6 +570,7 @@ static struct attribute *pblk_attrs[] = {
&sys_lines_info_attr,
&sys_write_amp_mileage,
&sys_write_amp_trip,
+ &sys_padding_dist,
#ifdef CONFIG_NVM_DEBUG
&sys_stats_debug_attr,
#endif
@@ -540,6 +604,8 @@ static ssize_t pblk_sysfs_show(struct kobject *kobj, struct attribute *attr,
return pblk_sysfs_get_write_amp_mileage(pblk, buf);
else if (strcmp(attr->name, "write_amp_trip") == 0)
return pblk_sysfs_get_write_amp_trip(pblk, buf);
+ else if (strcmp(attr->name, "padding_dist") == 0)
+ return pblk_sysfs_get_padding_dist(pblk, buf);
#ifdef CONFIG_NVM_DEBUG
else if (strcmp(attr->name, "stats") == 0)
return pblk_sysfs_stats_debug(pblk, buf);
@@ -558,6 +624,8 @@ static ssize_t pblk_sysfs_store(struct kobject *kobj, struct attribute *attr,
return pblk_sysfs_set_sec_per_write(pblk, buf, len);
else if (strcmp(attr->name, "write_amp_trip") == 0)
return pblk_sysfs_set_write_amp_trip(pblk, buf, len);
+ else if (strcmp(attr->name, "padding_dist") == 0)
+ return pblk_sysfs_set_padding_dist(pblk, buf, len);
return 0;
}

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 4b7d8618631f..88720e2441c0 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -626,12 +626,16 @@ struct pblk {
u64 gc_rst_wa;
u64 pad_rst_wa;

+ /* Counters used for calculating padding distribution */
+ atomic64_t *pad_dist; /* Padding distribution buckets */
+ u64 nr_flush_rst; /* Flushes reset value for pad dist.*/
+ atomic_long_t nr_flush; /* Number of flush/fua I/O */
+
#ifdef CONFIG_NVM_DEBUG
/* Non-persistent debug counters, 4kb sector I/Os */
atomic_long_t inflight_writes; /* Inflight writes (user and gc) */
atomic_long_t padded_writes; /* Sectors padded due to flush/fua */
atomic_long_t padded_wb; /* Sectors padded in write buffer */
- atomic_long_t nr_flush; /* Number of flush/fua I/O */
atomic_long_t req_writes; /* Sectors stored on write buffer */
atomic_long_t sub_writes; /* Sectors submitted from buffer */
atomic_long_t sync_writes; /* Sectors synced to media */
--
2.7.4


2018-01-31 02:08:38

by Javier González

[permalink] [raw]
Subject: [PATCH 5/5] lightnvm: pblk: refactor bad block identification

In preparation for the OCSSD 2.0 spec. bad block identification,
refactor the current code to generalize bad block get/set functions and
structures.

Signed-off-by: Javier González <[email protected]>
---
drivers/lightnvm/pblk-init.c | 213 +++++++++++++++++++++++--------------------
drivers/lightnvm/pblk.h | 6 --
2 files changed, 112 insertions(+), 107 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index a5e3510c0f38..86a94a7faa96 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -365,7 +365,25 @@ static void pblk_luns_free(struct pblk *pblk)
kfree(pblk->luns);
}

-static void pblk_free_line_bitmaps(struct pblk_line *line)
+static void pblk_line_mg_free(struct pblk *pblk)
+{
+ struct pblk_line_mgmt *l_mg = &pblk->l_mg;
+ int i;
+
+ kfree(l_mg->bb_template);
+ kfree(l_mg->bb_aux);
+ kfree(l_mg->vsc_list);
+
+ for (i = 0; i < PBLK_DATA_LINES; i++) {
+ kfree(l_mg->sline_meta[i]);
+ pblk_mfree(l_mg->eline_meta[i]->buf, l_mg->emeta_alloc_type);
+ kfree(l_mg->eline_meta[i]);
+ }
+
+ kfree(pblk->lines);
+}
+
+static void pblk_line_meta_free(struct pblk_line *line)
{
kfree(line->blk_bitmap);
kfree(line->erase_bitmap);
@@ -382,40 +400,16 @@ static void pblk_lines_free(struct pblk *pblk)
line = &pblk->lines[i];

pblk_line_free(pblk, line);
- pblk_free_line_bitmaps(line);
+ pblk_line_meta_free(line);
}
spin_unlock(&l_mg->free_lock);
}

-static void pblk_line_meta_free(struct pblk *pblk)
+static int pblk_bb_get_tbl(struct nvm_tgt_dev *dev, struct pblk_lun *rlun,
+ u8 *blks, int nr_blks)
{
- struct pblk_line_mgmt *l_mg = &pblk->l_mg;
- int i;
-
- kfree(l_mg->bb_template);
- kfree(l_mg->bb_aux);
- kfree(l_mg->vsc_list);
-
- for (i = 0; i < PBLK_DATA_LINES; i++) {
- kfree(l_mg->sline_meta[i]);
- pblk_mfree(l_mg->eline_meta[i]->buf, l_mg->emeta_alloc_type);
- kfree(l_mg->eline_meta[i]);
- }
-
- kfree(pblk->lines);
-}
-
-static int pblk_bb_discovery(struct nvm_tgt_dev *dev, struct pblk_lun *rlun)
-{
- struct nvm_geo *geo = &dev->geo;
struct ppa_addr ppa;
- u8 *blks;
- int nr_blks, ret;
-
- nr_blks = geo->nr_chks * geo->plane_mode;
- blks = kmalloc(nr_blks, GFP_KERNEL);
- if (!blks)
- return -ENOMEM;
+ int ret;

ppa.ppa = 0;
ppa.g.ch = rlun->bppa.g.ch;
@@ -423,34 +417,56 @@ static int pblk_bb_discovery(struct nvm_tgt_dev *dev, struct pblk_lun *rlun)

ret = nvm_get_tgt_bb_tbl(dev, ppa, blks);
if (ret)
- goto out;
+ return ret;

nr_blks = nvm_bb_tbl_fold(dev->parent, blks, nr_blks);
- if (nr_blks < 0) {
- ret = nr_blks;
- goto out;
- }
-
- rlun->bb_list = blks;
+ if (nr_blks < 0)
+ return -EIO;

return 0;
-out:
- kfree(blks);
- return ret;
+}
+
+static void *pblk_bb_get_log(struct pblk *pblk)
+{
+ struct nvm_tgt_dev *dev = pblk->dev;
+ struct nvm_geo *geo = &dev->geo;
+ u8 *log;
+ int i, nr_blks, blk_per_lun;
+ int ret;
+
+ blk_per_lun = geo->nr_chks * geo->plane_mode;
+ nr_blks = blk_per_lun * geo->all_luns;
+
+ log = kmalloc(nr_blks, GFP_KERNEL);
+ if (!log)
+ return ERR_PTR(-ENOMEM);
+
+ for (i = 0; i < geo->all_luns; i++) {
+ struct pblk_lun *rlun = &pblk->luns[i];
+ u8 *log_pos = log + i * blk_per_lun;
+
+ ret = pblk_bb_get_tbl(dev, rlun, log_pos, blk_per_lun);
+ if (ret) {
+ kfree(log);
+ return ERR_PTR(-EIO);
+ }
+ }
+
+ return log;
}

static int pblk_bb_line(struct pblk *pblk, struct pblk_line *line,
- int blk_per_line)
+ u8 *bb_log, int blk_per_line)
{
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
- struct pblk_lun *rlun;
- int bb_cnt = 0;
- int i;
+ int i, bb_cnt = 0;

for (i = 0; i < blk_per_line; i++) {
- rlun = &pblk->luns[i];
- if (rlun->bb_list[line->id] == NVM_BLK_T_FREE)
+ struct pblk_lun *rlun = &pblk->luns[i];
+ u8 *lun_bb_log = bb_log + i * blk_per_line;
+
+ if (lun_bb_log[line->id] == NVM_BLK_T_FREE)
continue;

set_bit(pblk_ppa_to_pos(geo, rlun->bppa), line->blk_bitmap);
@@ -460,29 +476,12 @@ static int pblk_bb_line(struct pblk *pblk, struct pblk_line *line,
return bb_cnt;
}

-static int pblk_alloc_line_bitmaps(struct pblk *pblk, struct pblk_line *line)
-{
- struct pblk_line_meta *lm = &pblk->lm;
-
- line->blk_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
- if (!line->blk_bitmap)
- return -ENOMEM;
-
- line->erase_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
- if (!line->erase_bitmap) {
- kfree(line->blk_bitmap);
- return -ENOMEM;
- }
-
- return 0;
-}
-
static int pblk_luns_init(struct pblk *pblk, struct ppa_addr *luns)
{
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
struct pblk_lun *rlun;
- int i, ret;
+ int i;

/* TODO: Implement unbalanced LUN support */
if (geo->nr_luns < 0) {
@@ -505,13 +504,6 @@ static int pblk_luns_init(struct pblk *pblk, struct ppa_addr *luns)
rlun->bppa = luns[lunid];

sema_init(&rlun->wr_sem, 1);
-
- ret = pblk_bb_discovery(dev, rlun);
- if (ret) {
- while (--i >= 0)
- kfree(pblk->luns[i].bb_list);
- return ret;
- }
}

return 0;
@@ -689,6 +681,26 @@ static int pblk_lines_alloc_metadata(struct pblk *pblk)
return -ENOMEM;
}

+static int pblk_setup_line_meta(struct pblk *pblk, struct pblk_line *line,
+ void *chunk_log, long *nr_bad_blks)
+{
+ struct pblk_line_meta *lm = &pblk->lm;
+
+ line->blk_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
+ if (!line->blk_bitmap)
+ return -ENOMEM;
+
+ line->erase_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
+ if (!line->erase_bitmap) {
+ kfree(line->blk_bitmap);
+ return -ENOMEM;
+ }
+
+ *nr_bad_blks = pblk_bb_line(pblk, line, chunk_log, lm->blk_per_line);
+
+ return 0;
+}
+
static int pblk_lines_init(struct pblk *pblk)
{
struct nvm_tgt_dev *dev = pblk->dev;
@@ -696,8 +708,9 @@ static int pblk_lines_init(struct pblk *pblk)
struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct pblk_line_meta *lm = &pblk->lm;
struct pblk_line *line;
+ void *chunk_log;
unsigned int smeta_len, emeta_len;
- long nr_bad_blks, nr_free_blks;
+ long nr_bad_blks = 0, nr_free_blks = 0;
int bb_distance, max_write_ppas, mod;
int i, ret;

@@ -771,13 +784,12 @@ static int pblk_lines_init(struct pblk *pblk)
if (lm->min_blk_line > lm->blk_per_line) {
pr_err("pblk: config. not supported. Min. LUN in line:%d\n",
lm->blk_per_line);
- ret = -EINVAL;
- goto fail;
+ return -EINVAL;
}

ret = pblk_lines_alloc_metadata(pblk);
if (ret)
- goto fail;
+ return ret;

l_mg->bb_template = kzalloc(lm->sec_bitmap_len, GFP_KERNEL);
if (!l_mg->bb_template) {
@@ -821,9 +833,16 @@ static int pblk_lines_init(struct pblk *pblk)
goto fail_free_bb_aux;
}

- nr_free_blks = 0;
+ chunk_log = pblk_bb_get_log(pblk);
+ if (IS_ERR(chunk_log)) {
+ pr_err("pblk: could not get bad block log (%lu)\n",
+ PTR_ERR(chunk_log));
+ ret = PTR_ERR(chunk_log);
+ goto fail_free_lines;
+ }
+
for (i = 0; i < l_mg->nr_lines; i++) {
- int blk_in_line;
+ int chk_in_line;

line = &pblk->lines[i];

@@ -835,26 +854,20 @@ static int pblk_lines_init(struct pblk *pblk)
line->vsc = &l_mg->vsc_list[i];
spin_lock_init(&line->lock);

- ret = pblk_alloc_line_bitmaps(pblk, line);
+ ret = pblk_setup_line_meta(pblk, line, chunk_log, &nr_bad_blks);
if (ret)
- goto fail_free_lines;
+ goto fail_free_chunk_log;

- nr_bad_blks = pblk_bb_line(pblk, line, lm->blk_per_line);
- if (nr_bad_blks < 0 || nr_bad_blks > lm->blk_per_line) {
- pblk_free_line_bitmaps(line);
- ret = -EINVAL;
- goto fail_free_lines;
- }
-
- blk_in_line = lm->blk_per_line - nr_bad_blks;
- if (blk_in_line < lm->min_blk_line) {
+ chk_in_line = lm->blk_per_line - nr_bad_blks;
+ if (nr_bad_blks < 0 || nr_bad_blks > lm->blk_per_line ||
+ chk_in_line < lm->min_blk_line) {
line->state = PBLK_LINESTATE_BAD;
list_add_tail(&line->list, &l_mg->bad_list);
continue;
}

- nr_free_blks += blk_in_line;
- atomic_set(&line->blk_in_line, blk_in_line);
+ nr_free_blks += chk_in_line;
+ atomic_set(&line->blk_in_line, chk_in_line);

l_mg->nr_free_lines++;
list_add_tail(&line->list, &l_mg->free_list);
@@ -862,23 +875,21 @@ static int pblk_lines_init(struct pblk *pblk)

pblk_set_provision(pblk, nr_free_blks);

- /* Cleanup per-LUN bad block lists - managed within lines on run-time */
- for (i = 0; i < geo->all_luns; i++)
- kfree(pblk->luns[i].bb_list);
-
+ kfree(chunk_log);
return 0;
-fail_free_lines:
+
+fail_free_chunk_log:
+ kfree(chunk_log);
while (--i >= 0)
- pblk_free_line_bitmaps(&pblk->lines[i]);
+ pblk_line_meta_free(&pblk->lines[i]);
+fail_free_lines:
+ kfree(pblk->lines);
fail_free_bb_aux:
kfree(l_mg->bb_aux);
fail_free_bb_template:
kfree(l_mg->bb_template);
fail_free_meta:
- pblk_line_meta_free(pblk);
-fail:
- for (i = 0; i < geo->all_luns; i++)
- kfree(pblk->luns[i].bb_list);
+ pblk_line_mg_free(pblk);

return ret;
}
@@ -922,7 +933,7 @@ static void pblk_free(struct pblk *pblk)
pblk_luns_free(pblk);
pblk_lines_free(pblk);
kfree(pblk->pad_dist);
- pblk_line_meta_free(pblk);
+ pblk_line_mg_free(pblk);
pblk_core_free(pblk);
pblk_l2p_free(pblk);

@@ -1110,7 +1121,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
fail_free_pad_dist:
kfree(pblk->pad_dist);
fail_free_line_meta:
- pblk_line_meta_free(pblk);
+ pblk_line_mg_free(pblk);
fail_free_luns:
pblk_luns_free(pblk);
fail:
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 88720e2441c0..282dfc8780e8 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -201,12 +201,6 @@ struct pblk_rb {

struct pblk_lun {
struct ppa_addr bppa;
-
- u8 *bb_list; /* Bad block list for LUN. Only used on
- * bring up. Bad blocks are managed
- * within lines on run-time.
- */
-
struct semaphore wr_sem;
};

--
2.7.4


2018-01-31 02:09:32

by Javier González

[permalink] [raw]
Subject: [PATCH 2/5] lightnvm: pblk: check data lines version on recovery

From: Hans Holmberg <[email protected]>

As a preparation for future bumps of data line persistent storage
versions, we need to start checking the emeta line version during
recovery. Also slit up the current emeta/smeta version into two
bytes (major,minor).

Recovering lines with the same major number as the current pblk data
line version must succeed. This means that any changes in the
persistent format must be:

(1) Backward compatible: if we switch back to and older
kernel, recovery of lines stored with major == current_major
and minor > current_minor must succeed.

(2) Forward compatible: switching to a newer kernel,
recovery of lines stored with major=current_major and
minor < minor must handle the data format differences
gracefully(i.e. initialize new data structures to default values).

If we detect lines that have a different major number than
the current we must abort recovery. The user must manually
migrate the data in this case.

Previously the version stored in the emeta header was copied
from smeta, which has version 1, so we need to set the minor
version to 1.

Signed-off-by: Hans Holmberg <[email protected]>
Signed-off-by: Javier González <[email protected]>
---
drivers/lightnvm/pblk-core.c | 9 ++++++++-
drivers/lightnvm/pblk-recovery.c | 26 ++++++++++++++++++++++++--
drivers/lightnvm/pblk.h | 16 ++++++++++++++--
3 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 9027cf2ed1d8..155e42a26293 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -975,7 +975,8 @@ static int pblk_line_init_metadata(struct pblk *pblk, struct pblk_line *line,
memcpy(smeta_buf->header.uuid, pblk->instance_uuid, 16);
smeta_buf->header.id = cpu_to_le32(line->id);
smeta_buf->header.type = cpu_to_le16(line->type);
- smeta_buf->header.version = SMETA_VERSION;
+ smeta_buf->header.version_major = SMETA_VERSION_MAJOR;
+ smeta_buf->header.version_minor = SMETA_VERSION_MINOR;

/* Start metadata */
smeta_buf->seq_nr = cpu_to_le64(line->seq_nr);
@@ -998,6 +999,12 @@ static int pblk_line_init_metadata(struct pblk *pblk, struct pblk_line *line,
/* End metadata */
memcpy(&emeta_buf->header, &smeta_buf->header,
sizeof(struct line_header));
+
+ emeta_buf->header.version_major = EMETA_VERSION_MAJOR;
+ emeta_buf->header.version_minor = EMETA_VERSION_MINOR;
+ emeta_buf->header.crc = cpu_to_le32(
+ pblk_calc_meta_header_crc(pblk, &emeta_buf->header));
+
emeta_buf->seq_nr = cpu_to_le64(line->seq_nr);
emeta_buf->nr_lbas = cpu_to_le64(line->sec_in_line);
emeta_buf->nr_valid_lbas = cpu_to_le64(0);
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 1d5e961bf5e0..a30fe203d454 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -826,6 +826,25 @@ static u64 pblk_line_emeta_start(struct pblk *pblk, struct pblk_line *line)
return emeta_start;
}

+static int pblk_recov_check_line_version(struct pblk *pblk,
+ struct line_emeta *emeta)
+{
+ struct line_header *header = &emeta->header;
+
+ if (header->version_major != EMETA_VERSION_MAJOR) {
+ pr_err("pblk: line major version mismatch: %d, expected: %d\n",
+ header->version_major, EMETA_VERSION_MAJOR);
+ return 1;
+ }
+
+#ifdef NVM_DEBUG
+ if (header->version_minor > EMETA_VERSION_MINOR)
+ pr_info("pblk: newer line minor version found: %d\n", line_v);
+#endif
+
+ return 0;
+}
+
struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
{
struct pblk_line_meta *lm = &pblk->lm;
@@ -873,9 +892,9 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
if (le32_to_cpu(smeta_buf->header.identifier) != PBLK_MAGIC)
continue;

- if (smeta_buf->header.version != SMETA_VERSION) {
+ if (smeta_buf->header.version_major != SMETA_VERSION_MAJOR) {
pr_err("pblk: found incompatible line version %u\n",
- le16_to_cpu(smeta_buf->header.version));
+ smeta_buf->header.version_major);
return ERR_PTR(-EINVAL);
}

@@ -943,6 +962,9 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
goto next;
}

+ if (pblk_recov_check_line_version(pblk, line->emeta->buf))
+ return ERR_PTR(-EINVAL);
+
if (pblk_recov_l2p_from_emeta(pblk, line))
pblk_recov_l2p_from_oob(pblk, line);

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 8c357fb6538e..fae2526f80b2 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -320,14 +320,26 @@ enum {
};

#define PBLK_MAGIC 0x70626c6b /*pblk*/
-#define SMETA_VERSION cpu_to_le16(1)
+
+/* emeta/smeta persistent storage format versions:
+ * Changes in major version requires offline migration.
+ * Changes in minor version are handled automatically during
+ * recovery.
+ */
+
+#define SMETA_VERSION_MAJOR (0)
+#define SMETA_VERSION_MINOR (1)
+
+#define EMETA_VERSION_MAJOR (0)
+#define EMETA_VERSION_MINOR (1)

struct line_header {
__le32 crc;
__le32 identifier; /* pblk identifier */
__u8 uuid[16]; /* instance uuid */
__le16 type; /* line type */
- __le16 version; /* type version */
+ __u8 version_major; /* version major */
+ __u8 version_minor; /* version minor */
__le32 id; /* line id for current line */
};

--
2.7.4


2018-01-31 09:00:17

by Matias Bjørling

[permalink] [raw]
Subject: Re: [PATCH 4/5] lightnvm: pblk: add padding distribution sysfs attribute

On 01/31/2018 03:06 AM, Javier González wrote:
> From: Hans Holmberg <[email protected]>
>
> When pblk receives a sync, all data up to that point in the write buffer
> must be comitted to persistent storage, and as flash memory comes with a
> minimal write size there is a significant cost involved both in terms
> of time for completing the sync and in terms of write amplification
> padded sectors for filling up to the minimal write size.
>
> In order to get a better understanding of the costs involved for syncs,
> Add a sysfs attribute to pblk: padded_dist, showing a normalized
> distribution of sectors padded. In order to facilitate measurements of
> specific workloads during the lifetime of the pblk instance, the
> distribution can be reset by writing 0 to the attribute.
>
> Do this by introducing counters for each possible padding:
> {0..(minimal write size - 1)} and calculate the normalized distribution
> when showing the attribute.
>
> Signed-off-by: Hans Holmberg <[email protected]>
> Signed-off-by: Javier González <[email protected]>
> ---
> drivers/lightnvm/pblk-init.c | 16 ++++++++--
> drivers/lightnvm/pblk-rb.c | 15 +++++-----
> drivers/lightnvm/pblk-sysfs.c | 68 +++++++++++++++++++++++++++++++++++++++++++
> drivers/lightnvm/pblk.h | 6 +++-
> 4 files changed, 95 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index 7eedc5daebc8..a5e3510c0f38 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -921,6 +921,7 @@ static void pblk_free(struct pblk *pblk)
> {
> pblk_luns_free(pblk);
> pblk_lines_free(pblk);
> + kfree(pblk->pad_dist);
> pblk_line_meta_free(pblk);
> pblk_core_free(pblk);
> pblk_l2p_free(pblk);
> @@ -998,11 +999,13 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
> pblk->pad_rst_wa = 0;
> pblk->gc_rst_wa = 0;
>
> + atomic_long_set(&pblk->nr_flush, 0);
> + pblk->nr_flush_rst = 0;
> +
> #ifdef CONFIG_NVM_DEBUG
> atomic_long_set(&pblk->inflight_writes, 0);
> atomic_long_set(&pblk->padded_writes, 0);
> atomic_long_set(&pblk->padded_wb, 0);
> - atomic_long_set(&pblk->nr_flush, 0);
> atomic_long_set(&pblk->req_writes, 0);
> atomic_long_set(&pblk->sub_writes, 0);
> atomic_long_set(&pblk->sync_writes, 0);
> @@ -1034,10 +1037,17 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
> goto fail_free_luns;
> }
>
> + pblk->pad_dist = kzalloc((pblk->min_write_pgs - 1) * sizeof(atomic64_t),
> + GFP_KERNEL);
> + if (!pblk->pad_dist) {
> + ret = -ENOMEM;
> + goto fail_free_line_meta;
> + }
> +
> ret = pblk_core_init(pblk);
> if (ret) {
> pr_err("pblk: could not initialize core\n");
> - goto fail_free_line_meta;
> + goto fail_free_pad_dist;
> }
>
> ret = pblk_l2p_init(pblk);
> @@ -1097,6 +1107,8 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
> pblk_l2p_free(pblk);
> fail_free_core:
> pblk_core_free(pblk);
> +fail_free_pad_dist:
> + kfree(pblk->pad_dist);
> fail_free_line_meta:
> pblk_line_meta_free(pblk);
> fail_free_luns:
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index 7044b5599cc4..264372d7b537 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -437,9 +437,7 @@ static int pblk_rb_may_write_flush(struct pblk_rb *rb, unsigned int nr_entries,
> if (bio->bi_opf & REQ_PREFLUSH) {
> struct pblk *pblk = container_of(rb, struct pblk, rwb);
>
> -#ifdef CONFIG_NVM_DEBUG
> atomic_long_inc(&pblk->nr_flush);
> -#endif
> if (pblk_rb_flush_point_set(&pblk->rwb, bio, mem))
> *io_ret = NVM_IO_OK;
> }
> @@ -620,14 +618,17 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
> pr_err("pblk: could not pad page in write bio\n");
> return NVM_IO_ERR;
> }
> +
> + if (pad < pblk->min_write_pgs)
> + atomic64_inc(&pblk->pad_dist[pad - 1]);
> + else
> + pr_warn("pblk: padding more than min. sectors\n");

Curious, when would this happen? Would it be an implementation error or
something a user did wrong?

> +
> + atomic64_add(pad, &pblk->pad_wa);
> }
>
> - atomic64_add(pad, &((struct pblk *)
> - (container_of(rb, struct pblk, rwb)))->pad_wa);
> -
> #ifdef CONFIG_NVM_DEBUG
> - atomic_long_add(pad, &((struct pblk *)
> - (container_of(rb, struct pblk, rwb)))->padded_writes);
> + atomic_long_add(pad, &pblk->padded_writes);
> #endif
>
> return NVM_IO_OK;
> diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
> index 4804bbd32d5f..f902ac00d071 100644
> --- a/drivers/lightnvm/pblk-sysfs.c
> +++ b/drivers/lightnvm/pblk-sysfs.c
> @@ -341,6 +341,38 @@ static ssize_t pblk_sysfs_get_write_amp_trip(struct pblk *pblk, char *page)
> atomic64_read(&pblk->pad_wa) - pblk->pad_rst_wa, page);
> }
>
> +static ssize_t pblk_sysfs_get_padding_dist(struct pblk *pblk, char *page)
> +{
> + int sz = 0;
> + unsigned long long total;
> + unsigned long long total_buckets = 0;
> + int buckets = pblk->min_write_pgs - 1;
> + int i;
> +
> + total = atomic64_read(&pblk->nr_flush) - pblk->nr_flush_rst;
> +
> + for (i = 0; i < buckets; i++)
> + total_buckets += atomic64_read(&pblk->pad_dist[i]);
> +

total_buckets are first used later. The calculation could be moved below
the next statement.

> + if (!total) {
> + for (i = 0; i < (buckets + 1); i++)
> + sz += snprintf(page + sz, PAGE_SIZE - sz,
> + "%d:0 ", i);
> + sz += snprintf(page + sz, PAGE_SIZE - sz, "\n");
> +
> + return sz;
> + }
> +
> + sz += snprintf(page + sz, PAGE_SIZE - sz, "0:%lld%% ",
> + ((total - total_buckets) * 100) / total);
> + for (i = 0; i < buckets; i++)
> + sz += snprintf(page + sz, PAGE_SIZE - sz, "%d:%lld%% ", i + 1,
> + (atomic64_read(&pblk->pad_dist[i]) * 100) / total);
> + sz += snprintf(page + sz, PAGE_SIZE - sz, "\n");
> +
> + return sz;
> +}
> +
> #ifdef CONFIG_NVM_DEBUG
> static ssize_t pblk_sysfs_stats_debug(struct pblk *pblk, char *page)
> {
> @@ -427,6 +459,32 @@ static ssize_t pblk_sysfs_set_write_amp_trip(struct pblk *pblk,
> }
>
>
> +static ssize_t pblk_sysfs_set_padding_dist(struct pblk *pblk,
> + const char *page, size_t len)
> +{
> + size_t c_len;
> + int reset_value;
> + int buckets = pblk->min_write_pgs - 1;
> + int i;
> +
> + c_len = strcspn(page, "\n");
> + if (c_len >= len)
> + return -EINVAL;
> +
> + if (kstrtouint(page, 0, &reset_value))
> + return -EINVAL;
> +
> + if (reset_value != 0)
> + return -EINVAL;
> +
> + for (i = 0; i < buckets; i++)
> + atomic64_set(&pblk->pad_dist[i], 0);
> +
> + pblk->nr_flush_rst = atomic64_read(&pblk->nr_flush);
> +
> + return len;
> +}
> +
> static struct attribute sys_write_luns = {
> .name = "write_luns",
> .mode = 0444,
> @@ -487,6 +545,11 @@ static struct attribute sys_write_amp_trip = {
> .mode = 0644,
> };
>
> +static struct attribute sys_padding_dist = {
> + .name = "padding_dist",
> + .mode = 0644,
> +};
> +
> #ifdef CONFIG_NVM_DEBUG
> static struct attribute sys_stats_debug_attr = {
> .name = "stats",
> @@ -507,6 +570,7 @@ static struct attribute *pblk_attrs[] = {
> &sys_lines_info_attr,
> &sys_write_amp_mileage,
> &sys_write_amp_trip,
> + &sys_padding_dist,
> #ifdef CONFIG_NVM_DEBUG
> &sys_stats_debug_attr,
> #endif
> @@ -540,6 +604,8 @@ static ssize_t pblk_sysfs_show(struct kobject *kobj, struct attribute *attr,
> return pblk_sysfs_get_write_amp_mileage(pblk, buf);
> else if (strcmp(attr->name, "write_amp_trip") == 0)
> return pblk_sysfs_get_write_amp_trip(pblk, buf);
> + else if (strcmp(attr->name, "padding_dist") == 0)
> + return pblk_sysfs_get_padding_dist(pblk, buf);
> #ifdef CONFIG_NVM_DEBUG
> else if (strcmp(attr->name, "stats") == 0)
> return pblk_sysfs_stats_debug(pblk, buf);
> @@ -558,6 +624,8 @@ static ssize_t pblk_sysfs_store(struct kobject *kobj, struct attribute *attr,
> return pblk_sysfs_set_sec_per_write(pblk, buf, len);
> else if (strcmp(attr->name, "write_amp_trip") == 0)
> return pblk_sysfs_set_write_amp_trip(pblk, buf, len);
> + else if (strcmp(attr->name, "padding_dist") == 0)
> + return pblk_sysfs_set_padding_dist(pblk, buf, len);
> return 0;
> }
>
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 4b7d8618631f..88720e2441c0 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -626,12 +626,16 @@ struct pblk {
> u64 gc_rst_wa;
> u64 pad_rst_wa;
>
> + /* Counters used for calculating padding distribution */
> + atomic64_t *pad_dist; /* Padding distribution buckets */
> + u64 nr_flush_rst; /* Flushes reset value for pad dist.*/
> + atomic_long_t nr_flush; /* Number of flush/fua I/O */
> +
> #ifdef CONFIG_NVM_DEBUG
> /* Non-persistent debug counters, 4kb sector I/Os */
> atomic_long_t inflight_writes; /* Inflight writes (user and gc) */
> atomic_long_t padded_writes; /* Sectors padded due to flush/fua */
> atomic_long_t padded_wb; /* Sectors padded in write buffer */
> - atomic_long_t nr_flush; /* Number of flush/fua I/O */
> atomic_long_t req_writes; /* Sectors stored on write buffer */
> atomic_long_t sub_writes; /* Sectors submitted from buffer */
> atomic_long_t sync_writes; /* Sectors synced to media */
>

Looks good to me. Let me know if you want to send a new patch, or if I
may make the change when I pick it up.

Also, should the padding be stored in the on-disk metadata as well?

2018-01-31 09:02:37

by Matias Bjørling

[permalink] [raw]
Subject: Re: [PATCH 5/5] lightnvm: pblk: refactor bad block identification

On 01/31/2018 03:06 AM, Javier González wrote:
> In preparation for the OCSSD 2.0 spec. bad block identification,
> refactor the current code to generalize bad block get/set functions and
> structures.
>
> Signed-off-by: Javier González <[email protected]>
> ---
> drivers/lightnvm/pblk-init.c | 213 +++++++++++++++++++++++--------------------
> drivers/lightnvm/pblk.h | 6 --
> 2 files changed, 112 insertions(+), 107 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index a5e3510c0f38..86a94a7faa96 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -365,7 +365,25 @@ static void pblk_luns_free(struct pblk *pblk)
> kfree(pblk->luns);
> }
>
> -static void pblk_free_line_bitmaps(struct pblk_line *line)
> +static void pblk_line_mg_free(struct pblk *pblk)
> +{
> + struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> + int i;
> +
> + kfree(l_mg->bb_template);
> + kfree(l_mg->bb_aux);
> + kfree(l_mg->vsc_list);
> +
> + for (i = 0; i < PBLK_DATA_LINES; i++) {
> + kfree(l_mg->sline_meta[i]);
> + pblk_mfree(l_mg->eline_meta[i]->buf, l_mg->emeta_alloc_type);
> + kfree(l_mg->eline_meta[i]);
> + }
> +
> + kfree(pblk->lines);
> +}
> +
> +static void pblk_line_meta_free(struct pblk_line *line)
> {
> kfree(line->blk_bitmap);
> kfree(line->erase_bitmap);
> @@ -382,40 +400,16 @@ static void pblk_lines_free(struct pblk *pblk)
> line = &pblk->lines[i];
>
> pblk_line_free(pblk, line);
> - pblk_free_line_bitmaps(line);
> + pblk_line_meta_free(line);
> }
> spin_unlock(&l_mg->free_lock);
> }
>
> -static void pblk_line_meta_free(struct pblk *pblk)
> +static int pblk_bb_get_tbl(struct nvm_tgt_dev *dev, struct pblk_lun *rlun,
> + u8 *blks, int nr_blks)
> {
> - struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> - int i;
> -
> - kfree(l_mg->bb_template);
> - kfree(l_mg->bb_aux);
> - kfree(l_mg->vsc_list);
> -
> - for (i = 0; i < PBLK_DATA_LINES; i++) {
> - kfree(l_mg->sline_meta[i]);
> - pblk_mfree(l_mg->eline_meta[i]->buf, l_mg->emeta_alloc_type);
> - kfree(l_mg->eline_meta[i]);
> - }
> -
> - kfree(pblk->lines);
> -}
> -
> -static int pblk_bb_discovery(struct nvm_tgt_dev *dev, struct pblk_lun *rlun)
> -{
> - struct nvm_geo *geo = &dev->geo;
> struct ppa_addr ppa;
> - u8 *blks;
> - int nr_blks, ret;
> -
> - nr_blks = geo->nr_chks * geo->plane_mode;
> - blks = kmalloc(nr_blks, GFP_KERNEL);
> - if (!blks)
> - return -ENOMEM;
> + int ret;
>
> ppa.ppa = 0;
> ppa.g.ch = rlun->bppa.g.ch;
> @@ -423,34 +417,56 @@ static int pblk_bb_discovery(struct nvm_tgt_dev *dev, struct pblk_lun *rlun)
>
> ret = nvm_get_tgt_bb_tbl(dev, ppa, blks);
> if (ret)
> - goto out;
> + return ret;
>
> nr_blks = nvm_bb_tbl_fold(dev->parent, blks, nr_blks);
> - if (nr_blks < 0) {
> - ret = nr_blks;
> - goto out;
> - }
> -
> - rlun->bb_list = blks;
> + if (nr_blks < 0)
> + return -EIO;
>
> return 0;
> -out:
> - kfree(blks);
> - return ret;
> +}
> +
> +static void *pblk_bb_get_log(struct pblk *pblk)
> +{
> + struct nvm_tgt_dev *dev = pblk->dev;
> + struct nvm_geo *geo = &dev->geo;
> + u8 *log;
> + int i, nr_blks, blk_per_lun;
> + int ret;
> +
> + blk_per_lun = geo->nr_chks * geo->plane_mode;
> + nr_blks = blk_per_lun * geo->all_luns;
> +
> + log = kmalloc(nr_blks, GFP_KERNEL);
> + if (!log)
> + return ERR_PTR(-ENOMEM);
> +
> + for (i = 0; i < geo->all_luns; i++) {
> + struct pblk_lun *rlun = &pblk->luns[i];
> + u8 *log_pos = log + i * blk_per_lun;
> +
> + ret = pblk_bb_get_tbl(dev, rlun, log_pos, blk_per_lun);
> + if (ret) {
> + kfree(log);
> + return ERR_PTR(-EIO);
> + }
> + }
> +
> + return log;
> }
>
> static int pblk_bb_line(struct pblk *pblk, struct pblk_line *line,
> - int blk_per_line)
> + u8 *bb_log, int blk_per_line)
> {
> struct nvm_tgt_dev *dev = pblk->dev;
> struct nvm_geo *geo = &dev->geo;
> - struct pblk_lun *rlun;
> - int bb_cnt = 0;
> - int i;
> + int i, bb_cnt = 0;
>
> for (i = 0; i < blk_per_line; i++) {
> - rlun = &pblk->luns[i];
> - if (rlun->bb_list[line->id] == NVM_BLK_T_FREE)
> + struct pblk_lun *rlun = &pblk->luns[i];
> + u8 *lun_bb_log = bb_log + i * blk_per_line;
> +
> + if (lun_bb_log[line->id] == NVM_BLK_T_FREE)
> continue;
>
> set_bit(pblk_ppa_to_pos(geo, rlun->bppa), line->blk_bitmap);
> @@ -460,29 +476,12 @@ static int pblk_bb_line(struct pblk *pblk, struct pblk_line *line,
> return bb_cnt;
> }
>
> -static int pblk_alloc_line_bitmaps(struct pblk *pblk, struct pblk_line *line)
> -{
> - struct pblk_line_meta *lm = &pblk->lm;
> -
> - line->blk_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
> - if (!line->blk_bitmap)
> - return -ENOMEM;
> -
> - line->erase_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
> - if (!line->erase_bitmap) {
> - kfree(line->blk_bitmap);
> - return -ENOMEM;
> - }
> -
> - return 0;
> -}
> -
> static int pblk_luns_init(struct pblk *pblk, struct ppa_addr *luns)
> {
> struct nvm_tgt_dev *dev = pblk->dev;
> struct nvm_geo *geo = &dev->geo;
> struct pblk_lun *rlun;
> - int i, ret;
> + int i;
>
> /* TODO: Implement unbalanced LUN support */
> if (geo->nr_luns < 0) {
> @@ -505,13 +504,6 @@ static int pblk_luns_init(struct pblk *pblk, struct ppa_addr *luns)
> rlun->bppa = luns[lunid];
>
> sema_init(&rlun->wr_sem, 1);
> -
> - ret = pblk_bb_discovery(dev, rlun);
> - if (ret) {
> - while (--i >= 0)
> - kfree(pblk->luns[i].bb_list);
> - return ret;
> - }
> }
>
> return 0;
> @@ -689,6 +681,26 @@ static int pblk_lines_alloc_metadata(struct pblk *pblk)
> return -ENOMEM;
> }
>
> +static int pblk_setup_line_meta(struct pblk *pblk, struct pblk_line *line,
> + void *chunk_log, long *nr_bad_blks)
> +{
> + struct pblk_line_meta *lm = &pblk->lm;
> +
> + line->blk_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
> + if (!line->blk_bitmap)
> + return -ENOMEM;
> +
> + line->erase_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
> + if (!line->erase_bitmap) {
> + kfree(line->blk_bitmap);
> + return -ENOMEM;
> + }
> +
> + *nr_bad_blks = pblk_bb_line(pblk, line, chunk_log, lm->blk_per_line);
> +
> + return 0;
> +}
> +
> static int pblk_lines_init(struct pblk *pblk)
> {
> struct nvm_tgt_dev *dev = pblk->dev;
> @@ -696,8 +708,9 @@ static int pblk_lines_init(struct pblk *pblk)
> struct pblk_line_mgmt *l_mg = &pblk->l_mg;
> struct pblk_line_meta *lm = &pblk->lm;
> struct pblk_line *line;
> + void *chunk_log;
> unsigned int smeta_len, emeta_len;
> - long nr_bad_blks, nr_free_blks;
> + long nr_bad_blks = 0, nr_free_blks = 0;
> int bb_distance, max_write_ppas, mod;
> int i, ret;
>
> @@ -771,13 +784,12 @@ static int pblk_lines_init(struct pblk *pblk)
> if (lm->min_blk_line > lm->blk_per_line) {
> pr_err("pblk: config. not supported. Min. LUN in line:%d\n",
> lm->blk_per_line);
> - ret = -EINVAL;
> - goto fail;
> + return -EINVAL;
> }
>
> ret = pblk_lines_alloc_metadata(pblk);
> if (ret)
> - goto fail;
> + return ret;
>
> l_mg->bb_template = kzalloc(lm->sec_bitmap_len, GFP_KERNEL);
> if (!l_mg->bb_template) {
> @@ -821,9 +833,16 @@ static int pblk_lines_init(struct pblk *pblk)
> goto fail_free_bb_aux;
> }
>
> - nr_free_blks = 0;
> + chunk_log = pblk_bb_get_log(pblk);
> + if (IS_ERR(chunk_log)) {
> + pr_err("pblk: could not get bad block log (%lu)\n",
> + PTR_ERR(chunk_log));
> + ret = PTR_ERR(chunk_log);
> + goto fail_free_lines;
> + }
> +
> for (i = 0; i < l_mg->nr_lines; i++) {
> - int blk_in_line;
> + int chk_in_line;
>
> line = &pblk->lines[i];
>
> @@ -835,26 +854,20 @@ static int pblk_lines_init(struct pblk *pblk)
> line->vsc = &l_mg->vsc_list[i];
> spin_lock_init(&line->lock);
>
> - ret = pblk_alloc_line_bitmaps(pblk, line);
> + ret = pblk_setup_line_meta(pblk, line, chunk_log, &nr_bad_blks);
> if (ret)
> - goto fail_free_lines;
> + goto fail_free_chunk_log;
>
> - nr_bad_blks = pblk_bb_line(pblk, line, lm->blk_per_line);
> - if (nr_bad_blks < 0 || nr_bad_blks > lm->blk_per_line) {
> - pblk_free_line_bitmaps(line);
> - ret = -EINVAL;
> - goto fail_free_lines;
> - }
> -
> - blk_in_line = lm->blk_per_line - nr_bad_blks;
> - if (blk_in_line < lm->min_blk_line) {
> + chk_in_line = lm->blk_per_line - nr_bad_blks;
> + if (nr_bad_blks < 0 || nr_bad_blks > lm->blk_per_line ||
> + chk_in_line < lm->min_blk_line) {
> line->state = PBLK_LINESTATE_BAD;
> list_add_tail(&line->list, &l_mg->bad_list);
> continue;
> }
>
> - nr_free_blks += blk_in_line;
> - atomic_set(&line->blk_in_line, blk_in_line);
> + nr_free_blks += chk_in_line;
> + atomic_set(&line->blk_in_line, chk_in_line);
>
> l_mg->nr_free_lines++;
> list_add_tail(&line->list, &l_mg->free_list);
> @@ -862,23 +875,21 @@ static int pblk_lines_init(struct pblk *pblk)
>
> pblk_set_provision(pblk, nr_free_blks);
>
> - /* Cleanup per-LUN bad block lists - managed within lines on run-time */
> - for (i = 0; i < geo->all_luns; i++)
> - kfree(pblk->luns[i].bb_list);
> -
> + kfree(chunk_log);
> return 0;
> -fail_free_lines:
> +
> +fail_free_chunk_log:
> + kfree(chunk_log);
> while (--i >= 0)
> - pblk_free_line_bitmaps(&pblk->lines[i]);
> + pblk_line_meta_free(&pblk->lines[i]);
> +fail_free_lines:
> + kfree(pblk->lines);
> fail_free_bb_aux:
> kfree(l_mg->bb_aux);
> fail_free_bb_template:
> kfree(l_mg->bb_template);
> fail_free_meta:
> - pblk_line_meta_free(pblk);
> -fail:
> - for (i = 0; i < geo->all_luns; i++)
> - kfree(pblk->luns[i].bb_list);
> + pblk_line_mg_free(pblk);
>
> return ret;
> }
> @@ -922,7 +933,7 @@ static void pblk_free(struct pblk *pblk)
> pblk_luns_free(pblk);
> pblk_lines_free(pblk);
> kfree(pblk->pad_dist);
> - pblk_line_meta_free(pblk);
> + pblk_line_mg_free(pblk);
> pblk_core_free(pblk);
> pblk_l2p_free(pblk);
>
> @@ -1110,7 +1121,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
> fail_free_pad_dist:
> kfree(pblk->pad_dist);
> fail_free_line_meta:
> - pblk_line_meta_free(pblk);
> + pblk_line_mg_free(pblk);
> fail_free_luns:
> pblk_luns_free(pblk);
> fail:
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 88720e2441c0..282dfc8780e8 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -201,12 +201,6 @@ struct pblk_rb {
>
> struct pblk_lun {
> struct ppa_addr bppa;
> -
> - u8 *bb_list; /* Bad block list for LUN. Only used on
> - * bring up. Bad blocks are managed
> - * within lines on run-time.
> - */
> -
> struct semaphore wr_sem;
> };
>
>

Looks good to me.

With respect to the 2.0 implementation. I am thinking that get_bb and
set_bb should behave in the following way:

For get_bb (in nvme/host/lightnvm.c)
1.2: Keep the implementation as is.
2.0: Use the report chunk command, and redo the get_bb format.

For set_bb (in nvme/host/lightnvm.c)
1.2: Business as usual
2.0: Report error.

For 2.0, there will be added a function pointer for get report chunk
implementation, where the client can ask for full chunk information.
Similarly here, client will fail the function call if the drive is 1.2.

I hope to post the small 2.0 identify implementation Tomorrow or Friday,
and then you can post the report chunk implementation that you have done
if you like.

We also need to take a look at the new error codes, which does not align
with the 1.2 specification (now that they actually follow the nvme
specification ;))

2018-01-31 09:39:12

by Javier Gonzalez

[permalink] [raw]
Subject: Re: [PATCH 5/5] lightnvm: pblk: refactor bad block identification



> On 31 Jan 2018, at 16.51, Matias Bjørling <[email protected]> wrote:
>
>> On 01/31/2018 03:06 AM, Javier González wrote:
>> In preparation for the OCSSD 2.0 spec. bad block identification,
>> refactor the current code to generalize bad block get/set functions and
>> structures.
>> Signed-off-by: Javier González <[email protected]>
>> ---
>> drivers/lightnvm/pblk-init.c | 213 +++++++++++++++++++++++--------------------
>> drivers/lightnvm/pblk.h | 6 --
>> 2 files changed, 112 insertions(+), 107 deletions(-)
>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>> index a5e3510c0f38..86a94a7faa96 100644
>> --- a/drivers/lightnvm/pblk-init.c
>> +++ b/drivers/lightnvm/pblk-init.c
>> @@ -365,7 +365,25 @@ static void pblk_luns_free(struct pblk *pblk)
>> kfree(pblk->luns);
>> }
>> -static void pblk_free_line_bitmaps(struct pblk_line *line)
>> +static void pblk_line_mg_free(struct pblk *pblk)
>> +{
>> + struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>> + int i;
>> +
>> + kfree(l_mg->bb_template);
>> + kfree(l_mg->bb_aux);
>> + kfree(l_mg->vsc_list);
>> +
>> + for (i = 0; i < PBLK_DATA_LINES; i++) {
>> + kfree(l_mg->sline_meta[i]);
>> + pblk_mfree(l_mg->eline_meta[i]->buf, l_mg->emeta_alloc_type);
>> + kfree(l_mg->eline_meta[i]);
>> + }
>> +
>> + kfree(pblk->lines);
>> +}
>> +
>> +static void pblk_line_meta_free(struct pblk_line *line)
>> {
>> kfree(line->blk_bitmap);
>> kfree(line->erase_bitmap);
>> @@ -382,40 +400,16 @@ static void pblk_lines_free(struct pblk *pblk)
>> line = &pblk->lines[i];
>> pblk_line_free(pblk, line);
>> - pblk_free_line_bitmaps(line);
>> + pblk_line_meta_free(line);
>> }
>> spin_unlock(&l_mg->free_lock);
>> }
>> -static void pblk_line_meta_free(struct pblk *pblk)
>> +static int pblk_bb_get_tbl(struct nvm_tgt_dev *dev, struct pblk_lun *rlun,
>> + u8 *blks, int nr_blks)
>> {
>> - struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>> - int i;
>> -
>> - kfree(l_mg->bb_template);
>> - kfree(l_mg->bb_aux);
>> - kfree(l_mg->vsc_list);
>> -
>> - for (i = 0; i < PBLK_DATA_LINES; i++) {
>> - kfree(l_mg->sline_meta[i]);
>> - pblk_mfree(l_mg->eline_meta[i]->buf, l_mg->emeta_alloc_type);
>> - kfree(l_mg->eline_meta[i]);
>> - }
>> -
>> - kfree(pblk->lines);
>> -}
>> -
>> -static int pblk_bb_discovery(struct nvm_tgt_dev *dev, struct pblk_lun *rlun)
>> -{
>> - struct nvm_geo *geo = &dev->geo;
>> struct ppa_addr ppa;
>> - u8 *blks;
>> - int nr_blks, ret;
>> -
>> - nr_blks = geo->nr_chks * geo->plane_mode;
>> - blks = kmalloc(nr_blks, GFP_KERNEL);
>> - if (!blks)
>> - return -ENOMEM;
>> + int ret;
>> ppa.ppa = 0;
>> ppa.g.ch = rlun->bppa.g.ch;
>> @@ -423,34 +417,56 @@ static int pblk_bb_discovery(struct nvm_tgt_dev *dev, struct pblk_lun *rlun)
>> ret = nvm_get_tgt_bb_tbl(dev, ppa, blks);
>> if (ret)
>> - goto out;
>> + return ret;
>> nr_blks = nvm_bb_tbl_fold(dev->parent, blks, nr_blks);
>> - if (nr_blks < 0) {
>> - ret = nr_blks;
>> - goto out;
>> - }
>> -
>> - rlun->bb_list = blks;
>> + if (nr_blks < 0)
>> + return -EIO;
>> return 0;
>> -out:
>> - kfree(blks);
>> - return ret;
>> +}
>> +
>> +static void *pblk_bb_get_log(struct pblk *pblk)
>> +{
>> + struct nvm_tgt_dev *dev = pblk->dev;
>> + struct nvm_geo *geo = &dev->geo;
>> + u8 *log;
>> + int i, nr_blks, blk_per_lun;
>> + int ret;
>> +
>> + blk_per_lun = geo->nr_chks * geo->plane_mode;
>> + nr_blks = blk_per_lun * geo->all_luns;
>> +
>> + log = kmalloc(nr_blks, GFP_KERNEL);
>> + if (!log)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + for (i = 0; i < geo->all_luns; i++) {
>> + struct pblk_lun *rlun = &pblk->luns[i];
>> + u8 *log_pos = log + i * blk_per_lun;
>> +
>> + ret = pblk_bb_get_tbl(dev, rlun, log_pos, blk_per_lun);
>> + if (ret) {
>> + kfree(log);
>> + return ERR_PTR(-EIO);
>> + }
>> + }
>> +
>> + return log;
>> }
>> static int pblk_bb_line(struct pblk *pblk, struct pblk_line *line,
>> - int blk_per_line)
>> + u8 *bb_log, int blk_per_line)
>> {
>> struct nvm_tgt_dev *dev = pblk->dev;
>> struct nvm_geo *geo = &dev->geo;
>> - struct pblk_lun *rlun;
>> - int bb_cnt = 0;
>> - int i;
>> + int i, bb_cnt = 0;
>> for (i = 0; i < blk_per_line; i++) {
>> - rlun = &pblk->luns[i];
>> - if (rlun->bb_list[line->id] == NVM_BLK_T_FREE)
>> + struct pblk_lun *rlun = &pblk->luns[i];
>> + u8 *lun_bb_log = bb_log + i * blk_per_line;
>> +
>> + if (lun_bb_log[line->id] == NVM_BLK_T_FREE)
>> continue;
>> set_bit(pblk_ppa_to_pos(geo, rlun->bppa), line->blk_bitmap);
>> @@ -460,29 +476,12 @@ static int pblk_bb_line(struct pblk *pblk, struct pblk_line *line,
>> return bb_cnt;
>> }
>> -static int pblk_alloc_line_bitmaps(struct pblk *pblk, struct pblk_line *line)
>> -{
>> - struct pblk_line_meta *lm = &pblk->lm;
>> -
>> - line->blk_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
>> - if (!line->blk_bitmap)
>> - return -ENOMEM;
>> -
>> - line->erase_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
>> - if (!line->erase_bitmap) {
>> - kfree(line->blk_bitmap);
>> - return -ENOMEM;
>> - }
>> -
>> - return 0;
>> -}
>> -
>> static int pblk_luns_init(struct pblk *pblk, struct ppa_addr *luns)
>> {
>> struct nvm_tgt_dev *dev = pblk->dev;
>> struct nvm_geo *geo = &dev->geo;
>> struct pblk_lun *rlun;
>> - int i, ret;
>> + int i;
>> /* TODO: Implement unbalanced LUN support */
>> if (geo->nr_luns < 0) {
>> @@ -505,13 +504,6 @@ static int pblk_luns_init(struct pblk *pblk, struct ppa_addr *luns)
>> rlun->bppa = luns[lunid];
>> sema_init(&rlun->wr_sem, 1);
>> -
>> - ret = pblk_bb_discovery(dev, rlun);
>> - if (ret) {
>> - while (--i >= 0)
>> - kfree(pblk->luns[i].bb_list);
>> - return ret;
>> - }
>> }
>> return 0;
>> @@ -689,6 +681,26 @@ static int pblk_lines_alloc_metadata(struct pblk *pblk)
>> return -ENOMEM;
>> }
>> +static int pblk_setup_line_meta(struct pblk *pblk, struct pblk_line *line,
>> + void *chunk_log, long *nr_bad_blks)
>> +{
>> + struct pblk_line_meta *lm = &pblk->lm;
>> +
>> + line->blk_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
>> + if (!line->blk_bitmap)
>> + return -ENOMEM;
>> +
>> + line->erase_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
>> + if (!line->erase_bitmap) {
>> + kfree(line->blk_bitmap);
>> + return -ENOMEM;
>> + }
>> +
>> + *nr_bad_blks = pblk_bb_line(pblk, line, chunk_log, lm->blk_per_line);
>> +
>> + return 0;
>> +}
>> +
>> static int pblk_lines_init(struct pblk *pblk)
>> {
>> struct nvm_tgt_dev *dev = pblk->dev;
>> @@ -696,8 +708,9 @@ static int pblk_lines_init(struct pblk *pblk)
>> struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>> struct pblk_line_meta *lm = &pblk->lm;
>> struct pblk_line *line;
>> + void *chunk_log;
>> unsigned int smeta_len, emeta_len;
>> - long nr_bad_blks, nr_free_blks;
>> + long nr_bad_blks = 0, nr_free_blks = 0;
>> int bb_distance, max_write_ppas, mod;
>> int i, ret;
>> @@ -771,13 +784,12 @@ static int pblk_lines_init(struct pblk *pblk)
>> if (lm->min_blk_line > lm->blk_per_line) {
>> pr_err("pblk: config. not supported. Min. LUN in line:%d\n",
>> lm->blk_per_line);
>> - ret = -EINVAL;
>> - goto fail;
>> + return -EINVAL;
>> }
>> ret = pblk_lines_alloc_metadata(pblk);
>> if (ret)
>> - goto fail;
>> + return ret;
>> l_mg->bb_template = kzalloc(lm->sec_bitmap_len, GFP_KERNEL);
>> if (!l_mg->bb_template) {
>> @@ -821,9 +833,16 @@ static int pblk_lines_init(struct pblk *pblk)
>> goto fail_free_bb_aux;
>> }
>> - nr_free_blks = 0;
>> + chunk_log = pblk_bb_get_log(pblk);
>> + if (IS_ERR(chunk_log)) {
>> + pr_err("pblk: could not get bad block log (%lu)\n",
>> + PTR_ERR(chunk_log));
>> + ret = PTR_ERR(chunk_log);
>> + goto fail_free_lines;
>> + }
>> +
>> for (i = 0; i < l_mg->nr_lines; i++) {
>> - int blk_in_line;
>> + int chk_in_line;
>> line = &pblk->lines[i];
>> @@ -835,26 +854,20 @@ static int pblk_lines_init(struct pblk *pblk)
>> line->vsc = &l_mg->vsc_list[i];
>> spin_lock_init(&line->lock);
>> - ret = pblk_alloc_line_bitmaps(pblk, line);
>> + ret = pblk_setup_line_meta(pblk, line, chunk_log, &nr_bad_blks);
>> if (ret)
>> - goto fail_free_lines;
>> + goto fail_free_chunk_log;
>> - nr_bad_blks = pblk_bb_line(pblk, line, lm->blk_per_line);
>> - if (nr_bad_blks < 0 || nr_bad_blks > lm->blk_per_line) {
>> - pblk_free_line_bitmaps(line);
>> - ret = -EINVAL;
>> - goto fail_free_lines;
>> - }
>> -
>> - blk_in_line = lm->blk_per_line - nr_bad_blks;
>> - if (blk_in_line < lm->min_blk_line) {
>> + chk_in_line = lm->blk_per_line - nr_bad_blks;
>> + if (nr_bad_blks < 0 || nr_bad_blks > lm->blk_per_line ||
>> + chk_in_line < lm->min_blk_line) {
>> line->state = PBLK_LINESTATE_BAD;
>> list_add_tail(&line->list, &l_mg->bad_list);
>> continue;
>> }
>> - nr_free_blks += blk_in_line;
>> - atomic_set(&line->blk_in_line, blk_in_line);
>> + nr_free_blks += chk_in_line;
>> + atomic_set(&line->blk_in_line, chk_in_line);
>> l_mg->nr_free_lines++;
>> list_add_tail(&line->list, &l_mg->free_list);
>> @@ -862,23 +875,21 @@ static int pblk_lines_init(struct pblk *pblk)
>> pblk_set_provision(pblk, nr_free_blks);
>> - /* Cleanup per-LUN bad block lists - managed within lines on run-time */
>> - for (i = 0; i < geo->all_luns; i++)
>> - kfree(pblk->luns[i].bb_list);
>> -
>> + kfree(chunk_log);
>> return 0;
>> -fail_free_lines:
>> +
>> +fail_free_chunk_log:
>> + kfree(chunk_log);
>> while (--i >= 0)
>> - pblk_free_line_bitmaps(&pblk->lines[i]);
>> + pblk_line_meta_free(&pblk->lines[i]);
>> +fail_free_lines:
>> + kfree(pblk->lines);
>> fail_free_bb_aux:
>> kfree(l_mg->bb_aux);
>> fail_free_bb_template:
>> kfree(l_mg->bb_template);
>> fail_free_meta:
>> - pblk_line_meta_free(pblk);
>> -fail:
>> - for (i = 0; i < geo->all_luns; i++)
>> - kfree(pblk->luns[i].bb_list);
>> + pblk_line_mg_free(pblk);
>> return ret;
>> }
>> @@ -922,7 +933,7 @@ static void pblk_free(struct pblk *pblk)
>> pblk_luns_free(pblk);
>> pblk_lines_free(pblk);
>> kfree(pblk->pad_dist);
>> - pblk_line_meta_free(pblk);
>> + pblk_line_mg_free(pblk);
>> pblk_core_free(pblk);
>> pblk_l2p_free(pblk);
>> @@ -1110,7 +1121,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>> fail_free_pad_dist:
>> kfree(pblk->pad_dist);
>> fail_free_line_meta:
>> - pblk_line_meta_free(pblk);
>> + pblk_line_mg_free(pblk);
>> fail_free_luns:
>> pblk_luns_free(pblk);
>> fail:
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index 88720e2441c0..282dfc8780e8 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -201,12 +201,6 @@ struct pblk_rb {
>> struct pblk_lun {
>> struct ppa_addr bppa;
>> -
>> - u8 *bb_list; /* Bad block list for LUN. Only used on
>> - * bring up. Bad blocks are managed
>> - * within lines on run-time.
>> - */
>> -
>> struct semaphore wr_sem;
>> };
>>
>
> Looks good to me.
>
> With respect to the 2.0 implementation. I am thinking that get_bb and set_bb should behave in the following way:
>
> For get_bb (in nvme/host/lightnvm.c)
> 1.2: Keep the implementation as is.
> 2.0: Use the report chunk command, and redo the get_bb format.
>
> For set_bb (in nvme/host/lightnvm.c)
> 1.2: Business as usual
> 2.0: Report error.

I have a patches abstracting this, which I think it makes it cleaner. I can push it next week for review. I’m traveling this week. (If you want to get a glimpse I can point you to the code).

>
> For 2.0, there will be added a function pointer for get report chunk implementation, where the client can ask for full chunk information. Similarly here, client will fail the function call if the drive is 1.2.
>
> I hope to post the small 2.0 identify implementation Tomorrow or Friday, and then you can post the report chunk implementation that you have done if you like

Sure. I have patches implementing 2.0 too, but you can push your version. One thing I’d like to get in is a generic geometry structure that simplifies the identify command and puts the logic in the identify path.

This makes the base group structure from 1.2 go away makes the 1.2 and 2.0 paths easier to maintain at the target level.

I can point you to the patches tomorrow if you want to align with it. Otherwise, I can just rebase and send them next week. But please, consider this before making a new abstraction to make the 2 specs coexist - most of the work is done already...


> We also need to take a look at the new error codes, which does not align with the 1.2 specification (now that they actually follow the nvme specification ;))

Cool. Let’s maintain the 1.2 errors for compatibility and make the 2.0 path clean - that is nvme compatible ;)

Javier.

2018-01-31 18:28:03

by Matias Bjørling

[permalink] [raw]
Subject: Re: [PATCH 5/5] lightnvm: pblk: refactor bad block identification

On 01/31/2018 10:13 AM, Javier Gonzalez wrote:
>
>
>> On 31 Jan 2018, at 16.51, Matias Bjørling <[email protected]> wrote:
>>
>>> On 01/31/2018 03:06 AM, Javier González wrote:
>>> In preparation for the OCSSD 2.0 spec. bad block identification,
>>> refactor the current code to generalize bad block get/set functions and
>>> structures.
>>> Signed-off-by: Javier González <[email protected]>
>>> ---
>>> drivers/lightnvm/pblk-init.c | 213 +++++++++++++++++++++++--------------------
>>> drivers/lightnvm/pblk.h | 6 --
>>> 2 files changed, 112 insertions(+), 107 deletions(-)
>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>> index a5e3510c0f38..86a94a7faa96 100644
>>> --- a/drivers/lightnvm/pblk-init.c
>>> +++ b/drivers/lightnvm/pblk-init.c
>>> @@ -365,7 +365,25 @@ static void pblk_luns_free(struct pblk *pblk)
>>> kfree(pblk->luns);
>>> }
>>> -static void pblk_free_line_bitmaps(struct pblk_line *line)
>>> +static void pblk_line_mg_free(struct pblk *pblk)
>>> +{
>>> + struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>> + int i;
>>> +
>>> + kfree(l_mg->bb_template);
>>> + kfree(l_mg->bb_aux);
>>> + kfree(l_mg->vsc_list);
>>> +
>>> + for (i = 0; i < PBLK_DATA_LINES; i++) {
>>> + kfree(l_mg->sline_meta[i]);
>>> + pblk_mfree(l_mg->eline_meta[i]->buf, l_mg->emeta_alloc_type);
>>> + kfree(l_mg->eline_meta[i]);
>>> + }
>>> +
>>> + kfree(pblk->lines);
>>> +}
>>> +
>>> +static void pblk_line_meta_free(struct pblk_line *line)
>>> {
>>> kfree(line->blk_bitmap);
>>> kfree(line->erase_bitmap);
>>> @@ -382,40 +400,16 @@ static void pblk_lines_free(struct pblk *pblk)
>>> line = &pblk->lines[i];
>>> pblk_line_free(pblk, line);
>>> - pblk_free_line_bitmaps(line);
>>> + pblk_line_meta_free(line);
>>> }
>>> spin_unlock(&l_mg->free_lock);
>>> }
>>> -static void pblk_line_meta_free(struct pblk *pblk)
>>> +static int pblk_bb_get_tbl(struct nvm_tgt_dev *dev, struct pblk_lun *rlun,
>>> + u8 *blks, int nr_blks)
>>> {
>>> - struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>> - int i;
>>> -
>>> - kfree(l_mg->bb_template);
>>> - kfree(l_mg->bb_aux);
>>> - kfree(l_mg->vsc_list);
>>> -
>>> - for (i = 0; i < PBLK_DATA_LINES; i++) {
>>> - kfree(l_mg->sline_meta[i]);
>>> - pblk_mfree(l_mg->eline_meta[i]->buf, l_mg->emeta_alloc_type);
>>> - kfree(l_mg->eline_meta[i]);
>>> - }
>>> -
>>> - kfree(pblk->lines);
>>> -}
>>> -
>>> -static int pblk_bb_discovery(struct nvm_tgt_dev *dev, struct pblk_lun *rlun)
>>> -{
>>> - struct nvm_geo *geo = &dev->geo;
>>> struct ppa_addr ppa;
>>> - u8 *blks;
>>> - int nr_blks, ret;
>>> -
>>> - nr_blks = geo->nr_chks * geo->plane_mode;
>>> - blks = kmalloc(nr_blks, GFP_KERNEL);
>>> - if (!blks)
>>> - return -ENOMEM;
>>> + int ret;
>>> ppa.ppa = 0;
>>> ppa.g.ch = rlun->bppa.g.ch;
>>> @@ -423,34 +417,56 @@ static int pblk_bb_discovery(struct nvm_tgt_dev *dev, struct pblk_lun *rlun)
>>> ret = nvm_get_tgt_bb_tbl(dev, ppa, blks);
>>> if (ret)
>>> - goto out;
>>> + return ret;
>>> nr_blks = nvm_bb_tbl_fold(dev->parent, blks, nr_blks);
>>> - if (nr_blks < 0) {
>>> - ret = nr_blks;
>>> - goto out;
>>> - }
>>> -
>>> - rlun->bb_list = blks;
>>> + if (nr_blks < 0)
>>> + return -EIO;
>>> return 0;
>>> -out:
>>> - kfree(blks);
>>> - return ret;
>>> +}
>>> +
>>> +static void *pblk_bb_get_log(struct pblk *pblk)
>>> +{
>>> + struct nvm_tgt_dev *dev = pblk->dev;
>>> + struct nvm_geo *geo = &dev->geo;
>>> + u8 *log;
>>> + int i, nr_blks, blk_per_lun;
>>> + int ret;
>>> +
>>> + blk_per_lun = geo->nr_chks * geo->plane_mode;
>>> + nr_blks = blk_per_lun * geo->all_luns;
>>> +
>>> + log = kmalloc(nr_blks, GFP_KERNEL);
>>> + if (!log)
>>> + return ERR_PTR(-ENOMEM);
>>> +
>>> + for (i = 0; i < geo->all_luns; i++) {
>>> + struct pblk_lun *rlun = &pblk->luns[i];
>>> + u8 *log_pos = log + i * blk_per_lun;
>>> +
>>> + ret = pblk_bb_get_tbl(dev, rlun, log_pos, blk_per_lun);
>>> + if (ret) {
>>> + kfree(log);
>>> + return ERR_PTR(-EIO);
>>> + }
>>> + }
>>> +
>>> + return log;
>>> }
>>> static int pblk_bb_line(struct pblk *pblk, struct pblk_line *line,
>>> - int blk_per_line)
>>> + u8 *bb_log, int blk_per_line)
>>> {
>>> struct nvm_tgt_dev *dev = pblk->dev;
>>> struct nvm_geo *geo = &dev->geo;
>>> - struct pblk_lun *rlun;
>>> - int bb_cnt = 0;
>>> - int i;
>>> + int i, bb_cnt = 0;
>>> for (i = 0; i < blk_per_line; i++) {
>>> - rlun = &pblk->luns[i];
>>> - if (rlun->bb_list[line->id] == NVM_BLK_T_FREE)
>>> + struct pblk_lun *rlun = &pblk->luns[i];
>>> + u8 *lun_bb_log = bb_log + i * blk_per_line;
>>> +
>>> + if (lun_bb_log[line->id] == NVM_BLK_T_FREE)
>>> continue;
>>> set_bit(pblk_ppa_to_pos(geo, rlun->bppa), line->blk_bitmap);
>>> @@ -460,29 +476,12 @@ static int pblk_bb_line(struct pblk *pblk, struct pblk_line *line,
>>> return bb_cnt;
>>> }
>>> -static int pblk_alloc_line_bitmaps(struct pblk *pblk, struct pblk_line *line)
>>> -{
>>> - struct pblk_line_meta *lm = &pblk->lm;
>>> -
>>> - line->blk_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
>>> - if (!line->blk_bitmap)
>>> - return -ENOMEM;
>>> -
>>> - line->erase_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
>>> - if (!line->erase_bitmap) {
>>> - kfree(line->blk_bitmap);
>>> - return -ENOMEM;
>>> - }
>>> -
>>> - return 0;
>>> -}
>>> -
>>> static int pblk_luns_init(struct pblk *pblk, struct ppa_addr *luns)
>>> {
>>> struct nvm_tgt_dev *dev = pblk->dev;
>>> struct nvm_geo *geo = &dev->geo;
>>> struct pblk_lun *rlun;
>>> - int i, ret;
>>> + int i;
>>> /* TODO: Implement unbalanced LUN support */
>>> if (geo->nr_luns < 0) {
>>> @@ -505,13 +504,6 @@ static int pblk_luns_init(struct pblk *pblk, struct ppa_addr *luns)
>>> rlun->bppa = luns[lunid];
>>> sema_init(&rlun->wr_sem, 1);
>>> -
>>> - ret = pblk_bb_discovery(dev, rlun);
>>> - if (ret) {
>>> - while (--i >= 0)
>>> - kfree(pblk->luns[i].bb_list);
>>> - return ret;
>>> - }
>>> }
>>> return 0;
>>> @@ -689,6 +681,26 @@ static int pblk_lines_alloc_metadata(struct pblk *pblk)
>>> return -ENOMEM;
>>> }
>>> +static int pblk_setup_line_meta(struct pblk *pblk, struct pblk_line *line,
>>> + void *chunk_log, long *nr_bad_blks)
>>> +{
>>> + struct pblk_line_meta *lm = &pblk->lm;
>>> +
>>> + line->blk_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
>>> + if (!line->blk_bitmap)
>>> + return -ENOMEM;
>>> +
>>> + line->erase_bitmap = kzalloc(lm->blk_bitmap_len, GFP_KERNEL);
>>> + if (!line->erase_bitmap) {
>>> + kfree(line->blk_bitmap);
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + *nr_bad_blks = pblk_bb_line(pblk, line, chunk_log, lm->blk_per_line);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> static int pblk_lines_init(struct pblk *pblk)
>>> {
>>> struct nvm_tgt_dev *dev = pblk->dev;
>>> @@ -696,8 +708,9 @@ static int pblk_lines_init(struct pblk *pblk)
>>> struct pblk_line_mgmt *l_mg = &pblk->l_mg;
>>> struct pblk_line_meta *lm = &pblk->lm;
>>> struct pblk_line *line;
>>> + void *chunk_log;
>>> unsigned int smeta_len, emeta_len;
>>> - long nr_bad_blks, nr_free_blks;
>>> + long nr_bad_blks = 0, nr_free_blks = 0;
>>> int bb_distance, max_write_ppas, mod;
>>> int i, ret;
>>> @@ -771,13 +784,12 @@ static int pblk_lines_init(struct pblk *pblk)
>>> if (lm->min_blk_line > lm->blk_per_line) {
>>> pr_err("pblk: config. not supported. Min. LUN in line:%d\n",
>>> lm->blk_per_line);
>>> - ret = -EINVAL;
>>> - goto fail;
>>> + return -EINVAL;
>>> }
>>> ret = pblk_lines_alloc_metadata(pblk);
>>> if (ret)
>>> - goto fail;
>>> + return ret;
>>> l_mg->bb_template = kzalloc(lm->sec_bitmap_len, GFP_KERNEL);
>>> if (!l_mg->bb_template) {
>>> @@ -821,9 +833,16 @@ static int pblk_lines_init(struct pblk *pblk)
>>> goto fail_free_bb_aux;
>>> }
>>> - nr_free_blks = 0;
>>> + chunk_log = pblk_bb_get_log(pblk);
>>> + if (IS_ERR(chunk_log)) {
>>> + pr_err("pblk: could not get bad block log (%lu)\n",
>>> + PTR_ERR(chunk_log));
>>> + ret = PTR_ERR(chunk_log);
>>> + goto fail_free_lines;
>>> + }
>>> +
>>> for (i = 0; i < l_mg->nr_lines; i++) {
>>> - int blk_in_line;
>>> + int chk_in_line;
>>> line = &pblk->lines[i];
>>> @@ -835,26 +854,20 @@ static int pblk_lines_init(struct pblk *pblk)
>>> line->vsc = &l_mg->vsc_list[i];
>>> spin_lock_init(&line->lock);
>>> - ret = pblk_alloc_line_bitmaps(pblk, line);
>>> + ret = pblk_setup_line_meta(pblk, line, chunk_log, &nr_bad_blks);
>>> if (ret)
>>> - goto fail_free_lines;
>>> + goto fail_free_chunk_log;
>>> - nr_bad_blks = pblk_bb_line(pblk, line, lm->blk_per_line);
>>> - if (nr_bad_blks < 0 || nr_bad_blks > lm->blk_per_line) {
>>> - pblk_free_line_bitmaps(line);
>>> - ret = -EINVAL;
>>> - goto fail_free_lines;
>>> - }
>>> -
>>> - blk_in_line = lm->blk_per_line - nr_bad_blks;
>>> - if (blk_in_line < lm->min_blk_line) {
>>> + chk_in_line = lm->blk_per_line - nr_bad_blks;
>>> + if (nr_bad_blks < 0 || nr_bad_blks > lm->blk_per_line ||
>>> + chk_in_line < lm->min_blk_line) {
>>> line->state = PBLK_LINESTATE_BAD;
>>> list_add_tail(&line->list, &l_mg->bad_list);
>>> continue;
>>> }
>>> - nr_free_blks += blk_in_line;
>>> - atomic_set(&line->blk_in_line, blk_in_line);
>>> + nr_free_blks += chk_in_line;
>>> + atomic_set(&line->blk_in_line, chk_in_line);
>>> l_mg->nr_free_lines++;
>>> list_add_tail(&line->list, &l_mg->free_list);
>>> @@ -862,23 +875,21 @@ static int pblk_lines_init(struct pblk *pblk)
>>> pblk_set_provision(pblk, nr_free_blks);
>>> - /* Cleanup per-LUN bad block lists - managed within lines on run-time */
>>> - for (i = 0; i < geo->all_luns; i++)
>>> - kfree(pblk->luns[i].bb_list);
>>> -
>>> + kfree(chunk_log);
>>> return 0;
>>> -fail_free_lines:
>>> +
>>> +fail_free_chunk_log:
>>> + kfree(chunk_log);
>>> while (--i >= 0)
>>> - pblk_free_line_bitmaps(&pblk->lines[i]);
>>> + pblk_line_meta_free(&pblk->lines[i]);
>>> +fail_free_lines:
>>> + kfree(pblk->lines);
>>> fail_free_bb_aux:
>>> kfree(l_mg->bb_aux);
>>> fail_free_bb_template:
>>> kfree(l_mg->bb_template);
>>> fail_free_meta:
>>> - pblk_line_meta_free(pblk);
>>> -fail:
>>> - for (i = 0; i < geo->all_luns; i++)
>>> - kfree(pblk->luns[i].bb_list);
>>> + pblk_line_mg_free(pblk);
>>> return ret;
>>> }
>>> @@ -922,7 +933,7 @@ static void pblk_free(struct pblk *pblk)
>>> pblk_luns_free(pblk);
>>> pblk_lines_free(pblk);
>>> kfree(pblk->pad_dist);
>>> - pblk_line_meta_free(pblk);
>>> + pblk_line_mg_free(pblk);
>>> pblk_core_free(pblk);
>>> pblk_l2p_free(pblk);
>>> @@ -1110,7 +1121,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
>>> fail_free_pad_dist:
>>> kfree(pblk->pad_dist);
>>> fail_free_line_meta:
>>> - pblk_line_meta_free(pblk);
>>> + pblk_line_mg_free(pblk);
>>> fail_free_luns:
>>> pblk_luns_free(pblk);
>>> fail:
>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>> index 88720e2441c0..282dfc8780e8 100644
>>> --- a/drivers/lightnvm/pblk.h
>>> +++ b/drivers/lightnvm/pblk.h
>>> @@ -201,12 +201,6 @@ struct pblk_rb {
>>> struct pblk_lun {
>>> struct ppa_addr bppa;
>>> -
>>> - u8 *bb_list; /* Bad block list for LUN. Only used on
>>> - * bring up. Bad blocks are managed
>>> - * within lines on run-time.
>>> - */
>>> -
>>> struct semaphore wr_sem;
>>> };
>>>
>>
>> Looks good to me.
>>
>> With respect to the 2.0 implementation. I am thinking that get_bb and set_bb should behave in the following way:
>>
>> For get_bb (in nvme/host/lightnvm.c)
>> 1.2: Keep the implementation as is.
>> 2.0: Use the report chunk command, and redo the get_bb format.
>>
>> For set_bb (in nvme/host/lightnvm.c)
>> 1.2: Business as usual
>> 2.0: Report error.
>
> I have a patches abstracting this, which I think it makes it cleaner. I can push it next week for review. I’m traveling this week. (If you want to get a glimpse I can point you to the code).
>

Yes, please do. Thanks

>>
>> For 2.0, there will be added a function pointer for get report chunk implementation, where the client can ask for full chunk information. Similarly here, client will fail the function call if the drive is 1.2.
>>
>> I hope to post the small 2.0 identify implementation Tomorrow or Friday, and then you can post the report chunk implementation that you have done if you like
>
> Sure. I have patches implementing 2.0 too, but you can push your version. One thing I’d like to get in is a generic geometry structure that simplifies the identify command and puts the logic in the identify path.

Agree, that comes naturally when adding the support.

>
> This makes the base group structure from 1.2 go away makes the 1.2 and 2.0 paths easier to maintain at the target level.
>
> I can point you to the patches tomorrow if you want to align with it. Otherwise, I can just rebase and send them next week. But please, consider this before making a new abstraction to make the 2 specs coexist - most of the work is done already...

Thanks. Yes, they should co-exist.
>
>
>> We also need to take a look at the new error codes, which does not align with the 1.2 specification (now that they actually follow the nvme specification ;))
>
> Cool. Let’s maintain the 1.2 errors for compatibility and make the 2.0 path clean - that is nvme compatible ;)
>
> Javier.
>


2018-02-04 10:38:45

by Javier Gonzalez

[permalink] [raw]
Subject: Re: [PATCH 5/5] lightnvm: pblk: refactor bad block identification


> On 31 Jan 2018, at 19.24, Matias Bjørling <[email protected]> wrote:
>
> On 01/31/2018 10:13 AM, Javier Gonzalez wrote:
>>> On 31 Jan 2018, at 16.51, Matias Bjørling <[email protected]> wrote:
>>>
>> I have a patches abstracting this, which I think it makes it cleaner. I can push it next week for review. I’m traveling this week. (If you want to get a glimpse I can point you to the code).
>
> Yes, please do. Thanks

This is the release candidate for 2.0 support based on 4.17. I'll rebase
on top of you 2.0 support. We'll see if all changes make it to 4.17
then.

https://github.com/OpenChannelSSD/linux/tree/for-4.17/spec20

Javier


Attachments:
signature.asc (849.00 B)
Message signed with OpenPGP

2018-02-04 12:58:13

by Matias Bjørling

[permalink] [raw]
Subject: Re: [PATCH 5/5] lightnvm: pblk: refactor bad block identification

On 02/04/2018 11:37 AM, Javier Gonzalez wrote:
>
>> On 31 Jan 2018, at 19.24, Matias Bjørling <[email protected]> wrote:
>>
>> On 01/31/2018 10:13 AM, Javier Gonzalez wrote:
>>>> On 31 Jan 2018, at 16.51, Matias Bjørling <[email protected]> wrote:
>>>>
>>> I have a patches abstracting this, which I think it makes it cleaner. I can push it next week for review. I’m traveling this week. (If you want to get a glimpse I can point you to the code).
>>
>> Yes, please do. Thanks
>
> This is the release candidate for 2.0 support based on 4.17. I'll rebase
> on top of you 2.0 support. We'll see if all changes make it to 4.17
> then.
>
> https://github.com/OpenChannelSSD/linux/tree/for-4.17/spec20
>
> Javier
>

Great. I look forward to be patches being cleaned up and posted. I do
see some nitpicks here and there, which we properly can take a couple of
stabs at.

One think that generally stands out to me is the "if 1.2 support", else,
... statements. These could be structured better by having dedicated
setup functions for 1.2 and 2.0.

2018-02-04 13:09:54

by Javier Gonzalez

[permalink] [raw]
Subject: Re: [PATCH 5/5] lightnvm: pblk: refactor bad block identification


> On 4 Feb 2018, at 13.55, Matias Bjørling <[email protected]> wrote:
>
> On 02/04/2018 11:37 AM, Javier Gonzalez wrote:
>>> On 31 Jan 2018, at 19.24, Matias Bjørling <[email protected]> wrote:
>>>
>>> On 01/31/2018 10:13 AM, Javier Gonzalez wrote:
>>>>> On 31 Jan 2018, at 16.51, Matias Bjørling <[email protected]> wrote:
>>>>>
>>>> I have a patches abstracting this, which I think it makes it cleaner. I can push it next week for review. I’m traveling this week. (If you want to get a glimpse I can point you to the code).
>>>
>>> Yes, please do. Thanks
>> This is the release candidate for 2.0 support based on 4.17. I'll rebase
>> on top of you 2.0 support. We'll see if all changes make it to 4.17
>> then.
>> https://github.com/OpenChannelSSD/linux/tree/for-4.17/spec20
>> Javier
>
> Great. I look forward to be patches being cleaned up and posted. I do see some nitpicks here and there, which we properly can take a couple of stabs at.

Sure. This is still in development; just wanted to point to the abstractions I’m thinking of so that we don’t do the same work twice.

I’ll wait for posting until you do the 2.0 identify, since the old version is implemented on the first patch of this series.

> One think that generally stands out to me is the "if 1.2 support", else, ... statements. These could be structured better by having dedicated setup functions for 1.2 and 2.0.

We have this construction both in pblk and in core for address translation. Note that we need to have them separated to support multi instance and keep channels decoupled from each instance.

I assume 2 if...then is cheaper than doing 2 de-references to function pointers. This is the way it is done on legacy paths in other places (e.g., non mq scsi), but I can look into how pointer functions would look like and measure the performance impact.

Javier

2018-02-06 09:28:23

by Hans Holmberg

[permalink] [raw]
Subject: Re: [PATCH 4/5] lightnvm: pblk: add padding distribution sysfs attribute

Hi Matias,

On Wed, Jan 31, 2018 at 9:44 AM, Matias Bjørling <[email protected]> wrote:
> On 01/31/2018 03:06 AM, Javier González wrote:
>>
>> From: Hans Holmberg <[email protected]>
>>
>> When pblk receives a sync, all data up to that point in the write buffer
>> must be comitted to persistent storage, and as flash memory comes with a
>> minimal write size there is a significant cost involved both in terms
>> of time for completing the sync and in terms of write amplification
>> padded sectors for filling up to the minimal write size.
>>
>> In order to get a better understanding of the costs involved for syncs,
>> Add a sysfs attribute to pblk: padded_dist, showing a normalized
>> distribution of sectors padded. In order to facilitate measurements of
>> specific workloads during the lifetime of the pblk instance, the
>> distribution can be reset by writing 0 to the attribute.
>>
>> Do this by introducing counters for each possible padding:
>> {0..(minimal write size - 1)} and calculate the normalized distribution
>> when showing the attribute.
>>
>> Signed-off-by: Hans Holmberg <[email protected]>
>> Signed-off-by: Javier González <[email protected]>
>> ---
>> drivers/lightnvm/pblk-init.c | 16 ++++++++--
>> drivers/lightnvm/pblk-rb.c | 15 +++++-----
>> drivers/lightnvm/pblk-sysfs.c | 68
>> +++++++++++++++++++++++++++++++++++++++++++
>> drivers/lightnvm/pblk.h | 6 +++-
>> 4 files changed, 95 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>> index 7eedc5daebc8..a5e3510c0f38 100644
>> --- a/drivers/lightnvm/pblk-init.c
>> +++ b/drivers/lightnvm/pblk-init.c
>> @@ -921,6 +921,7 @@ static void pblk_free(struct pblk *pblk)
>> {
>> pblk_luns_free(pblk);
>> pblk_lines_free(pblk);
>> + kfree(pblk->pad_dist);
>> pblk_line_meta_free(pblk);
>> pblk_core_free(pblk);
>> pblk_l2p_free(pblk);
>> @@ -998,11 +999,13 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>> struct gendisk *tdisk,
>> pblk->pad_rst_wa = 0;
>> pblk->gc_rst_wa = 0;
>> + atomic_long_set(&pblk->nr_flush, 0);
>> + pblk->nr_flush_rst = 0;
>> +
>> #ifdef CONFIG_NVM_DEBUG
>> atomic_long_set(&pblk->inflight_writes, 0);
>> atomic_long_set(&pblk->padded_writes, 0);
>> atomic_long_set(&pblk->padded_wb, 0);
>> - atomic_long_set(&pblk->nr_flush, 0);
>> atomic_long_set(&pblk->req_writes, 0);
>> atomic_long_set(&pblk->sub_writes, 0);
>> atomic_long_set(&pblk->sync_writes, 0);
>> @@ -1034,10 +1037,17 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>> struct gendisk *tdisk,
>> goto fail_free_luns;
>> }
>> + pblk->pad_dist = kzalloc((pblk->min_write_pgs - 1) *
>> sizeof(atomic64_t),
>> + GFP_KERNEL);
>> + if (!pblk->pad_dist) {
>> + ret = -ENOMEM;
>> + goto fail_free_line_meta;
>> + }
>> +
>> ret = pblk_core_init(pblk);
>> if (ret) {
>> pr_err("pblk: could not initialize core\n");
>> - goto fail_free_line_meta;
>> + goto fail_free_pad_dist;
>> }
>> ret = pblk_l2p_init(pblk);
>> @@ -1097,6 +1107,8 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>> struct gendisk *tdisk,
>> pblk_l2p_free(pblk);
>> fail_free_core:
>> pblk_core_free(pblk);
>> +fail_free_pad_dist:
>> + kfree(pblk->pad_dist);
>> fail_free_line_meta:
>> pblk_line_meta_free(pblk);
>> fail_free_luns:
>> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
>> index 7044b5599cc4..264372d7b537 100644
>> --- a/drivers/lightnvm/pblk-rb.c
>> +++ b/drivers/lightnvm/pblk-rb.c
>> @@ -437,9 +437,7 @@ static int pblk_rb_may_write_flush(struct pblk_rb *rb,
>> unsigned int nr_entries,
>> if (bio->bi_opf & REQ_PREFLUSH) {
>> struct pblk *pblk = container_of(rb, struct pblk, rwb);
>> -#ifdef CONFIG_NVM_DEBUG
>> atomic_long_inc(&pblk->nr_flush);
>> -#endif
>> if (pblk_rb_flush_point_set(&pblk->rwb, bio, mem))
>> *io_ret = NVM_IO_OK;
>> }
>> @@ -620,14 +618,17 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb,
>> struct nvm_rq *rqd,
>> pr_err("pblk: could not pad page in write bio\n");
>> return NVM_IO_ERR;
>> }
>> +
>> + if (pad < pblk->min_write_pgs)
>> + atomic64_inc(&pblk->pad_dist[pad - 1]);
>> + else
>> + pr_warn("pblk: padding more than min. sectors\n");
>
>
> Curious, when would this happen? Would it be an implementation error or
> something a user did wrong?

This would be an implementation error - and this check is just a
safeguard to make sure we won't go out of bounds when updating the
statistics.

>
>
>> +
>> + atomic64_add(pad, &pblk->pad_wa);
>> }
>> - atomic64_add(pad, &((struct pblk *)
>> - (container_of(rb, struct pblk, rwb)))->pad_wa);
>> -
>> #ifdef CONFIG_NVM_DEBUG
>> - atomic_long_add(pad, &((struct pblk *)
>> - (container_of(rb, struct pblk,
>> rwb)))->padded_writes);
>> + atomic_long_add(pad, &pblk->padded_writes);
>> #endif
>> return NVM_IO_OK;
>> diff --git a/drivers/lightnvm/pblk-sysfs.c b/drivers/lightnvm/pblk-sysfs.c
>> index 4804bbd32d5f..f902ac00d071 100644
>> --- a/drivers/lightnvm/pblk-sysfs.c
>> +++ b/drivers/lightnvm/pblk-sysfs.c
>> @@ -341,6 +341,38 @@ static ssize_t pblk_sysfs_get_write_amp_trip(struct
>> pblk *pblk, char *page)
>> atomic64_read(&pblk->pad_wa) - pblk->pad_rst_wa, page);
>> }
>> +static ssize_t pblk_sysfs_get_padding_dist(struct pblk *pblk, char
>> *page)
>> +{
>> + int sz = 0;
>> + unsigned long long total;
>> + unsigned long long total_buckets = 0;
>> + int buckets = pblk->min_write_pgs - 1;
>> + int i;
>> +
>> + total = atomic64_read(&pblk->nr_flush) - pblk->nr_flush_rst;
>> +
>> + for (i = 0; i < buckets; i++)
>> + total_buckets += atomic64_read(&pblk->pad_dist[i]);
>> +
>
>
> total_buckets are first used later. The calculation could be moved below the
> next statement.

I saw that you fixed this on the core branch, thanks.

>
>
>> + if (!total) {
>> + for (i = 0; i < (buckets + 1); i++)
>> + sz += snprintf(page + sz, PAGE_SIZE - sz,
>> + "%d:0 ", i);
>> + sz += snprintf(page + sz, PAGE_SIZE - sz, "\n");
>> +
>> + return sz;
>> + }
>> +
>> + sz += snprintf(page + sz, PAGE_SIZE - sz, "0:%lld%% ",
>> + ((total - total_buckets) * 100) / total);
>> + for (i = 0; i < buckets; i++)
>> + sz += snprintf(page + sz, PAGE_SIZE - sz, "%d:%lld%% ", i
>> + 1,
>> + (atomic64_read(&pblk->pad_dist[i]) * 100) /
>> total);
>> + sz += snprintf(page + sz, PAGE_SIZE - sz, "\n");
>> +
>> + return sz;
>> +}
>> +
>> #ifdef CONFIG_NVM_DEBUG
>> static ssize_t pblk_sysfs_stats_debug(struct pblk *pblk, char *page)
>> {
>> @@ -427,6 +459,32 @@ static ssize_t pblk_sysfs_set_write_amp_trip(struct
>> pblk *pblk,
>> }
>> +static ssize_t pblk_sysfs_set_padding_dist(struct pblk *pblk,
>> + const char *page, size_t len)
>> +{
>> + size_t c_len;
>> + int reset_value;
>> + int buckets = pblk->min_write_pgs - 1;
>> + int i;
>> +
>> + c_len = strcspn(page, "\n");
>> + if (c_len >= len)
>> + return -EINVAL;
>> +
>> + if (kstrtouint(page, 0, &reset_value))
>> + return -EINVAL;
>> +
>> + if (reset_value != 0)
>> + return -EINVAL;
>> +
>> + for (i = 0; i < buckets; i++)
>> + atomic64_set(&pblk->pad_dist[i], 0);
>> +
>> + pblk->nr_flush_rst = atomic64_read(&pblk->nr_flush);
>> +
>> + return len;
>> +}
>> +
>> static struct attribute sys_write_luns = {
>> .name = "write_luns",
>> .mode = 0444,
>> @@ -487,6 +545,11 @@ static struct attribute sys_write_amp_trip = {
>> .mode = 0644,
>> };
>> +static struct attribute sys_padding_dist = {
>> + .name = "padding_dist",
>> + .mode = 0644,
>> +};
>> +
>> #ifdef CONFIG_NVM_DEBUG
>> static struct attribute sys_stats_debug_attr = {
>> .name = "stats",
>> @@ -507,6 +570,7 @@ static struct attribute *pblk_attrs[] = {
>> &sys_lines_info_attr,
>> &sys_write_amp_mileage,
>> &sys_write_amp_trip,
>> + &sys_padding_dist,
>> #ifdef CONFIG_NVM_DEBUG
>> &sys_stats_debug_attr,
>> #endif
>> @@ -540,6 +604,8 @@ static ssize_t pblk_sysfs_show(struct kobject *kobj,
>> struct attribute *attr,
>> return pblk_sysfs_get_write_amp_mileage(pblk, buf);
>> else if (strcmp(attr->name, "write_amp_trip") == 0)
>> return pblk_sysfs_get_write_amp_trip(pblk, buf);
>> + else if (strcmp(attr->name, "padding_dist") == 0)
>> + return pblk_sysfs_get_padding_dist(pblk, buf);
>> #ifdef CONFIG_NVM_DEBUG
>> else if (strcmp(attr->name, "stats") == 0)
>> return pblk_sysfs_stats_debug(pblk, buf);
>> @@ -558,6 +624,8 @@ static ssize_t pblk_sysfs_store(struct kobject *kobj,
>> struct attribute *attr,
>> return pblk_sysfs_set_sec_per_write(pblk, buf, len);
>> else if (strcmp(attr->name, "write_amp_trip") == 0)
>> return pblk_sysfs_set_write_amp_trip(pblk, buf, len);
>> + else if (strcmp(attr->name, "padding_dist") == 0)
>> + return pblk_sysfs_set_padding_dist(pblk, buf, len);
>> return 0;
>> }
>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>> index 4b7d8618631f..88720e2441c0 100644
>> --- a/drivers/lightnvm/pblk.h
>> +++ b/drivers/lightnvm/pblk.h
>> @@ -626,12 +626,16 @@ struct pblk {
>> u64 gc_rst_wa;
>> u64 pad_rst_wa;
>> + /* Counters used for calculating padding distribution */
>> + atomic64_t *pad_dist; /* Padding distribution buckets */
>> + u64 nr_flush_rst; /* Flushes reset value for pad
>> dist.*/
>> + atomic_long_t nr_flush; /* Number of flush/fua I/O */
>> +
>> #ifdef CONFIG_NVM_DEBUG
>> /* Non-persistent debug counters, 4kb sector I/Os */
>> atomic_long_t inflight_writes; /* Inflight writes (user and gc)
>> */
>> atomic_long_t padded_writes; /* Sectors padded due to flush/fua
>> */
>> atomic_long_t padded_wb; /* Sectors padded in write buffer
>> */
>> - atomic_long_t nr_flush; /* Number of flush/fua I/O */
>> atomic_long_t req_writes; /* Sectors stored on write buffer
>> */
>> atomic_long_t sub_writes; /* Sectors submitted from buffer
>> */
>> atomic_long_t sync_writes; /* Sectors synced to media */
>>
>
> Looks good to me. Let me know if you want to send a new patch, or if I may
> make the change when I pick it up.

Thanks for the review, i saw that you already reworked the patch a bit
on your core branch.
However, i found a build issue when building for i386, so i'll fix
that and submit a V2(that includes your update)

> Also, should the padding be stored in the on-disk metadata as well?

I don't see the need to do this, as I believe one would only be
interested in measuring the padding distribution on specific workloads
- and this would not require persisting the counters.

Thanks, Hans

2018-02-06 10:58:33

by Matias Bjørling

[permalink] [raw]
Subject: Re: [PATCH 4/5] lightnvm: pblk: add padding distribution sysfs attribute

On 02/06/2018 10:27 AM, Hans Holmberg wrote:
> Hi Matias,
>
> On Wed, Jan 31, 2018 at 9:44 AM, Matias Bjørling <[email protected]> wrote:
>> On 01/31/2018 03:06 AM, Javier González wrote:
>>>
>>> From: Hans Holmberg <[email protected]>
>>>
>>> When pblk receives a sync, all data up to that point in the write buffer
>>> must be comitted to persistent storage, and as flash memory comes with a
>>> minimal write size there is a significant cost involved both in terms
>>> of time for completing the sync and in terms of write amplification
>>> padded sectors for filling up to the minimal write size.
>>>
>>> In order to get a better understanding of the costs involved for syncs,
>>> Add a sysfs attribute to pblk: padded_dist, showing a normalized
>>> distribution of sectors padded. In order to facilitate measurements of
>>> specific workloads during the lifetime of the pblk instance, the
>>> distribution can be reset by writing 0 to the attribute.
>>>
>>> Do this by introducing counters for each possible padding:
>>> {0..(minimal write size - 1)} and calculate the normalized distribution
>>> when showing the attribute.
>>>
>>> Signed-off-by: Hans Holmberg <[email protected]>
>>> Signed-off-by: Javier González <[email protected]>
>>> ---
>>> drivers/lightnvm/pblk-init.c | 16 ++++++++--
>>> drivers/lightnvm/pblk-rb.c | 15 +++++-----
>>> drivers/lightnvm/pblk-sysfs.c | 68
>>> +++++++++++++++++++++++++++++++++++++++++++
>>> drivers/lightnvm/pblk.h | 6 +++-
>>> 4 files changed, 95 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>> index 7eedc5daebc8..a5e3510c0f38 100644
>>> --- a/drivers/lightnvm/pblk-init.c
>>> +++ b/drivers/lightnvm/pblk-init.c
>>> @@ -921,6 +921,7 @@ static void pblk_free(struct pblk *pblk)
>>> {
>>> pblk_luns_free(pblk);
>>> pblk_lines_free(pblk);
>>> + kfree(pblk->pad_dist);
>>> pblk_line_meta_free(pblk);
>>> pblk_core_free(pblk);
>>> pblk_l2p_free(pblk);
>>> @@ -998,11 +999,13 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>>> struct gendisk *tdisk,
>>> pblk->pad_rst_wa = 0;
>>> pblk->gc_rst_wa = 0;
>>> + atomic_long_set(&pblk->nr_flush, 0);
>>> + pblk->nr_flush_rst = 0;
>>> +
>>> #ifdef CONFIG_NVM_DEBUG
>>> atomic_long_set(&pblk->inflight_writes, 0);
>>> atomic_long_set(&pblk->padded_writes, 0);
>>> atomic_long_set(&pblk->padded_wb, 0);
>>> - atomic_long_set(&pblk->nr_flush, 0);
>>> atomic_long_set(&pblk->req_writes, 0);
>>> atomic_long_set(&pblk->sub_writes, 0);
>>> atomic_long_set(&pblk->sync_writes, 0);
>>> @@ -1034,10 +1037,17 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>>> struct gendisk *tdisk,
>>> goto fail_free_luns;
>>> }
>>> + pblk->pad_dist = kzalloc((pblk->min_write_pgs - 1) *
>>> sizeof(atomic64_t),
>>> + GFP_KERNEL);
>>> + if (!pblk->pad_dist) {
>>> + ret = -ENOMEM;
>>> + goto fail_free_line_meta;
>>> + }
>>> +
>>> ret = pblk_core_init(pblk);
>>> if (ret) {
>>> pr_err("pblk: could not initialize core\n");
>>> - goto fail_free_line_meta;
>>> + goto fail_free_pad_dist;
>>> }
>>> ret = pblk_l2p_init(pblk);
>>> @@ -1097,6 +1107,8 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>>> struct gendisk *tdisk,
>>> pblk_l2p_free(pblk);
>>> fail_free_core:
>>> pblk_core_free(pblk);
>>> +fail_free_pad_dist:
>>> + kfree(pblk->pad_dist);
>>> fail_free_line_meta:
>>> pblk_line_meta_free(pblk);
>>> fail_free_luns:
>>> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
>>> index 7044b5599cc4..264372d7b537 100644
>>> --- a/drivers/lightnvm/pblk-rb.c
>>> +++ b/drivers/lightnvm/pblk-rb.c
>>> @@ -437,9 +437,7 @@ static int pblk_rb_may_write_flush(struct pblk_rb *rb,
>>> unsigned int nr_entries,
>>> if (bio->bi_opf & REQ_PREFLUSH) {
>>> struct pblk *pblk = container_of(rb, struct pblk, rwb);
>>> -#ifdef CONFIG_NVM_DEBUG
>>> atomic_long_inc(&pblk->nr_flush);
>>> -#endif
>>> if (pblk_rb_flush_point_set(&pblk->rwb, bio, mem))
>>> *io_ret = NVM_IO_OK;
>>> }
>>> @@ -620,14 +618,17 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb,
>>> struct nvm_rq *rqd,
>>> pr_err("pblk: could not pad page in write bio\n");
>>> return NVM_IO_ERR;
>>> }
>>> +
>>> + if (pad < pblk->min_write_pgs)
>>> + atomic64_inc(&pblk->pad_dist[pad - 1]);
>>> + else
>>> + pr_warn("pblk: padding more than min. sectors\n");
>>
>>
>> Curious, when would this happen? Would it be an implementation error or
>> something a user did wrong?
>
> This would be an implementation error - and this check is just a
> safeguard to make sure we won't go out of bounds when updating the
> statistics.
>

Ah, does it make sense to have such defensive programming, when the
value is never persisted? An 64 bit integer is quite large, and if only
used for workloads for a single instance, it would practically never
trigger, and if it did, do one care?

<snip>

>>
>> Looks good to me. Let me know if you want to send a new patch, or if I may
>> make the change when I pick it up.
>
> Thanks for the review, i saw that you already reworked the patch a bit
> on your core branch.
> However, i found a build issue when building for i386, so i'll fix
> that and submit a V2(that includes your update)

Ok. I assume this is the kbuild mail I asked about a couple of days ago.
I'll wait for v2.

2018-02-06 11:30:40

by Hans Holmberg

[permalink] [raw]
Subject: Re: [PATCH 4/5] lightnvm: pblk: add padding distribution sysfs attribute

On Tue, Feb 6, 2018 at 11:50 AM, Matias Bjørling <[email protected]> wrote:
> On 02/06/2018 10:27 AM, Hans Holmberg wrote:
>>
>> Hi Matias,
>>
>> On Wed, Jan 31, 2018 at 9:44 AM, Matias Bjørling <[email protected]> wrote:
>>>
>>> On 01/31/2018 03:06 AM, Javier González wrote:
>>>>
>>>>
>>>> From: Hans Holmberg <[email protected]>
>>>>
>>>> When pblk receives a sync, all data up to that point in the write buffer
>>>> must be comitted to persistent storage, and as flash memory comes with a
>>>> minimal write size there is a significant cost involved both in terms
>>>> of time for completing the sync and in terms of write amplification
>>>> padded sectors for filling up to the minimal write size.
>>>>
>>>> In order to get a better understanding of the costs involved for syncs,
>>>> Add a sysfs attribute to pblk: padded_dist, showing a normalized
>>>> distribution of sectors padded. In order to facilitate measurements of
>>>> specific workloads during the lifetime of the pblk instance, the
>>>> distribution can be reset by writing 0 to the attribute.
>>>>
>>>> Do this by introducing counters for each possible padding:
>>>> {0..(minimal write size - 1)} and calculate the normalized distribution
>>>> when showing the attribute.
>>>>
>>>> Signed-off-by: Hans Holmberg <[email protected]>
>>>> Signed-off-by: Javier González <[email protected]>
>>>> ---
>>>> drivers/lightnvm/pblk-init.c | 16 ++++++++--
>>>> drivers/lightnvm/pblk-rb.c | 15 +++++-----
>>>> drivers/lightnvm/pblk-sysfs.c | 68
>>>> +++++++++++++++++++++++++++++++++++++++++++
>>>> drivers/lightnvm/pblk.h | 6 +++-
>>>> 4 files changed, 95 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>> index 7eedc5daebc8..a5e3510c0f38 100644
>>>> --- a/drivers/lightnvm/pblk-init.c
>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>> @@ -921,6 +921,7 @@ static void pblk_free(struct pblk *pblk)
>>>> {
>>>> pblk_luns_free(pblk);
>>>> pblk_lines_free(pblk);
>>>> + kfree(pblk->pad_dist);
>>>> pblk_line_meta_free(pblk);
>>>> pblk_core_free(pblk);
>>>> pblk_l2p_free(pblk);
>>>> @@ -998,11 +999,13 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>>>> struct gendisk *tdisk,
>>>> pblk->pad_rst_wa = 0;
>>>> pblk->gc_rst_wa = 0;
>>>> + atomic_long_set(&pblk->nr_flush, 0);
>>>> + pblk->nr_flush_rst = 0;
>>>> +
>>>> #ifdef CONFIG_NVM_DEBUG
>>>> atomic_long_set(&pblk->inflight_writes, 0);
>>>> atomic_long_set(&pblk->padded_writes, 0);
>>>> atomic_long_set(&pblk->padded_wb, 0);
>>>> - atomic_long_set(&pblk->nr_flush, 0);
>>>> atomic_long_set(&pblk->req_writes, 0);
>>>> atomic_long_set(&pblk->sub_writes, 0);
>>>> atomic_long_set(&pblk->sync_writes, 0);
>>>> @@ -1034,10 +1037,17 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>>>> struct gendisk *tdisk,
>>>> goto fail_free_luns;
>>>> }
>>>> + pblk->pad_dist = kzalloc((pblk->min_write_pgs - 1) *
>>>> sizeof(atomic64_t),
>>>> + GFP_KERNEL);
>>>> + if (!pblk->pad_dist) {
>>>> + ret = -ENOMEM;
>>>> + goto fail_free_line_meta;
>>>> + }
>>>> +
>>>> ret = pblk_core_init(pblk);
>>>> if (ret) {
>>>> pr_err("pblk: could not initialize core\n");
>>>> - goto fail_free_line_meta;
>>>> + goto fail_free_pad_dist;
>>>> }
>>>> ret = pblk_l2p_init(pblk);
>>>> @@ -1097,6 +1107,8 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>>>> struct gendisk *tdisk,
>>>> pblk_l2p_free(pblk);
>>>> fail_free_core:
>>>> pblk_core_free(pblk);
>>>> +fail_free_pad_dist:
>>>> + kfree(pblk->pad_dist);
>>>> fail_free_line_meta:
>>>> pblk_line_meta_free(pblk);
>>>> fail_free_luns:
>>>> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
>>>> index 7044b5599cc4..264372d7b537 100644
>>>> --- a/drivers/lightnvm/pblk-rb.c
>>>> +++ b/drivers/lightnvm/pblk-rb.c
>>>> @@ -437,9 +437,7 @@ static int pblk_rb_may_write_flush(struct pblk_rb
>>>> *rb,
>>>> unsigned int nr_entries,
>>>> if (bio->bi_opf & REQ_PREFLUSH) {
>>>> struct pblk *pblk = container_of(rb, struct pblk, rwb);
>>>> -#ifdef CONFIG_NVM_DEBUG
>>>> atomic_long_inc(&pblk->nr_flush);
>>>> -#endif
>>>> if (pblk_rb_flush_point_set(&pblk->rwb, bio, mem))
>>>> *io_ret = NVM_IO_OK;
>>>> }
>>>> @@ -620,14 +618,17 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb
>>>> *rb,
>>>> struct nvm_rq *rqd,
>>>> pr_err("pblk: could not pad page in write
>>>> bio\n");
>>>> return NVM_IO_ERR;
>>>> }
>>>> +
>>>> + if (pad < pblk->min_write_pgs)
>>>> + atomic64_inc(&pblk->pad_dist[pad - 1]);
>>>> + else
>>>> + pr_warn("pblk: padding more than min.
>>>> sectors\n");
>>>
>>>
>>>
>>> Curious, when would this happen? Would it be an implementation error or
>>> something a user did wrong?
>>
>>
>> This would be an implementation error - and this check is just a
>> safeguard to make sure we won't go out of bounds when updating the
>> statistics.
>>
>
> Ah, does it make sense to have such defensive programming, when the value is
> never persisted? An 64 bit integer is quite large, and if only used for
> workloads for a single instance, it would practically never trigger, and if
> it did, do one care?

Ah, sorry for being unclear, it's not for checking if the pad
counters overflow - I wanted to make sure that we won't index
pad_dist with negative numbers if we mess up the padding calculations
in the future.

>
> <snip>
>
>>>
>>> Looks good to me. Let me know if you want to send a new patch, or if I
>>> may
>>> make the change when I pick it up.
>>
>>
>> Thanks for the review, i saw that you already reworked the patch a bit
>> on your core branch.
>> However, i found a build issue when building for i386, so i'll fix
>> that and submit a V2(that includes your update)
>
>
> Ok. I assume this is the kbuild mail I asked about a couple of days ago.
> I'll wait for v2.

I did see that particular mail, but it's most likely the same issue
that the V2 will fix.

Thanks,
Hans