2017-09-06 15:01:16

by Javier González

[permalink] [raw]
Subject: [PATCH 0/6] lightnvm: pblk bug fixes for 4.14

Hi Jens,

Here are the pblk bug fixes for this window.

The patches apply on your for-4.14/block, and you can be found at:
https://github.com/OpenChannelSSD/linux/tree/pblk.for-4.14

Thanks,
Javier

Javier González (6):
lightnvm: pblk: check for failed mempool alloc.
lightnvm: pblk: initialize debug stat counter
lightnvm: pblk: use right flag for GC allocation
lightnvm: pblk: free padded entries in write buffer
lightnvm: pblk: fix write I/O sync stat
lightnvm: pblk: avoid deadlock on low LUN config

drivers/lightnvm/pblk-core.c | 5 ++++-
drivers/lightnvm/pblk-init.c | 3 ++-
drivers/lightnvm/pblk-read.c | 7 +++++--
drivers/lightnvm/pblk-rl.c | 6 ++++++
drivers/lightnvm/pblk-write.c | 16 ++++++++++++----
drivers/lightnvm/pblk.h | 2 ++
6 files changed, 31 insertions(+), 8 deletions(-)

--
2.7.4


2017-09-06 15:01:28

by Javier González

[permalink] [raw]
Subject: [PATCH 4/6] lightnvm: pblk: free padded entries in write buffer

When a REQ_FLUSH reaches pblk, the bio cannot be directly completed.
Instead, data on the write buffer is flushed and the bio is completed on
the completion pah. This might require some sectors to be padded in
order to guarantee a successful write.

This patch fixes a memory leak on the padded pages. A consequence of
this bad free was that internal bios not containing data (only a flush)
were not being completed.

Fixes: a4bd217b4326 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
---
drivers/lightnvm/pblk-core.c | 1 -
drivers/lightnvm/pblk-write.c | 14 +++++++++++---
2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index acb07bbcb416..84bff271abd8 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -192,7 +192,6 @@ void pblk_bio_free_pages(struct pblk *pblk, struct bio *bio, int off,

WARN_ON(off + nr_pages != bio->bi_vcnt);

- bio_advance(bio, off * PBLK_EXPOSED_PAGE_SIZE);
for (i = off; i < nr_pages + off; i++) {
bv = bio->bi_io_vec[i];
mempool_free(bv.bv_page, pblk->page_pool);
diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 3ad9e56d2473..6acb4a92435f 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -25,13 +25,20 @@ static unsigned long pblk_end_w_bio(struct pblk *pblk, struct nvm_rq *rqd,
unsigned long ret;
int i;

- for (i = 0; i < c_ctx->nr_valid; i++) {
+ i = 0;
+ do {
struct pblk_w_ctx *w_ctx;

w_ctx = pblk_rb_w_ctx(&pblk->rwb, c_ctx->sentry + i);
while ((original_bio = bio_list_pop(&w_ctx->bios)))
bio_endio(original_bio);
- }
+
+ i++;
+ } while (i < c_ctx->nr_valid);
+
+ if (c_ctx->nr_padded)
+ pblk_bio_free_pages(pblk, rqd->bio, c_ctx->nr_valid,
+ c_ctx->nr_padded);

#ifdef CONFIG_NVM_DEBUG
atomic_long_add(c_ctx->nr_valid, &pblk->sync_writes);
@@ -521,7 +528,8 @@ static void pblk_free_write_rqd(struct pblk *pblk, struct nvm_rq *rqd)
struct bio *bio = rqd->bio;

if (c_ctx->nr_padded)
- pblk_bio_free_pages(pblk, bio, rqd->nr_ppas, c_ctx->nr_padded);
+ pblk_bio_free_pages(pblk, bio, c_ctx->nr_valid,
+ c_ctx->nr_padded);
}

static int pblk_submit_write(struct pblk *pblk)
--
2.7.4

2017-09-06 15:01:33

by Javier González

[permalink] [raw]
Subject: [PATCH 3/6] lightnvm: pblk: use right flag for GC allocation

The data buffer for the GC path allocates virtual memory through
vmalloc. When this change was introduced, a flag signaling kmalloc'ed
memory was wrongly introduced. Use the right flag when creating a bio
from this buffer.

Fixes: de54e703a422 ("lightnvm: pblk: use vmalloc for GC data buffer")
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
---
drivers/lightnvm/pblk-read.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index d682e89e6493..ee8efb55b330 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -499,7 +499,7 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,

data_len = (*secs_to_gc) * geo->sec_size;
bio = pblk_bio_map_addr(pblk, data, *secs_to_gc, data_len,
- PBLK_KMALLOC_META, GFP_KERNEL);
+ PBLK_VMALLOC_META, GFP_KERNEL);
if (IS_ERR(bio)) {
pr_err("pblk: could not allocate GC bio (%lu)\n", PTR_ERR(bio));
goto err_free_dma;
@@ -519,7 +519,7 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
if (ret) {
bio_endio(bio);
pr_err("pblk: GC read request failed\n");
- goto err_free_dma;
+ goto err_free_bio;
}

if (!wait_for_completion_io_timeout(&wait,
@@ -541,10 +541,13 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
atomic_long_sub(*secs_to_gc, &pblk->inflight_reads);
#endif

+ bio_put(bio);
out:
nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
return NVM_IO_OK;

+err_free_bio:
+ bio_put(bio);
err_free_dma:
nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
return NVM_IO_ERR;
--
2.7.4

2017-09-06 15:01:36

by Javier González

[permalink] [raw]
Subject: [PATCH 5/6] lightnvm: pblk: fix write I/O sync stat

Fix stat counter to collect the right number of I/Os being synced on the
completion path.

Fixes: 0880a9aa2d91f ("lightnvm: pblk: delete redundant buffer pointer")
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
---
drivers/lightnvm/pblk-write.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-write.c b/drivers/lightnvm/pblk-write.c
index 6acb4a92435f..b48d52b2ae38 100644
--- a/drivers/lightnvm/pblk-write.c
+++ b/drivers/lightnvm/pblk-write.c
@@ -41,7 +41,7 @@ static unsigned long pblk_end_w_bio(struct pblk *pblk, struct nvm_rq *rqd,
c_ctx->nr_padded);

#ifdef CONFIG_NVM_DEBUG
- atomic_long_add(c_ctx->nr_valid, &pblk->sync_writes);
+ atomic_long_add(rqd->nr_ppas, &pblk->sync_writes);
#endif

ret = pblk_rb_sync_advance(&pblk->rwb, c_ctx->nr_valid);
--
2.7.4

2017-09-06 15:01:26

by Javier González

[permalink] [raw]
Subject: [PATCH 2/6] lightnvm: pblk: initialize debug stat counter

Initialize the stat counter for garbage collected reads.

Fixes: a4bd217b43268 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
---
drivers/lightnvm/pblk-init.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 1b0f61233c21..8459434edd89 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -944,6 +944,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
atomic_long_set(&pblk->recov_writes, 0);
atomic_long_set(&pblk->recov_writes, 0);
atomic_long_set(&pblk->recov_gc_writes, 0);
+ atomic_long_set(&pblk->recov_gc_reads, 0);
#endif

atomic_long_set(&pblk->read_failed, 0);
--
2.7.4

2017-09-06 15:02:51

by Javier González

[permalink] [raw]
Subject: [PATCH 6/6] lightnvm: pblk: avoid deadlock on low LUN config

On low LUN configurations, make sure not to send bios that are bigger
than the buffer size.

Fixes: a4bd217b4326 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
---
drivers/lightnvm/pblk-init.c | 2 +-
drivers/lightnvm/pblk-rl.c | 6 ++++++
drivers/lightnvm/pblk.h | 2 ++
3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 8459434edd89..12c05aebac16 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -46,7 +46,7 @@ static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
* user I/Os. Unless stalled, the rate limiter leaves at least 256KB
* available for user I/O.
*/
- if (unlikely(pblk_get_secs(bio) >= pblk_rl_sysfs_rate_show(&pblk->rl)))
+ if (pblk_get_secs(bio) > pblk_rl_max_io(&pblk->rl))
blk_queue_split(q, &bio);

return pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
diff --git a/drivers/lightnvm/pblk-rl.c b/drivers/lightnvm/pblk-rl.c
index 2e6a5361baf0..596bdec433c3 100644
--- a/drivers/lightnvm/pblk-rl.c
+++ b/drivers/lightnvm/pblk-rl.c
@@ -178,6 +178,11 @@ int pblk_rl_sysfs_rate_show(struct pblk_rl *rl)
return rl->rb_user_max;
}

+int pblk_rl_max_io(struct pblk_rl *rl)
+{
+ return rl->rb_max_io;
+}
+
static void pblk_rl_u_timer(unsigned long data)
{
struct pblk_rl *rl = (struct pblk_rl *)data;
@@ -214,6 +219,7 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
/* To start with, all buffer is available to user I/O writers */
rl->rb_budget = budget;
rl->rb_user_max = budget;
+ rl->rb_max_io = budget >> 1;
rl->rb_gc_max = 0;
rl->rb_state = PBLK_RL_HIGH;

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 67e623bd5c2d..a2753f9da830 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -267,6 +267,7 @@ struct pblk_rl {
int rb_gc_max; /* Max buffer entries available for GC I/O */
int rb_gc_rsv; /* Reserved buffer entries for GC I/O */
int rb_state; /* Rate-limiter current state */
+ int rb_max_io; /* Maximum size for an I/O giving the config */

atomic_t rb_user_cnt; /* User I/O buffer counter */
atomic_t rb_gc_cnt; /* GC I/O buffer counter */
@@ -844,6 +845,7 @@ int pblk_rl_gc_may_insert(struct pblk_rl *rl, int nr_entries);
void pblk_rl_gc_in(struct pblk_rl *rl, int nr_entries);
void pblk_rl_out(struct pblk_rl *rl, int nr_user, int nr_gc);
int pblk_rl_sysfs_rate_show(struct pblk_rl *rl);
+int pblk_rl_max_io(struct pblk_rl *rl);
void pblk_rl_free_lines_inc(struct pblk_rl *rl, struct pblk_line *line);
void pblk_rl_free_lines_dec(struct pblk_rl *rl, struct pblk_line *line);
void pblk_rl_set_space_limit(struct pblk_rl *rl, int entries_left);
--
2.7.4

2017-09-06 15:03:20

by Javier González

[permalink] [raw]
Subject: [PATCH 1/6] lightnvm: pblk: check for failed mempool alloc.

Check for failed mempool allocations and act accordingly.

Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
---
drivers/lightnvm/pblk-core.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 81501644fb15..acb07bbcb416 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -165,6 +165,8 @@ struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int rw)
}

rqd = mempool_alloc(pool, GFP_KERNEL);
+ if (!rqd)
+ return NULL;
memset(rqd, 0, rq_size);

return rqd;
@@ -1478,6 +1480,8 @@ int pblk_blk_erase_async(struct pblk *pblk, struct ppa_addr ppa)
int err;

rqd = mempool_alloc(pblk->g_rq_pool, GFP_KERNEL);
+ if (!rqd)
+ return -ENOMEM;
memset(rqd, 0, pblk_g_rq_size);

pblk_setup_e_rq(pblk, rqd, ppa);
--
2.7.4

2017-09-06 15:08:14

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH 1/6] lightnvm: pblk: check for failed mempool alloc.

On Wed, Sep 06, 2017 at 05:01:01PM +0200, Javier Gonz?lez wrote:
> Check for failed mempool allocations and act accordingly.

Are you sure it is needed? Quoting from mempool_alloc()s Documentation:
"[...] Note that due to preallocation, this function *never* fails when called
from process contexts. (it might fail if called from an IRQ context.) [...]"


Byte,
Johannes

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-09-06 15:09:34

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 1/6] lightnvm: pblk: check for failed mempool alloc.

On 09/06/2017 09:08 AM, Johannes Thumshirn wrote:
> On Wed, Sep 06, 2017 at 05:01:01PM +0200, Javier González wrote:
>> Check for failed mempool allocations and act accordingly.
>
> Are you sure it is needed? Quoting from mempool_alloc()s Documentation:
> "[...] Note that due to preallocation, this function *never* fails when called
> from process contexts. (it might fail if called from an IRQ context.) [...]"

It's not needed, mempool() will never fail if __GFP_WAIT is set in the
mask. The use case here is GFP_KERNEL, which does include __GFP_WAIT.

--
Jens Axboe

2017-09-06 15:12:21

by Javier González

[permalink] [raw]
Subject: Re: [PATCH 1/6] lightnvm: pblk: check for failed mempool alloc.

> On 6 Sep 2017, at 17.09, Jens Axboe <[email protected]> wrote:
>
> On 09/06/2017 09:08 AM, Johannes Thumshirn wrote:
>> On Wed, Sep 06, 2017 at 05:01:01PM +0200, Javier González wrote:
>>> Check for failed mempool allocations and act accordingly.
>>
>> Are you sure it is needed? Quoting from mempool_alloc()s Documentation:
>> "[...] Note that due to preallocation, this function *never* fails when called
>> from process contexts. (it might fail if called from an IRQ context.) [...]"
>
> It's not needed, mempool() will never fail if __GFP_WAIT is set in the
> mask. The use case here is GFP_KERNEL, which does include __GFP_WAIT.

Thanks for the clarification. Do you just drop the patch, or do you want
me to re-send the series?

I see that I do this other places, I'll clean it up for next window.

Thanks,
Javier


Attachments:
signature.asc (801.00 B)
Message signed with OpenPGP

2017-09-06 15:13:02

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH 1/6] lightnvm: pblk: check for failed mempool alloc.

On Wed, Sep 06, 2017 at 09:09:29AM -0600, Jens Axboe wrote:
> On 09/06/2017 09:08 AM, Johannes Thumshirn wrote:
> > On Wed, Sep 06, 2017 at 05:01:01PM +0200, Javier Gonz?lez wrote:
> >> Check for failed mempool allocations and act accordingly.
> >
> > Are you sure it is needed? Quoting from mempool_alloc()s Documentation:
> > "[...] Note that due to preallocation, this function *never* fails when called
> > from process contexts. (it might fail if called from an IRQ context.) [...]"
>
> It's not needed, mempool() will never fail if __GFP_WAIT is set in the
> mask. The use case here is GFP_KERNEL, which does include __GFP_WAIT.

Exactly. Maybe I shouldn't have it phrased as a question though...

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-09-06 15:13:53

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 1/6] lightnvm: pblk: check for failed mempool alloc.

On 09/06/2017 09:12 AM, Javier González wrote:
>> On 6 Sep 2017, at 17.09, Jens Axboe <[email protected]> wrote:
>>
>> On 09/06/2017 09:08 AM, Johannes Thumshirn wrote:
>>> On Wed, Sep 06, 2017 at 05:01:01PM +0200, Javier González wrote:
>>>> Check for failed mempool allocations and act accordingly.
>>>
>>> Are you sure it is needed? Quoting from mempool_alloc()s Documentation:
>>> "[...] Note that due to preallocation, this function *never* fails when called
>>> from process contexts. (it might fail if called from an IRQ context.) [...]"
>>
>> It's not needed, mempool() will never fail if __GFP_WAIT is set in the
>> mask. The use case here is GFP_KERNEL, which does include __GFP_WAIT.
>
> Thanks for the clarification. Do you just drop the patch, or do you want
> me to re-send the series?

No need to resend. I'll pick up the others in a day or two, once people
have had some time to go over them.

--
Jens Axboe

2017-09-06 15:14:59

by Javier González

[permalink] [raw]
Subject: Re: [PATCH 1/6] lightnvm: pblk: check for failed mempool alloc.

> On 6 Sep 2017, at 17.13, Jens Axboe <[email protected]> wrote:
>
> On 09/06/2017 09:12 AM, Javier González wrote:
>>> On 6 Sep 2017, at 17.09, Jens Axboe <[email protected]> wrote:
>>>
>>> On 09/06/2017 09:08 AM, Johannes Thumshirn wrote:
>>>> On Wed, Sep 06, 2017 at 05:01:01PM +0200, Javier González wrote:
>>>>> Check for failed mempool allocations and act accordingly.
>>>>
>>>> Are you sure it is needed? Quoting from mempool_alloc()s Documentation:
>>>> "[...] Note that due to preallocation, this function *never* fails when called
>>>> from process contexts. (it might fail if called from an IRQ context.) [...]"
>>>
>>> It's not needed, mempool() will never fail if __GFP_WAIT is set in the
>>> mask. The use case here is GFP_KERNEL, which does include __GFP_WAIT.
>>
>> Thanks for the clarification. Do you just drop the patch, or do you want
>> me to re-send the series?
>
> No need to resend. I'll pick up the others in a day or two, once people
> have had some time to go over them.
>

Thanks. And apologies for the delay on the patches...

Javier


Attachments:
signature.asc (801.00 B)
Message signed with OpenPGP

2017-09-06 15:15:15

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH 2/6] lightnvm: pblk: initialize debug stat counter

Looks good,
Reviewed-by: Johannes Thumshirn <[email protected]>

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-09-06 15:20:01

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH 3/6] lightnvm: pblk: use right flag for GC allocation

Looks good,
Reviewed-by: Johannes Thumshirn <[email protected]>
--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-09-06 15:20:37

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 1/6] lightnvm: pblk: check for failed mempool alloc.

On 09/06/2017 09:13 AM, Jens Axboe wrote:
> On 09/06/2017 09:12 AM, Javier González wrote:
>>> On 6 Sep 2017, at 17.09, Jens Axboe <[email protected]> wrote:
>>>
>>> On 09/06/2017 09:08 AM, Johannes Thumshirn wrote:
>>>> On Wed, Sep 06, 2017 at 05:01:01PM +0200, Javier González wrote:
>>>>> Check for failed mempool allocations and act accordingly.
>>>>
>>>> Are you sure it is needed? Quoting from mempool_alloc()s Documentation:
>>>> "[...] Note that due to preallocation, this function *never* fails when called
>>>> from process contexts. (it might fail if called from an IRQ context.) [...]"
>>>
>>> It's not needed, mempool() will never fail if __GFP_WAIT is set in the
>>> mask. The use case here is GFP_KERNEL, which does include __GFP_WAIT.
>>
>> Thanks for the clarification. Do you just drop the patch, or do you want
>> me to re-send the series?
>
> No need to resend. I'll pick up the others in a day or two, once people
> have had some time to go over them.

I took a quick look at your mempool usage, and I'm not sure it's
correct. For a mempool to work, you have to ensure that you provide a
forward progress guarantee. With that guarantee, you know that if you do
end up sleeping on allocation, you already have items inflight that will
be freed when that operation completes. In other words, all allocations
must have a defined and finite life time, as any allocation can
potentially sleep/block for that life time. You can't allocate something
and hold on to it forever, then you are violating the terms of agreement
that makes a mempool work.

The first one that caught my eye is pblk->page_pool. You have this loop:

for (i = 0; i < nr_pages; i++) {
page = mempool_alloc(pblk->page_pool, flags);
if (!page)
goto err;

ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0);
if (ret != PBLK_EXPOSED_PAGE_SIZE) {
pr_err("pblk: could not add page to bio\n");
mempool_free(page, pblk->page_pool);
goto err;
}
}

which looks suspect. This mempool is created with a reserve pool of
PAGE_POOL_SIZE (16) members. Do we know if the bio has 16 pages or less?
If not, then this is broken and can deadlock forever.

You have a lot of mempool usage in the code, would probably not hurt to
audit all of them.

--
Jens Axboe

2017-09-06 15:22:10

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH 4/6] lightnvm: pblk: free padded entries in write buffer

On Wed, Sep 06, 2017 at 05:01:04PM +0200, Javier Gonz?lez wrote:
> the completion pah. This might require some sectors to be padded in
^ path

Looks good otherwise,
Reviewed-by: Johannes Thumshirn <[email protected]>

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-09-06 15:23:35

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH 5/6] lightnvm: pblk: fix write I/O sync stat


Reviewed-by: Johannes Thumshirn <[email protected]>
--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-09-06 18:29:03

by Javier González

[permalink] [raw]
Subject: Re: [PATCH 1/6] lightnvm: pblk: check for failed mempool alloc.

> On 6 Sep 2017, at 17.20, Jens Axboe <[email protected]> wrote:
>
> On 09/06/2017 09:13 AM, Jens Axboe wrote:
>> On 09/06/2017 09:12 AM, Javier González wrote:
>>>> On 6 Sep 2017, at 17.09, Jens Axboe <[email protected]> wrote:
>>>>
>>>> On 09/06/2017 09:08 AM, Johannes Thumshirn wrote:
>>>>> On Wed, Sep 06, 2017 at 05:01:01PM +0200, Javier González wrote:
>>>>>> Check for failed mempool allocations and act accordingly.
>>>>>
>>>>> Are you sure it is needed? Quoting from mempool_alloc()s Documentation:
>>>>> "[...] Note that due to preallocation, this function *never* fails when called
>>>>> from process contexts. (it might fail if called from an IRQ context.) [...]"
>>>>
>>>> It's not needed, mempool() will never fail if __GFP_WAIT is set in the
>>>> mask. The use case here is GFP_KERNEL, which does include __GFP_WAIT.
>>>
>>> Thanks for the clarification. Do you just drop the patch, or do you want
>>> me to re-send the series?
>>
>> No need to resend. I'll pick up the others in a day or two, once people
>> have had some time to go over them.
>
> I took a quick look at your mempool usage, and I'm not sure it's
> correct. For a mempool to work, you have to ensure that you provide a
> forward progress guarantee. With that guarantee, you know that if you do
> end up sleeping on allocation, you already have items inflight that will
> be freed when that operation completes. In other words, all allocations
> must have a defined and finite life time, as any allocation can
> potentially sleep/block for that life time. You can't allocate something
> and hold on to it forever, then you are violating the terms of agreement
> that makes a mempool work.

I understood the part of guaranteeing the number of inflight items to
keep the mempool active without waiting, but I must admit that I assumed
that the mempool would resize when getting pressure and that the penalty
would be increased latency, not the mempool giving up and causing a
deadlock.

>
> The first one that caught my eye is pblk->page_pool. You have this loop:
>
> for (i = 0; i < nr_pages; i++) {
> page = mempool_alloc(pblk->page_pool, flags);
> if (!page)
> goto err;
>
> ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0);
> if (ret != PBLK_EXPOSED_PAGE_SIZE) {
> pr_err("pblk: could not add page to bio\n");
> mempool_free(page, pblk->page_pool);
> goto err;
> }
> }
>
> which looks suspect. This mempool is created with a reserve pool of
> PAGE_POOL_SIZE (16) members. Do we know if the bio has 16 pages or less?
> If not, then this is broken and can deadlock forever.

I can see that in this case, the 16 elements do not hold. In the read
path, we can guarantee that a read will be <= 64 sectors (4KB pages), so
this is definitely wrong. I'll fix it tomorrow.

Since we are at it, I have for some time wondered what's the right way
to balance the mempool sizes so that we are a good citizen and perform
at the same time. I don't see a lot of code using mempool_resize to tune
the min_nr based on load. For example, in our write path, the numbers
are easy to calculate, but on the read path I completely
over-dimensioned the mempool to ensure not having to wait for the
completion path. Any good rule of thumb here?

> You have a lot of mempool usage in the code, would probably not hurt to
> audit all of them.

Yes. I will take a look and add comments to the sizes.

Thanks Jens,
Javier


Attachments:
signature.asc (801.00 B)
Message signed with OpenPGP