2014-01-27 10:13:46

by Roger Pau Monne

Subject: [PATCH] xen-blkback: fix memory leaks

I've at least identified two possible memory leaks in blkback, both
related to the shutdown path of a VBD:

- We don't wait for any pending purge work to finish before cleaning
the list of free_pages. The purge work will call put_free_pages and
thus we might end up with pages being added to the free_pages list
after we have emptied it.
- We don't wait for pending requests to end before cleaning persistent
grants and the list of free_pages. Again this can add pages to the
free_pages lists or persistent grants to the persistent_gnts
red-black tree.

Also, add some checks in xen_blkif_free to make sure we are cleaning
everything.

Signed-off-by: Roger Pau Monné <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Matt Rushton <[email protected]>
Cc: Matt Wilson <[email protected]>
Cc: Ian Campbell <[email protected]>
---
This should be applied after the patch:

xen-blkback: fix memory leak when persistent grants are used

From Matt Rushton & Matt Wilson and backported to stable.

I've been able to create and destroy ~4000 guests while doing heavy IO
operations with this patch on a 512M Dom0 without problems.
---
drivers/block/xen-blkback/blkback.c | 29 +++++++++++++++++++----------
drivers/block/xen-blkback/xenbus.c | 9 +++++++++
2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 30ef7b3..19925b7 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -169,6 +169,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
struct pending_req *pending_req);
static void make_response(struct xen_blkif *blkif, u64 id,
unsigned short op, int st);
+static void xen_blk_drain_io(struct xen_blkif *blkif, bool force);

#define foreach_grant_safe(pos, n, rbtree, node) \
for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \
@@ -625,6 +626,12 @@ purge_gnt_list:
print_stats(blkif);
}

+ /* Drain pending IO */
+ xen_blk_drain_io(blkif, true);
+
+ /* Drain pending purge work */
+ flush_work(&blkif->persistent_purge_work);
+
/* Free all persistent grant pages */
if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
free_persistent_gnts(blkif, &blkif->persistent_gnts,
@@ -930,7 +937,7 @@ static int dispatch_other_io(struct xen_blkif *blkif,
return -EIO;
}

-static void xen_blk_drain_io(struct xen_blkif *blkif)
+static void xen_blk_drain_io(struct xen_blkif *blkif, bool force)
{
atomic_set(&blkif->drain, 1);
do {
@@ -943,7 +950,7 @@ static void xen_blk_drain_io(struct xen_blkif *blkif)

if (!atomic_read(&blkif->drain))
break;
- } while (!kthread_should_stop());
+ } while (!kthread_should_stop() || force);
atomic_set(&blkif->drain, 0);
}

@@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
* the proper response on the ring.
*/
if (atomic_dec_and_test(&pending_req->pendcnt)) {
- xen_blkbk_unmap(pending_req->blkif,
+ struct xen_blkif *blkif = pending_req->blkif;
+
+ xen_blkbk_unmap(blkif,
pending_req->segments,
pending_req->nr_pages);
- make_response(pending_req->blkif, pending_req->id,
+ make_response(blkif, pending_req->id,
pending_req->operation, pending_req->status);
- xen_blkif_put(pending_req->blkif);
- if (atomic_read(&pending_req->blkif->refcnt) <= 2) {
- if (atomic_read(&pending_req->blkif->drain))
- complete(&pending_req->blkif->drain_complete);
+ free_req(blkif, pending_req);
+ xen_blkif_put(blkif);
+ if (atomic_read(&blkif->refcnt) <= 2) {
+ if (atomic_read(&blkif->drain))
+ complete(&blkif->drain_complete);
}
- free_req(pending_req->blkif, pending_req);
}
}

@@ -1224,7 +1233,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
* issue the WRITE_FLUSH.
*/
if (drain)
- xen_blk_drain_io(pending_req->blkif);
+ xen_blk_drain_io(pending_req->blkif, false);

/*
* If we have failed at this point, we need to undo the M2P override,
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index c2014a0..3c10281 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -125,6 +125,7 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
blkif->persistent_gnts.rb_node = NULL;
spin_lock_init(&blkif->free_pages_lock);
INIT_LIST_HEAD(&blkif->free_pages);
+ INIT_LIST_HEAD(&blkif->persistent_purge_list);
blkif->free_pages_num = 0;
atomic_set(&blkif->persistent_gnt_in_use, 0);

@@ -259,6 +260,14 @@ static void xen_blkif_free(struct xen_blkif *blkif)
if (!atomic_dec_and_test(&blkif->refcnt))
BUG();

+ /* Make sure everything is drained before shutting down */
+ BUG_ON(blkif->persistent_gnt_c != 0);
+ BUG_ON(atomic_read(&blkif->persistent_gnt_in_use) != 0);
+ BUG_ON(blkif->free_pages_num != 0);
+ BUG_ON(!list_empty(&blkif->persistent_purge_list));
+ BUG_ON(!list_empty(&blkif->free_pages));
+ BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
+
/* Check that there is no request in use */
list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
list_del(&req->free_list);
--
1.7.7.5 (Apple Git-26)
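
To make the first leak concrete, here is a minimal userspace model of the
race between the deferred purge work and the shutdown path. pthreads stand
in for the kernel workqueue, and every name below is invented for
illustration; this is a sketch of the failure mode, not driver code:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int free_pages_num;      /* pages on the toy free_pages pool */
static int leaked;              /* pages returned after the pool was emptied */
static int pool_emptied;

/* Stand-in for persistent_purge_work calling put_free_pages(). */
static void *purge_work(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock);
        if (pool_emptied)
                leaked++;       /* nothing will ever free this page */
        else
                free_pages_num++;
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t work;

        pthread_create(&work, NULL, purge_work, NULL);

        /*
         * Buggy shutdown: empty the pool without flushing the pending
         * work item first.  If the work runs after this point, the
         * page it returns is lost.
         */
        pthread_mutex_lock(&lock);
        free_pages_num = 0;
        pool_emptied = 1;
        pthread_mutex_unlock(&lock);

        /* The fix is the moral equivalent of doing this join first. */
        pthread_join(work, NULL);

        printf("free_pages_num=%d leaked=%d\n", free_pages_num, leaked);
        return 0;
}

Whether "leaked" ends up 0 or 1 depends on scheduling - which is exactly
the race; flushing the work before emptying the pool, as the patch does,
removes it.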


2014-01-27 16:09:26

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH] xen-blkback: fix memory leaks

On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote:
> I've at least identified two possible memory leaks in blkback, both
> related to the shutdown path of a VBD:
>
> - We don't wait for any pending purge work to finish before cleaning
> the list of free_pages. The purge work will call put_free_pages and
> thus we might end up with pages being added to the free_pages list
> after we have emptied it.
> - We don't wait for pending requests to end before cleaning persistent
> grants and the list of free_pages. Again this can add pages to the
> free_pages lists or persistent grants to the persistent_gnts
> red-black tree.
>
> Also, add some checks in xen_blkif_free to make sure we are cleaning
> everything.
>
> Signed-off-by: Roger Pau Monné <[email protected]>
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: David Vrabel <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: Matt Rushton <[email protected]>
> Cc: Matt Wilson <[email protected]>
> Cc: Ian Campbell <[email protected]>
> ---
> This should be applied after the patch:
>
> xen-blkback: fix memory leak when persistent grants are used

Could you respin the series with the issues below fixed and
have said patch as part of the series. That way not only does
it have your SoB on it but it makes it easier to apply the patch
for lazy^H^H^Hbusy maintainers and makes it clear that you had
tested both of them.

Also, please CC Jens Axboe on these patches.

Thank you!
>
> From Matt Rushton & Matt Wilson and backported to stable.
>
> I've been able to create and destroy ~4000 guests while doing heavy IO
> operations with this patch on a 512M Dom0 without problems.
> ---
> drivers/block/xen-blkback/blkback.c | 29 +++++++++++++++++++----------
> drivers/block/xen-blkback/xenbus.c | 9 +++++++++
> 2 files changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 30ef7b3..19925b7 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -169,6 +169,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> struct pending_req *pending_req);
> static void make_response(struct xen_blkif *blkif, u64 id,
> unsigned short op, int st);
> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force);
>
> #define foreach_grant_safe(pos, n, rbtree, node) \
> for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \
> @@ -625,6 +626,12 @@ purge_gnt_list:
> print_stats(blkif);
> }
>
> + /* Drain pending IO */
> + xen_blk_drain_io(blkif, true);
> +
> + /* Drain pending purge work */
> + flush_work(&blkif->persistent_purge_work);
> +
> /* Free all persistent grant pages */
> if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
> free_persistent_gnts(blkif, &blkif->persistent_gnts,
> @@ -930,7 +937,7 @@ static int dispatch_other_io(struct xen_blkif *blkif,
> return -EIO;
> }
>
> -static void xen_blk_drain_io(struct xen_blkif *blkif)
> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force)
> {
> atomic_set(&blkif->drain, 1);
> do {
> @@ -943,7 +950,7 @@ static void xen_blk_drain_io(struct xen_blkif *blkif)
>
> if (!atomic_read(&blkif->drain))
> break;
> - } while (!kthread_should_stop());
> + } while (!kthread_should_stop() || force);
> atomic_set(&blkif->drain, 0);
> }
>
> @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
> * the proper response on the ring.
> */
> if (atomic_dec_and_test(&pending_req->pendcnt)) {
> - xen_blkbk_unmap(pending_req->blkif,
> + struct xen_blkif *blkif = pending_req->blkif;
> +
> + xen_blkbk_unmap(blkif,
> pending_req->segments,
> pending_req->nr_pages);
> - make_response(pending_req->blkif, pending_req->id,
> + make_response(blkif, pending_req->id,
> pending_req->operation, pending_req->status);
> - xen_blkif_put(pending_req->blkif);
> - if (atomic_read(&pending_req->blkif->refcnt) <= 2) {
> - if (atomic_read(&pending_req->blkif->drain))
> - complete(&pending_req->blkif->drain_complete);
> + free_req(blkif, pending_req);
> + xen_blkif_put(blkif);
> + if (atomic_read(&blkif->refcnt) <= 2) {
> + if (atomic_read(&blkif->drain))
> + complete(&blkif->drain_complete);
> }
> - free_req(pending_req->blkif, pending_req);
> }
> }
>
> @@ -1224,7 +1233,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> * issue the WRITE_FLUSH.
> */
> if (drain)
> - xen_blk_drain_io(pending_req->blkif);
> + xen_blk_drain_io(pending_req->blkif, false);
>
> /*
> * If we have failed at this point, we need to undo the M2P override,
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index c2014a0..3c10281 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -125,6 +125,7 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
> blkif->persistent_gnts.rb_node = NULL;
> spin_lock_init(&blkif->free_pages_lock);
> INIT_LIST_HEAD(&blkif->free_pages);
> + INIT_LIST_HEAD(&blkif->persistent_purge_list);

Hm,
> blkif->free_pages_num = 0;
> atomic_set(&blkif->persistent_gnt_in_use, 0);
>
> @@ -259,6 +260,14 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> if (!atomic_dec_and_test(&blkif->refcnt))
> BUG();
>
> + /* Make sure everything is drained before shutting down */
> + BUG_ON(blkif->persistent_gnt_c != 0);
> + BUG_ON(atomic_read(&blkif->persistent_gnt_in_use) != 0);
> + BUG_ON(blkif->free_pages_num != 0);
> + BUG_ON(!list_empty(&blkif->persistent_purge_list));

You don't seem to put anything on this list? Or even declare this?
Was there another patch in the series?

> + BUG_ON(!list_empty(&blkif->free_pages));
> + BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
> +
> /* Check that there is no request in use */
> list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
> list_del(&req->free_list);
> --
> 1.7.7.5 (Apple Git-26)
>

2014-01-27 16:19:52

by Roger Pau Monne

Subject: Re: [PATCH] xen-blkback: fix memory leaks

On 27/01/14 17:09, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote:
>> I've at least identified two possible memory leaks in blkback, both
>> related to the shutdown path of a VBD:
>>
>> - We don't wait for any pending purge work to finish before cleaning
>> the list of free_pages. The purge work will call put_free_pages and
>> thus we might end up with pages being added to the free_pages list
>> after we have emptied it.
>> - We don't wait for pending requests to end before cleaning persistent
>> grants and the list of free_pages. Again this can add pages to the
>> free_pages lists or persistent grants to the persistent_gnts
>> red-black tree.
>>
>> Also, add some checks in xen_blkif_free to make sure we are cleaning
>> everything.
>>
>> Signed-off-by: Roger Pau Monné <[email protected]>
>> Cc: Konrad Rzeszutek Wilk <[email protected]>
>> Cc: David Vrabel <[email protected]>
>> Cc: Boris Ostrovsky <[email protected]>
>> Cc: Matt Rushton <[email protected]>
>> Cc: Matt Wilson <[email protected]>
>> Cc: Ian Campbell <[email protected]>
>> ---
>> This should be applied after the patch:
>>
>> xen-blkback: fix memory leak when persistent grants are used
>
> Could you respin the series with the issues below fixed and
> have said patch as part of the series. That way not only does
> it have your SoB on it but it makes it easier to apply the patch
> for lazy^H^H^Hbusy maintainers and makes it clear that you had
> tested both of them.
>
> Also, please CC Jens Axboe on these patches.

Ack, will do once the comments below are sorted out.

>>
>> From Matt Rushton & Matt Wilson and backported to stable.
>>
>> I've been able to create and destroy ~4000 guests while doing heavy IO
>> operations with this patch on a 512M Dom0 without problems.
>> ---
>> drivers/block/xen-blkback/blkback.c | 29 +++++++++++++++++++----------
>> drivers/block/xen-blkback/xenbus.c | 9 +++++++++
>> 2 files changed, 28 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
>> index 30ef7b3..19925b7 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -169,6 +169,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>> struct pending_req *pending_req);
>> static void make_response(struct xen_blkif *blkif, u64 id,
>> unsigned short op, int st);
>> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force);
>>
>> #define foreach_grant_safe(pos, n, rbtree, node) \
>> for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \
>> @@ -625,6 +626,12 @@ purge_gnt_list:
>> print_stats(blkif);
>> }
>>
>> + /* Drain pending IO */
>> + xen_blk_drain_io(blkif, true);
>> +
>> + /* Drain pending purge work */
>> + flush_work(&blkif->persistent_purge_work);
>> +
>> /* Free all persistent grant pages */
>> if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
>> free_persistent_gnts(blkif, &blkif->persistent_gnts,
>> @@ -930,7 +937,7 @@ static int dispatch_other_io(struct xen_blkif *blkif,
>> return -EIO;
>> }
>>
>> -static void xen_blk_drain_io(struct xen_blkif *blkif)
>> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force)
>> {
>> atomic_set(&blkif->drain, 1);
>> do {
>> @@ -943,7 +950,7 @@ static void xen_blk_drain_io(struct xen_blkif *blkif)
>>
>> if (!atomic_read(&blkif->drain))
>> break;
>> - } while (!kthread_should_stop());
>> + } while (!kthread_should_stop() || force);
>> atomic_set(&blkif->drain, 0);
>> }
>>
>> @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
>> * the proper response on the ring.
>> */
>> if (atomic_dec_and_test(&pending_req->pendcnt)) {
>> - xen_blkbk_unmap(pending_req->blkif,
>> + struct xen_blkif *blkif = pending_req->blkif;
>> +
>> + xen_blkbk_unmap(blkif,
>> pending_req->segments,
>> pending_req->nr_pages);
>> - make_response(pending_req->blkif, pending_req->id,
>> + make_response(blkif, pending_req->id,
>> pending_req->operation, pending_req->status);
>> - xen_blkif_put(pending_req->blkif);
>> - if (atomic_read(&pending_req->blkif->refcnt) <= 2) {
>> - if (atomic_read(&pending_req->blkif->drain))
>> - complete(&pending_req->blkif->drain_complete);
>> + free_req(blkif, pending_req);
>> + xen_blkif_put(blkif);
>> + if (atomic_read(&blkif->refcnt) <= 2) {
>> + if (atomic_read(&blkif->drain))
>> + complete(&blkif->drain_complete);
>> }
>> - free_req(pending_req->blkif, pending_req);
>> }
>> }
>>
>> @@ -1224,7 +1233,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>> * issue the WRITE_FLUSH.
>> */
>> if (drain)
>> - xen_blk_drain_io(pending_req->blkif);
>> + xen_blk_drain_io(pending_req->blkif, false);
>>
>> /*
>> * If we have failed at this point, we need to undo the M2P override,
>> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
>> index c2014a0..3c10281 100644
>> --- a/drivers/block/xen-blkback/xenbus.c
>> +++ b/drivers/block/xen-blkback/xenbus.c
>> @@ -125,6 +125,7 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
>> blkif->persistent_gnts.rb_node = NULL;
>> spin_lock_init(&blkif->free_pages_lock);
>> INIT_LIST_HEAD(&blkif->free_pages);
>> + INIT_LIST_HEAD(&blkif->persistent_purge_list);
>
> Hm,

See comment below.

>> blkif->free_pages_num = 0;
>> atomic_set(&blkif->persistent_gnt_in_use, 0);
>>
>> @@ -259,6 +260,14 @@ static void xen_blkif_free(struct xen_blkif *blkif)
>> if (!atomic_dec_and_test(&blkif->refcnt))
>> BUG();
>>
>> + /* Make sure everything is drained before shutting down */
>> + BUG_ON(blkif->persistent_gnt_c != 0);
>> + BUG_ON(atomic_read(&blkif->persistent_gnt_in_use) != 0);
>> + BUG_ON(blkif->free_pages_num != 0);
>> + BUG_ON(!list_empty(&blkif->persistent_purge_list));
>
> You don't seem to put anything on this list? Or even declare this?
> Was there another patch in the series?

No, the list is already used in the current code, but it is initialized only
right before use. Now I need to make sure it's initialized even if it is
never used, or:

BUG_ON(!list_empty(&blkif->persistent_purge_list));

is going to fail.

I will resend this and replace the other (now useless) initialization
with a BUG_ON(!list_empty...
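
To see why, a minimal userspace sketch: kzalloc zeroes the structure, so an
uninitialized list head has next == NULL rather than pointing at itself, and
list_empty() on it reports false even though nothing was ever queued. (The
list_head helpers below mimic the kernel's; blkif_sketch is an invented
stand-in, not the driver's struct.)

#include <assert.h>
#include <stdio.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
        struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
        h->next = h->prev = h;
}

static int list_empty(const struct list_head *h)
{
        return h->next == h;
}

struct blkif_sketch {
        struct list_head persistent_purge_list;
};

int main(void)
{
        /* Zeroed allocation, as kzalloc would give us. */
        struct blkif_sketch blkif = { { NULL, NULL } };

        /* next == NULL != &head, so the zeroed list is NOT "empty"... */
        printf("zeroed: list_empty=%d\n",
               list_empty(&blkif.persistent_purge_list));

        /* ...but after INIT_LIST_HEAD the emptiness check is valid. */
        INIT_LIST_HEAD(&blkif.persistent_purge_list);
        assert(list_empty(&blkif.persistent_purge_list));
        printf("initialized: list_empty=%d\n",
               list_empty(&blkif.persistent_purge_list));
        return 0;
}

So without the alloc-time INIT_LIST_HEAD, the new BUG_ON(!list_empty(...))
would trip on any VBD whose grants never got purged.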

>
>> + BUG_ON(!list_empty(&blkif->free_pages));
>> + BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
>> +
>> /* Check that there is no request in use */
>> list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
>> list_del(&req->free_list);
>> --
>> 1.7.7.5 (Apple Git-26)
>>

2014-01-27 18:50:10

by Matt Wilson

Subject: Re: [PATCH] xen-blkback: fix memory leaks

On Mon, Jan 27, 2014 at 05:19:41PM +0100, Roger Pau Monné wrote:
> On 27/01/14 17:09, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote:
> >> I've at least identified two possible memory leaks in blkback, both
> >> related to the shutdown path of a VBD:
> >>
> >> - We don't wait for any pending purge work to finish before cleaning
> >> the list of free_pages. The purge work will call put_free_pages and
> >> thus we might end up with pages being added to the free_pages list
> >> after we have emptied it.
> >> - We don't wait for pending requests to end before cleaning persistent
> >> grants and the list of free_pages. Again this can add pages to the
> >> free_pages lists or persistent grants to the persistent_gnts
> >> red-black tree.
> >>
> >> Also, add some checks in xen_blkif_free to make sure we are cleaning
> >> everything.
> >>
> >> Signed-off-by: Roger Pau Monné <[email protected]>
> >> Cc: Konrad Rzeszutek Wilk <[email protected]>
> >> Cc: David Vrabel <[email protected]>
> >> Cc: Boris Ostrovsky <[email protected]>
> >> Cc: Matt Rushton <[email protected]>
> >> Cc: Matt Wilson <[email protected]>
> >> Cc: Ian Campbell <[email protected]>
> >> ---
> >> This should be applied after the patch:
> >>
> >> xen-blkback: fix memory leak when persistent grants are used
> >
> > Could you respin the series with the issues below fixed and
> > have said patch as part of the series. That way not only does
> > it have your SoB on it but it makes it easier to apply the patch
> > for lazy^H^H^Hbusy maintainers and makes it clear that you had
> > tested both of them.
> >
> > Also, please CC Jens Axboe on these patches.
>
> Ack, will do once the comments below are sorted out.

We'll keep an eye out for the resubmitted series and do some testing.

Thanks for taking this the extra step.

--msw

> >> From Matt Rushton & Matt Wilson and backported to stable.
> >>
> >> I've been able to create and destroy ~4000 guests while doing heavy IO
> >> operations with this patch on a 512M Dom0 without problems.
> >> ---
> >> drivers/block/xen-blkback/blkback.c | 29 +++++++++++++++++++----------
> >> drivers/block/xen-blkback/xenbus.c | 9 +++++++++
> >> 2 files changed, 28 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> >> index 30ef7b3..19925b7 100644
> >> --- a/drivers/block/xen-blkback/blkback.c
> >> +++ b/drivers/block/xen-blkback/blkback.c
> >> @@ -169,6 +169,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >> struct pending_req *pending_req);
> >> static void make_response(struct xen_blkif *blkif, u64 id,
> >> unsigned short op, int st);
> >> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force);
> >>
> >> #define foreach_grant_safe(pos, n, rbtree, node) \
> >> for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \
> >> @@ -625,6 +626,12 @@ purge_gnt_list:
> >> print_stats(blkif);
> >> }
> >>
> >> + /* Drain pending IO */
> >> + xen_blk_drain_io(blkif, true);
> >> +
> >> + /* Drain pending purge work */
> >> + flush_work(&blkif->persistent_purge_work);
> >> +
> >> /* Free all persistent grant pages */
> >> if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
> >> free_persistent_gnts(blkif, &blkif->persistent_gnts,
> >> @@ -930,7 +937,7 @@ static int dispatch_other_io(struct xen_blkif *blkif,
> >> return -EIO;
> >> }
> >>
> >> -static void xen_blk_drain_io(struct xen_blkif *blkif)
> >> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force)
> >> {
> >> atomic_set(&blkif->drain, 1);
> >> do {
> >> @@ -943,7 +950,7 @@ static void xen_blk_drain_io(struct xen_blkif *blkif)
> >>
> >> if (!atomic_read(&blkif->drain))
> >> break;
> >> - } while (!kthread_should_stop());
> >> + } while (!kthread_should_stop() || force);
> >> atomic_set(&blkif->drain, 0);
> >> }
> >>
> >> @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
> >> * the proper response on the ring.
> >> */
> >> if (atomic_dec_and_test(&pending_req->pendcnt)) {
> >> - xen_blkbk_unmap(pending_req->blkif,
> >> + struct xen_blkif *blkif = pending_req->blkif;
> >> +
> >> + xen_blkbk_unmap(blkif,
> >> pending_req->segments,
> >> pending_req->nr_pages);
> >> - make_response(pending_req->blkif, pending_req->id,
> >> + make_response(blkif, pending_req->id,
> >> pending_req->operation, pending_req->status);
> >> - xen_blkif_put(pending_req->blkif);
> >> - if (atomic_read(&pending_req->blkif->refcnt) <= 2) {
> >> - if (atomic_read(&pending_req->blkif->drain))
> >> - complete(&pending_req->blkif->drain_complete);
> >> + free_req(blkif, pending_req);
> >> + xen_blkif_put(blkif);
> >> + if (atomic_read(&blkif->refcnt) <= 2) {
> >> + if (atomic_read(&blkif->drain))
> >> + complete(&blkif->drain_complete);
> >> }
> >> - free_req(pending_req->blkif, pending_req);
> >> }
> >> }
> >>
> >> @@ -1224,7 +1233,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >> * issue the WRITE_FLUSH.
> >> */
> >> if (drain)
> >> - xen_blk_drain_io(pending_req->blkif);
> >> + xen_blk_drain_io(pending_req->blkif, false);
> >>
> >> /*
> >> * If we have failed at this point, we need to undo the M2P override,
> >> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> >> index c2014a0..3c10281 100644
> >> --- a/drivers/block/xen-blkback/xenbus.c
> >> +++ b/drivers/block/xen-blkback/xenbus.c
> >> @@ -125,6 +125,7 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
> >> blkif->persistent_gnts.rb_node = NULL;
> >> spin_lock_init(&blkif->free_pages_lock);
> >> INIT_LIST_HEAD(&blkif->free_pages);
> >> + INIT_LIST_HEAD(&blkif->persistent_purge_list);
> >
> > Hm,
>
> See comment below.
>
> >> blkif->free_pages_num = 0;
> >> atomic_set(&blkif->persistent_gnt_in_use, 0);
> >>
> >> @@ -259,6 +260,14 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> >> if (!atomic_dec_and_test(&blkif->refcnt))
> >> BUG();
> >>
> >> + /* Make sure everything is drained before shutting down */
> >> + BUG_ON(blkif->persistent_gnt_c != 0);
> >> + BUG_ON(atomic_read(&blkif->persistent_gnt_in_use) != 0);
> >> + BUG_ON(blkif->free_pages_num != 0);
> >> + BUG_ON(!list_empty(&blkif->persistent_purge_list));
> >
> > You don't seem to put anything on this list? Or even declare this?
> > Was there another patch in the series?
>
> No, the list is already used in the current code, but it is initialized only
> right before use. Now I need to make sure it's initialized even if it is
> never used, or:
>
> BUG_ON(!list_empty(&blkif->persistent_purge_list));
>
> is going to fail.
>
> I will resend this and replace the other (now useless) initialization
> with a BUG_ON(!list_empty...
>
> >
> >> + BUG_ON(!list_empty(&blkif->free_pages));
> >> + BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
> >> +
> >> /* Check that there is no request in use */
> >> list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
> >> list_del(&req->free_list);
> >>
>

2014-01-27 21:21:59

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH] xen-blkback: fix memory leaks

On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote:
> I've at least identified two possible memory leaks in blkback, both
> related to the shutdown path of a VBD:
>
> - We don't wait for any pending purge work to finish before cleaning
> the list of free_pages. The purge work will call put_free_pages and
> thus we might end up with pages being added to the free_pages list
> after we have emptied it.
> - We don't wait for pending requests to end before cleaning persistent
> grants and the list of free_pages. Again this can add pages to the
> free_pages lists or persistent grants to the persistent_gnts
> red-black tree.
>
> Also, add some checks in xen_blkif_free to make sure we are cleaning
> everything.
>
> Signed-off-by: Roger Pau Monné <[email protected]>
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: David Vrabel <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: Matt Rushton <[email protected]>
> Cc: Matt Wilson <[email protected]>
> Cc: Ian Campbell <[email protected]>
> ---
> This should be applied after the patch:
>
> xen-blkback: fix memory leak when persistent grants are used
>
> From Matt Rushton & Matt Wilson and backported to stable.
>
> I've been able to create and destroy ~4000 guests while doing heavy IO
> operations with this patch on a 512M Dom0 without problems.
> ---
> drivers/block/xen-blkback/blkback.c | 29 +++++++++++++++++++----------
> drivers/block/xen-blkback/xenbus.c | 9 +++++++++
> 2 files changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 30ef7b3..19925b7 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -169,6 +169,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> struct pending_req *pending_req);
> static void make_response(struct xen_blkif *blkif, u64 id,
> unsigned short op, int st);
> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force);
>
> #define foreach_grant_safe(pos, n, rbtree, node) \
> for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \
> @@ -625,6 +626,12 @@ purge_gnt_list:
> print_stats(blkif);
> }
>
> + /* Drain pending IO */
> + xen_blk_drain_io(blkif, true);
> +
> + /* Drain pending purge work */
> + flush_work(&blkif->persistent_purge_work);
> +

I think this means we can eliminate the refcnt usage - at least when
it comes to xen_blkif_disconnect, where, once we initiate the shutdown,
there is

239         atomic_dec(&blkif->refcnt);
240         wait_event(blkif->waiting_to_free, atomic_read(&blkif->refcnt) == 0);
241         atomic_inc(&blkif->refcnt);
242

which is done _after_ the thread is done executing. That check won't
be needed anymore, as xen_blk_drain_io, flush_work, and free_persistent_gnts
have pretty much drained every I/O out - so the moment the thread exits
there should be no need for waiting_to_free. I think.


> /* Free all persistent grant pages */
> if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
> free_persistent_gnts(blkif, &blkif->persistent_gnts,
> @@ -930,7 +937,7 @@ static int dispatch_other_io(struct xen_blkif *blkif,
> return -EIO;
> }
>
> -static void xen_blk_drain_io(struct xen_blkif *blkif)
> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force)
> {
> atomic_set(&blkif->drain, 1);
> do {
> @@ -943,7 +950,7 @@ static void xen_blk_drain_io(struct xen_blkif *blkif)
>
> if (!atomic_read(&blkif->drain))
> break;
> - } while (!kthread_should_stop());
> + } while (!kthread_should_stop() || force);
> atomic_set(&blkif->drain, 0);
> }
>
> @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
> * the proper response on the ring.
> */
> if (atomic_dec_and_test(&pending_req->pendcnt)) {
> - xen_blkbk_unmap(pending_req->blkif,
> + struct xen_blkif *blkif = pending_req->blkif;
> +
> + xen_blkbk_unmap(blkif,
> pending_req->segments,
> pending_req->nr_pages);
> - make_response(pending_req->blkif, pending_req->id,
> + make_response(blkif, pending_req->id,
> pending_req->operation, pending_req->status);
> - xen_blkif_put(pending_req->blkif);
> - if (atomic_read(&pending_req->blkif->refcnt) <= 2) {
> - if (atomic_read(&pending_req->blkif->drain))
> - complete(&pending_req->blkif->drain_complete);
> + free_req(blkif, pending_req);
> + xen_blkif_put(blkif);
> + if (atomic_read(&blkif->refcnt) <= 2) {
> + if (atomic_read(&blkif->drain))
> + complete(&blkif->drain_complete);
> }
> - free_req(pending_req->blkif, pending_req);

I keep coming back to this and I am not sure what to think - especially
in the context of WRITE_BARRIER and disconnecting the vbd.

You moved the 'free_req' to be done before you do atomic_read/dec.

Which means that we do:

list_add(&req->free_list, &blkif->pending_free);
wake_up(&blkif->pending_free_wq);

atomic_dec
if atomic_read <= 2 poke thread that is waiting for drain.


while in the past we did:

atomic_dec
if atomic_read <= 2 poke thread that is waiting for drain.

list_add(&req->free_list, &blkif->pending_free);
wake_up(&blkif->pending_free_wq);

which means that we are giving the 'req' _before_ we decrement
the refcnts.

Could that mean that __do_block_io_op takes it for a spin - oh
wait it won't as it is sitting on a WRITE_BARRIER and waiting:

1226 if (drain)
1227 xen_blk_drain_io(pending_req->blkif);

But still that feels 'wrong'?

If you think this is right and OK, then perhaps we should
document this behavior in case somebody in the future wants to
rewrite this and starts working it out?
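
For reference, the two orderings reduced to a toy with C11 atomics
(toy_blkif is an invented stand-in; the list and waitqueue work is
collapsed into comments):

#include <stdatomic.h>
#include <stdio.h>

struct toy_blkif {
        atomic_int refcnt;
        atomic_int drain;
};

/* free_req(): returns the req to pending_free and wakes its waiters.
 * From this point on another thread may pick the req up and reuse it. */
static void free_req_toy(struct toy_blkif *b)
{
        (void)b;
        /* list_add(&req->free_list, &b->pending_free); */
        /* wake_up(&b->pending_free_wq); */
}

/* xen_blkif_put() plus the drain check. */
static void put_and_maybe_complete(struct toy_blkif *b)
{
        atomic_fetch_sub(&b->refcnt, 1);
        if (atomic_load(&b->refcnt) <= 2 && atomic_load(&b->drain))
                printf("complete(&drain_complete)\n");
}

int main(void)
{
        static struct toy_blkif b = { 3, 1 };   /* one req in flight, draining */

        /* Old order: poke the drain waiter first, then recycle the req:
         *      put_and_maybe_complete(&b); free_req_toy(&b);
         * New order: the req is published on pending_free before the
         * refcount drops: */
        free_req_toy(&b);
        put_and_maybe_complete(&b);
        return 0;
}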


>
> @@ -1224,7 +1233,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> * issue the WRITE_FLUSH.
> */
> if (drain)
> - xen_blk_drain_io(pending_req->blkif);
> + xen_blk_drain_io(pending_req->blkif, false);
>
> /*
> * If we have failed at this point, we need to undo the M2P override,
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index c2014a0..3c10281 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -125,6 +125,7 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
> blkif->persistent_gnts.rb_node = NULL;
> spin_lock_init(&blkif->free_pages_lock);
> INIT_LIST_HEAD(&blkif->free_pages);
> + INIT_LIST_HEAD(&blkif->persistent_purge_list);
> blkif->free_pages_num = 0;
> atomic_set(&blkif->persistent_gnt_in_use, 0);
>
> @@ -259,6 +260,14 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> if (!atomic_dec_and_test(&blkif->refcnt))
> BUG();
>
> + /* Make sure everything is drained before shutting down */
> + BUG_ON(blkif->persistent_gnt_c != 0);
> + BUG_ON(atomic_read(&blkif->persistent_gnt_in_use) != 0);
> + BUG_ON(blkif->free_pages_num != 0);
> + BUG_ON(!list_empty(&blkif->persistent_purge_list));
> + BUG_ON(!list_empty(&blkif->free_pages));
> + BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
> +
> /* Check that there is no request in use */
> list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
> list_del(&req->free_list);
> --
> 1.7.7.5 (Apple Git-26)
>

2014-01-28 12:44:43

by Roger Pau Monne

Subject: Re: [PATCH] xen-blkback: fix memory leaks

On 27/01/14 22:21, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote:
>> I've at least identified two possible memory leaks in blkback, both
>> related to the shutdown path of a VBD:
>>
>> - We don't wait for any pending purge work to finish before cleaning
>> the list of free_pages. The purge work will call put_free_pages and
>> thus we might end up with pages being added to the free_pages list
>> after we have emptied it.
>> - We don't wait for pending requests to end before cleaning persistent
>> grants and the list of free_pages. Again this can add pages to the
>> free_pages lists or persistent grants to the persistent_gnts
>> red-black tree.
>>
>> Also, add some checks in xen_blkif_free to make sure we are cleaning
>> everything.
>>
>> Signed-off-by: Roger Pau Monné <[email protected]>
>> Cc: Konrad Rzeszutek Wilk <[email protected]>
>> Cc: David Vrabel <[email protected]>
>> Cc: Boris Ostrovsky <[email protected]>
>> Cc: Matt Rushton <[email protected]>
>> Cc: Matt Wilson <[email protected]>
>> Cc: Ian Campbell <[email protected]>
>> ---
>> This should be applied after the patch:
>>
>> xen-blkback: fix memory leak when persistent grants are used
>>
>> From Matt Rushton & Matt Wilson and backported to stable.
>>
>> I've been able to create and destroy ~4000 guests while doing heavy IO
>> operations with this patch on a 512M Dom0 without problems.
>> ---
>> drivers/block/xen-blkback/blkback.c | 29 +++++++++++++++++++----------
>> drivers/block/xen-blkback/xenbus.c | 9 +++++++++
>> 2 files changed, 28 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
>> index 30ef7b3..19925b7 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -169,6 +169,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>> struct pending_req *pending_req);
>> static void make_response(struct xen_blkif *blkif, u64 id,
>> unsigned short op, int st);
>> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force);
>>
>> #define foreach_grant_safe(pos, n, rbtree, node) \
>> for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \
>> @@ -625,6 +626,12 @@ purge_gnt_list:
>> print_stats(blkif);
>> }
>>
>> + /* Drain pending IO */
>> + xen_blk_drain_io(blkif, true);
>> +
>> + /* Drain pending purge work */
>> + flush_work(&blkif->persistent_purge_work);
>> +
>
> I think this means we can eliminate the refcnt usage - at least when
> it comes to xen_blkif_disconnect, where, once we initiate the shutdown,
> there is
>
> 239         atomic_dec(&blkif->refcnt);
> 240         wait_event(blkif->waiting_to_free, atomic_read(&blkif->refcnt) == 0);
> 241         atomic_inc(&blkif->refcnt);
> 242
>
> which is done _after_ the thread is done executing. That check won't
> be needed anymore, as xen_blk_drain_io, flush_work, and free_persistent_gnts
> have pretty much drained every I/O out - so the moment the thread exits
> there should be no need for waiting_to_free. I think.

I've reworked this patch a bit, so we don't drain the in-flight requests
here, and instead moved all the cleanup code to xen_blkif_free. I've
also split the xen_blkif_put race fix into a separate patch.

>
>> /* Free all persistent grant pages */
>> if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
>> free_persistent_gnts(blkif, &blkif->persistent_gnts,
>> @@ -930,7 +937,7 @@ static int dispatch_other_io(struct xen_blkif *blkif,
>> return -EIO;
>> }
>>
>> -static void xen_blk_drain_io(struct xen_blkif *blkif)
>> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force)
>> {
>> atomic_set(&blkif->drain, 1);
>> do {
>> @@ -943,7 +950,7 @@ static void xen_blk_drain_io(struct xen_blkif *blkif)
>>
>> if (!atomic_read(&blkif->drain))
>> break;
>> - } while (!kthread_should_stop());
>> + } while (!kthread_should_stop() || force);
>> atomic_set(&blkif->drain, 0);
>> }
>>
>> @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
>> * the proper response on the ring.
>> */
>> if (atomic_dec_and_test(&pending_req->pendcnt)) {
>> - xen_blkbk_unmap(pending_req->blkif,
>> + struct xen_blkif *blkif = pending_req->blkif;
>> +
>> + xen_blkbk_unmap(blkif,
>> pending_req->segments,
>> pending_req->nr_pages);
>> - make_response(pending_req->blkif, pending_req->id,
>> + make_response(blkif, pending_req->id,
>> pending_req->operation, pending_req->status);
>> - xen_blkif_put(pending_req->blkif);
>> - if (atomic_read(&pending_req->blkif->refcnt) <= 2) {
>> - if (atomic_read(&pending_req->blkif->drain))
>> - complete(&pending_req->blkif->drain_complete);
>> + free_req(blkif, pending_req);
>> + xen_blkif_put(blkif);
>> + if (atomic_read(&blkif->refcnt) <= 2) {
>> + if (atomic_read(&blkif->drain))
>> + complete(&blkif->drain_complete);
>> }
>> - free_req(pending_req->blkif, pending_req);
>
> I keep coming back to this and I am not sure what to think - especially
> in the context of WRITE_BARRIER and disconnecting the vbd.
>
> You moved the 'free_req' to be done before you do atomic_read/dec.
>
> Which means that we do:
>
> list_add(&req->free_list, &blkif->pending_free);
> wake_up(&blkif->pending_free_wq);
>
> atomic_dec
> if atomic_read <= 2 poke thread that is waiting for drain.
>
>
> while in the past we did:
>
> atomic_dec
> if atomic_read <= 2 poke thread that is waiting for drain.
>
> list_add(&req->free_list, &blkif->pending_free);
> wake_up(&blkif->pending_free_wq);
>
> which means that we are giving the 'req' _before_ we decrement
> the refcnts.
>
> Could that mean that __do_block_io_op takes it for a spin - oh
> wait it won't as it is sitting on a WRITE_BARRIER and waiting:
>
> 1226 if (drain)
> 1227 xen_blk_drain_io(pending_req->blkif);
>
> But still that feels 'wrong'?

Mmmm, the wake_up call in free_req in the context of WRITE_BARRIER is
harmless since the thread is waiting on drain_complete as you say, but I
take your point that it's all confusing. Do you think it will feel
better if we gate the call to wake_up in free_req with this condition:

if (was_empty && !atomic_read(&blkif->drain))

Or is this just going to make it even messier?

Maybe just adding a comment in free_req, saying that the wake_up call is
going to be ignored in the context of a WRITE_BARRIER since the thread
is already waiting on drain_complete, is enough.
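
For what it's worth, the gated version would look roughly like this - a
kernel-style sketch, not a compilable standalone program; the body follows
the current shape of free_req from memory, so treat it as an approximation
rather than the actual patch:

static void free_req(struct xen_blkif *blkif, struct pending_req *req)
{
        unsigned long flags;
        int was_empty;

        spin_lock_irqsave(&blkif->pending_free_lock, flags);
        was_empty = list_empty(&blkif->pending_free);
        list_add(&req->free_list, &blkif->pending_free);
        spin_unlock_irqrestore(&blkif->pending_free_lock, flags);
        /*
         * Skip the wake_up while a drain is in progress: the only
         * waiter is parked on drain_complete, so waking
         * pending_free_wq would be a no-op anyway.
         */
        if (was_empty && !atomic_read(&blkif->drain))
                wake_up(&blkif->pending_free_wq);
}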

2014-01-28 15:37:24

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH] xen-blkback: fix memory leaks

On Tue, Jan 28, 2014 at 01:44:37PM +0100, Roger Pau Monné wrote:
> On 27/01/14 22:21, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote:
> >> I've at least identified two possible memory leaks in blkback, both
> >> related to the shutdown path of a VBD:
> >>
> >> - We don't wait for any pending purge work to finish before cleaning
> >> the list of free_pages. The purge work will call put_free_pages and
> >> thus we might end up with pages being added to the free_pages list
> >> after we have emptied it.
> >> - We don't wait for pending requests to end before cleaning persistent
> >> grants and the list of free_pages. Again this can add pages to the
> >> free_pages lists or persistent grants to the persistent_gnts
> >> red-black tree.
> >>
> >> Also, add some checks in xen_blkif_free to make sure we are cleaning
> >> everything.
> >>
> >> Signed-off-by: Roger Pau Monné <[email protected]>
> >> Cc: Konrad Rzeszutek Wilk <[email protected]>
> >> Cc: David Vrabel <[email protected]>
> >> Cc: Boris Ostrovsky <[email protected]>
> >> Cc: Matt Rushton <[email protected]>
> >> Cc: Matt Wilson <[email protected]>
> >> Cc: Ian Campbell <[email protected]>
> >> ---
> >> This should be applied after the patch:
> >>
> >> xen-blkback: fix memory leak when persistent grants are used
> >>
> >> From Matt Rushton & Matt Wilson and backported to stable.
> >>
> >> I've been able to create and destroy ~4000 guests while doing heavy IO
> >> operations with this patch on a 512M Dom0 without problems.
> >> ---
> >> drivers/block/xen-blkback/blkback.c | 29 +++++++++++++++++++----------
> >> drivers/block/xen-blkback/xenbus.c | 9 +++++++++
> >> 2 files changed, 28 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> >> index 30ef7b3..19925b7 100644
> >> --- a/drivers/block/xen-blkback/blkback.c
> >> +++ b/drivers/block/xen-blkback/blkback.c
> >> @@ -169,6 +169,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >> struct pending_req *pending_req);
> >> static void make_response(struct xen_blkif *blkif, u64 id,
> >> unsigned short op, int st);
> >> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force);
> >>
> >> #define foreach_grant_safe(pos, n, rbtree, node) \
> >> for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \
> >> @@ -625,6 +626,12 @@ purge_gnt_list:
> >> print_stats(blkif);
> >> }
> >>
> >> + /* Drain pending IO */
> >> + xen_blk_drain_io(blkif, true);
> >> +
> >> + /* Drain pending purge work */
> >> + flush_work(&blkif->persistent_purge_work);
> >> +
> >
> > I think this means we can eliminate the refcnt usage - at least when
> > it comes to xen_blkif_disconnect, where, once we initiate the shutdown,
> > there is
> >
> > 239         atomic_dec(&blkif->refcnt);
> > 240         wait_event(blkif->waiting_to_free, atomic_read(&blkif->refcnt) == 0);
> > 241         atomic_inc(&blkif->refcnt);
> > 242
> >
> > which is done _after_ the thread is done executing. That check won't
> > be needed anymore, as xen_blk_drain_io, flush_work, and free_persistent_gnts
> > have pretty much drained every I/O out - so the moment the thread exits
> > there should be no need for waiting_to_free. I think.
>
> I've reworked this patch a bit, so we don't drain the in-flight requests
> here, and instead moved all the cleanup code to xen_blkif_free. I've
> also split the xen_blkif_put race fix into a separate patch.
>
> >
> >> /* Free all persistent grant pages */
> >> if (!RB_EMPTY_ROOT(&blkif->persistent_gnts))
> >> free_persistent_gnts(blkif, &blkif->persistent_gnts,
> >> @@ -930,7 +937,7 @@ static int dispatch_other_io(struct xen_blkif *blkif,
> >> return -EIO;
> >> }
> >>
> >> -static void xen_blk_drain_io(struct xen_blkif *blkif)
> >> +static void xen_blk_drain_io(struct xen_blkif *blkif, bool force)
> >> {
> >> atomic_set(&blkif->drain, 1);
> >> do {
> >> @@ -943,7 +950,7 @@ static void xen_blk_drain_io(struct xen_blkif *blkif)
> >>
> >> if (!atomic_read(&blkif->drain))
> >> break;
> >> - } while (!kthread_should_stop());
> >> + } while (!kthread_should_stop() || force);
> >> atomic_set(&blkif->drain, 0);
> >> }
> >>
> >> @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
> >> * the proper response on the ring.
> >> */
> >> if (atomic_dec_and_test(&pending_req->pendcnt)) {
> >> - xen_blkbk_unmap(pending_req->blkif,
> >> + struct xen_blkif *blkif = pending_req->blkif;
> >> +
> >> + xen_blkbk_unmap(blkif,
> >> pending_req->segments,
> >> pending_req->nr_pages);
> >> - make_response(pending_req->blkif, pending_req->id,
> >> + make_response(blkif, pending_req->id,
> >> pending_req->operation, pending_req->status);
> >> - xen_blkif_put(pending_req->blkif);
> >> - if (atomic_read(&pending_req->blkif->refcnt) <= 2) {
> >> - if (atomic_read(&pending_req->blkif->drain))
> >> - complete(&pending_req->blkif->drain_complete);
> >> + free_req(blkif, pending_req);
> >> + xen_blkif_put(blkif);
> >> + if (atomic_read(&blkif->refcnt) <= 2) {
> >> + if (atomic_read(&blkif->drain))
> >> + complete(&blkif->drain_complete);
> >> }
> >> - free_req(pending_req->blkif, pending_req);
> >
> > I keep coming back to this and I am not sure what to think - especially
> > in the context of WRITE_BARRIER and disconnecting the vbd.
> >
> > You moved the 'free_req' to be done before you do atomic_read/dec.
> >
> > Which means that we do:
> >
> > list_add(&req->free_list, &blkif->pending_free);
> > wake_up(&blkif->pending_free_wq);
> >
> > atomic_dec
> > if atomic_read <= 2 poke thread that is waiting for drain.
> >
> >
> > while in the past we did:
> >
> > atomic_dec
> > if atomic_read <= 2 poke thread that is waiting for drain.
> >
> > list_add(&req->free_list, &blkif->pending_free);
> > wake_up(&blkif->pending_free_wq);
> >
> > which means that we are giving the 'req' _before_ we decrement
> > the refcnts.
> >
> > Could that mean that __do_block_io_op takes it for a spin - oh
> > wait it won't as it is sitting on a WRITE_BARRIER and waiting:
> >
> > 1226 if (drain)
> > 1227 xen_blk_drain_io(pending_req->blkif);
> >
> > But still that feels 'wrong'?
>
> Mmmm, the wake_up call in free_req in the context of WRITE_BARRIER is
> harmless since the thread is waiting on drain_complete as you say, but I
> take your point that it's all confusing. Do you think it will feel
> better if we gate the call to wake_up in free_req with this condition:
>
> if (was_empty && !atomic_read(&blkif->drain))
>
> Or is this just going to make it even messier?

My head spins around when thinking about the refcnt, drain, the two or
three workqueues.

>
> Maybe just adding a comment in free_req saying that the wake_up call is
> going to be ignored in the context of a WRITE_BARRIER, since the thread
> is already waiting on drain_complete is enough.

Perhaps. You do pass in the 'force' bool flag and we could piggyback
on that. Meaning you could do

        /* a comment about what we just mentioned */

        if (!force) {
                // do it the old way
        } else {
                /* A comment mentioning _why_ we need the code reshuffled */

                // do it the new way
        }

It would be a bit messy - but:
- We won't have to worry about breaking WRITE_BARRIER as the old
logic would be preserved. So less worry about regressions.
- The bug-fix would be easy to backport as it would inject code for
just the usage you want - that is to drain all I/Os.
- It would make a nice distinction and allows us to refactor
this in future patches.
The cons are that:
- It would add extra path for just the use-case of shutting down
without using the existing one.
- It would be messy


But I think when it comes to fixes like these that are
candidates for backports, messy is OK - and if they don't have any
possibility of introducing regressions on other existing behaviors,
then we should stick with that.


Then in the future we can refactor this to use less of these
workqueues, refcnt and atomics we have. It is getting confusing.

Thoughts?

2014-01-28 16:01:59

by Roger Pau Monne

Subject: Re: [PATCH] xen-blkback: fix memory leaks

On 28/01/14 16:37, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 28, 2014 at 01:44:37PM +0100, Roger Pau Monné wrote:
>> On 27/01/14 22:21, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Jan 27, 2014 at 11:13:41AM +0100, Roger Pau Monne wrote:
>>>> @@ -976,17 +983,19 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
>>>> * the proper response on the ring.
>>>> */
>>>> if (atomic_dec_and_test(&pending_req->pendcnt)) {
>>>> - xen_blkbk_unmap(pending_req->blkif,
>>>> + struct xen_blkif *blkif = pending_req->blkif;
>>>> +
>>>> + xen_blkbk_unmap(blkif,
>>>> pending_req->segments,
>>>> pending_req->nr_pages);
>>>> - make_response(pending_req->blkif, pending_req->id,
>>>> + make_response(blkif, pending_req->id,
>>>> pending_req->operation, pending_req->status);
>>>> - xen_blkif_put(pending_req->blkif);
>>>> - if (atomic_read(&pending_req->blkif->refcnt) <= 2) {
>>>> - if (atomic_read(&pending_req->blkif->drain))
>>>> - complete(&pending_req->blkif->drain_complete);
>>>> + free_req(blkif, pending_req);
>>>> + xen_blkif_put(blkif);
>>>> + if (atomic_read(&blkif->refcnt) <= 2) {
>>>> + if (atomic_read(&blkif->drain))
>>>> + complete(&blkif->drain_complete);
>>>> }
>>>> - free_req(pending_req->blkif, pending_req);
>>>
>>> I keep coming back to this and I am not sure what to think - especially
>>> in the context of WRITE_BARRIER and disconnecting the vbd.
>>>
>>> You moved the 'free_req' to be done before you do atomic_read/dec.
>>>
>>> Which means that we do:
>>>
>>> list_add(&req->free_list, &blkif->pending_free);
>>> wake_up(&blkif->pending_free_wq);
>>>
>>> atomic_dec
>>> if atomic_read <= 2 poke thread that is waiting for drain.
>>>
>>>
>>> while in the past we did:
>>>
>>> atomic_dec
>>> if atomic_read <= 2 poke thread that is waiting for drain.
>>>
>>> list_add(&req->free_list, &blkif->pending_free);
>>> wake_up(&blkif->pending_free_wq);
>>>
>>> which means that we are giving the 'req' _before_ we decrement
>>> the refcnts.
>>>
>>> Could that mean that __do_block_io_op takes it for a spin - oh
>>> wait it won't as it is sitting on a WRITE_BARRIER and waiting:
>>>
>>> 1226 if (drain)
>>> 1227 xen_blk_drain_io(pending_req->blkif);
>>>
>>> But still that feels 'wrong'?
>>
>> Mmmm, the wake_up call in free_req in the context of WRITE_BARRIER is
>> harmless since the thread is waiting on drain_complete as you say, but I
>> take your point that it's all confusing. Do you think it will feel
>> better if we gate the call to wake_up in free_req with this condition:
>>
>> if (was_empty && !atomic_read(&blkif->drain))
>>
>> Or is this just going to make it even messier?
>
> My head spins around when thinking about the refcnt, drain, the two or
> three workqueues.
>
>>
>> Maybe just adding a comment in free_req saying that the wake_up call is
>> going to be ignored in the context of a WRITE_BARRIER, since the thread
>> is already waiting on drain_complete is enough.
>
> Perhaps. You do pass in the 'force' bool flag and we could piggyback
> on that. Meaning you could do

In the new version I'm preparing I'm no longer calling drain_io from
xen_blkif_schedule (so there's no "force" flag); instead I've moved the
cleanup code to xen_blkif_free, where I think it makes more sense.

Also, the force flag was just a local variable to drain_io; I think it
would get even messier if we added yet another field (force) to the
xen_blkif struct.

>
>         /* a comment about what we just mentioned */
>
>         if (!force) {
>                 // do it the old way
>         } else {
>                 /* A comment mentioning _why_ we need the code reshuffled */
>
>                 // do it the new way
>         }
>
> It would be a bit messy - but:
> - We won't have to worry about breaking WRITE_BARRIER as the old
> logic would be preserved. So less worry about regressions.
> - The bug-fix would be easy to backport as it would inject code for
> just the usage you want - that is to drain all I/Os.
> - It would make a nice distinction and allows us to refactor
> this in future patches.
> The cons are that:
> - It would add extra path for just the use-case of shutting down
> without using the existing one.
> - It would be messy
>
>
> But I think when it comes to fixes like these that are
> candidates for backports, messy is OK - and if they don't have any
> possibility of introducing regressions on other existing behaviors,
> then we should stick with that.
>
>
> Then in the future we can refactor this to use less of these
> workqueues, refcnt and atomics we have. It is getting confusing.
>
> Thoughts?

Let me post the whole series as I have it now, and we can pick it up
again from there.