2018-08-06 11:35:27

by Jürgen Groß

[permalink] [raw]
Subject: [PATCH 0/4] xen/blk: persistent grant rework

Persistent grants are used in the Xen's blkfront/blkback drivers to
avoid mapping/unmapping of I/O buffers in the backend for each I/O.

While this speeds up processing quite a bit there are problems related
to persistent grants in some configurations: domains with multiple
block devices making use of persistent grants might suffer from a lack
of grants if each of the block devices experienced a high I/O load at
some time. This is because the number of persistent grants per device
is only limited by a rather high maximum value, and grants are never
released even after longer periods without any I/O.

This series modifies xen-blkback to unmap any domU page mapped via a
persistent grant after a timeout (default: 60 seconds). The timeout
is set to its default value again when a persistent grant has been
used for an I/O.

xen-blkfront is modified to scan every 10 seconds for persistent grants
not in use by blkback any more and to remove such grants.

The last 2 patches are small cleanups of blkfront and blkback drivers.

Juergen Gross (4):
xen/blkback: don't keep persistent grants too long
xen/blkfront: cleanup stale persistent grants
xen/blkfront: reorder tests in xlblk_init()
xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring

drivers/block/xen-blkback/blkback.c | 77 +++++++++++++++---------
drivers/block/xen-blkback/common.h | 2 +-
drivers/block/xen-blkfront.c | 115 ++++++++++++++++++++++++++++++++----
3 files changed, 153 insertions(+), 41 deletions(-)

--
2.13.7



2018-08-06 11:35:32

by Jürgen Groß

[permalink] [raw]
Subject: [PATCH 1/4] xen/blkback: don't keep persistent grants too long

Persistent grants are allocated until a threshold per ring is being
reached. Those grants won't be freed until the ring is being destroyed
meaning there will be resources kept busy which might no longer be
used.

Instead of freeing only persistent grants until the threshold is
reached add a timestamp and remove all persistent grants not having
been in use for a minute.

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/block/xen-blkback/blkback.c | 77 +++++++++++++++++++++++--------------
drivers/block/xen-blkback/common.h | 1 +
2 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index b55b245e8052..485e3ecab144 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -84,6 +84,18 @@ MODULE_PARM_DESC(max_persistent_grants,
"Maximum number of grants to map persistently");

/*
+ * How long a persistent grant is allowed to remain allocated without being in
+ * use. The time is in seconds, 0 means indefinitely long.
+ */
+
+unsigned int xen_blkif_pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+ uint, 0644);
+MODULE_PARM_DESC(persistent_grant_unused_seconds,
+ "Time in seconds an unused persistent grant is allowed to "
+ "remain allocated. Default is 60, 0 means unlimited.");
+
+/*
* Maximum number of rings/queues blkback supports, allow as many queues as there
* are CPUs if user has not specified a value.
*/
@@ -123,6 +135,13 @@ module_param(log_stats, int, 0644);
/* Number of free pages to remove on each call to gnttab_free_pages */
#define NUM_BATCH_FREE_PAGES 10

+static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
+{
+ return xen_blkif_pgrant_timeout &&
+ (jiffies - persistent_gnt->last_used >=
+ HZ * xen_blkif_pgrant_timeout);
+}
+
static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
{
unsigned long flags;
@@ -278,6 +297,7 @@ static void put_persistent_gnt(struct xen_blkif_ring *ring,
{
if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
pr_alert_ratelimited("freeing a grant already unused\n");
+ persistent_gnt->last_used = jiffies;
set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
atomic_dec(&ring->persistent_gnt_in_use);
@@ -374,23 +394,23 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
bool scan_used = false, clean_used = false;
struct rb_root *root;

- if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
- (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
- !ring->blkif->vbd.overflow_max_grants)) {
- goto out;
- }
-
if (work_busy(&ring->persistent_purge_work)) {
pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
goto out;
}

- num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
- num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
- num_clean = min(ring->persistent_gnt_c, num_clean);
- if ((num_clean == 0) ||
- (num_clean > (ring->persistent_gnt_c - atomic_read(&ring->persistent_gnt_in_use))))
- goto out;
+ if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
+ (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+ !ring->blkif->vbd.overflow_max_grants)) {
+ num_clean = 0;
+ } else {
+ num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
+ num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
+ num_clean;
+ num_clean = min(ring->persistent_gnt_c, num_clean);
+ pr_debug("Going to purge at least %u persistent grants\n",
+ num_clean);
+ }

/*
* At this point, we can assure that there will be no calls
@@ -401,9 +421,7 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
* number of grants.
*/

- total = num_clean;
-
- pr_debug("Going to purge %u persistent grants\n", num_clean);
+ total = 0;

BUG_ON(!list_empty(&ring->persistent_purge_list));
root = &ring->persistent_gnts;
@@ -419,39 +437,42 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)

if (test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
continue;
- if (!scan_used &&
+ if (!scan_used && !persistent_gnt_timeout(persistent_gnt) &&
(test_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags)))
continue;
+ if (scan_used && total >= num_clean)
+ continue;

rb_erase(&persistent_gnt->node, root);
list_add(&persistent_gnt->remove_node,
&ring->persistent_purge_list);
- if (--num_clean == 0)
- goto finished;
+ total++;
}
/*
- * If we get here it means we also need to start cleaning
+ * Check whether we also need to start cleaning
* grants that were used since last purge in order to cope
* with the requested num
*/
- if (!scan_used && !clean_used) {
- pr_debug("Still missing %u purged frames\n", num_clean);
+ if (!scan_used && !clean_used && total < num_clean) {
+ pr_debug("Still missing %u purged frames\n", num_clean - total);
scan_used = true;
goto purge_list;
}
-finished:
- if (!clean_used) {
+
+ if (!clean_used && num_clean) {
pr_debug("Finished scanning for grants to clean, removing used flag\n");
clean_used = true;
goto purge_list;
}

- ring->persistent_gnt_c -= (total - num_clean);
- ring->blkif->vbd.overflow_max_grants = 0;
+ if (total) {
+ ring->persistent_gnt_c -= total;
+ ring->blkif->vbd.overflow_max_grants = 0;

- /* We can defer this work */
- schedule_work(&ring->persistent_purge_work);
- pr_debug("Purged %u/%u\n", (total - num_clean), total);
+ /* We can defer this work */
+ schedule_work(&ring->persistent_purge_work);
+ pr_debug("Purged %u/%u\n", num_clean, total);
+ }

out:
return;
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index ecb35fe8ca8d..26710602d463 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -250,6 +250,7 @@ struct persistent_gnt {
struct page *page;
grant_ref_t gnt;
grant_handle_t handle;
+ unsigned long last_used;
DECLARE_BITMAP(flags, PERSISTENT_GNT_FLAGS_SIZE);
struct rb_node node;
struct list_head remove_node;
--
2.13.7


2018-08-06 11:35:46

by Jürgen Groß

[permalink] [raw]
Subject: [PATCH 4/4] xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring

pers_gnts_lock isn't being used anywhere. Remove it.

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/block/xen-blkback/common.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 26710602d463..4b7747159c38 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -279,7 +279,6 @@ struct xen_blkif_ring {
wait_queue_head_t pending_free_wq;

/* Tree to store persistent grants. */
- spinlock_t pers_gnts_lock;
struct rb_root persistent_gnts;
unsigned int persistent_gnt_c;
atomic_t persistent_gnt_in_use;
--
2.13.7


2018-08-06 11:36:30

by Jürgen Groß

[permalink] [raw]
Subject: [PATCH 3/4] xen/blkfront: reorder tests in xlblk_init()

In case we don't want pv block devices we should not test parameters
for sanity and possibly print out error messages. So test precluding
conditions before checking parameters.

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/block/xen-blkfront.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 19feb8835fc4..f72d96384326 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2718,6 +2718,15 @@ static int __init xlblk_init(void)
if (!xen_domain())
return -ENODEV;

+ if (!xen_has_pv_disk_devices())
+ return -ENODEV;
+
+ if (register_blkdev(XENVBD_MAJOR, DEV_NAME)) {
+ pr_warn("xen_blk: can't get major %d with name %s\n",
+ XENVBD_MAJOR, DEV_NAME);
+ return -ENODEV;
+ }
+
if (xen_blkif_max_segments < BLKIF_MAX_SEGMENTS_PER_REQUEST)
xen_blkif_max_segments = BLKIF_MAX_SEGMENTS_PER_REQUEST;

@@ -2733,15 +2742,6 @@ static int __init xlblk_init(void)
xen_blkif_max_queues = nr_cpus;
}

- if (!xen_has_pv_disk_devices())
- return -ENODEV;
-
- if (register_blkdev(XENVBD_MAJOR, DEV_NAME)) {
- printk(KERN_WARNING "xen_blk: can't get major %d with name %s\n",
- XENVBD_MAJOR, DEV_NAME);
- return -ENODEV;
- }
-
INIT_DELAYED_WORK(&blkfront_work, blkfront_delay_work);

ret = xenbus_register_frontend(&blkfront_driver);
--
2.13.7


2018-08-06 11:37:00

by Jürgen Groß

[permalink] [raw]
Subject: [PATCH] xen/blkfront: remove unused macros

Remove some macros not used anywhere.

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/block/xen-blkfront.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b5cedccb5d7d..94300dbe358b 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -251,14 +251,9 @@ static DEFINE_SPINLOCK(minor_lock);
#define GRANTS_PER_INDIRECT_FRAME \
(XEN_PAGE_SIZE / sizeof(struct blkif_request_segment))

-#define PSEGS_PER_INDIRECT_FRAME \
- (GRANTS_INDIRECT_FRAME / GRANTS_PSEGS)
-
#define INDIRECT_GREFS(_grants) \
DIV_ROUND_UP(_grants, GRANTS_PER_INDIRECT_FRAME)

-#define GREFS(_psegs) ((_psegs) * GRANTS_PER_PSEG)
-
static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
static void blkfront_gather_backend_features(struct blkfront_info *info);
static int negotiate_mq(struct blkfront_info *info);
--
2.13.7


2018-08-06 11:38:13

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH] xen/blkfront: remove unused macros

On 06/08/18 13:34, Juergen Gross wrote:
> Remove some macros not used anywhere.

Meh, ignore that one, please. Doesn't belong to this series.


Juergen

2018-08-06 13:09:19

by Jürgen Groß

[permalink] [raw]
Subject: [PATCH 2/4] xen/blkfront: cleanup stale persistent grants

Add a periodic cleanup function to remove old persistent grants which
are no longer in use on the backend side. This avoids starvation in
case there are lots of persistent grants for a device which no longer
is involved in I/O business.

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 95 insertions(+), 4 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b5cedccb5d7d..19feb8835fc4 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -46,6 +46,7 @@
#include <linux/scatterlist.h>
#include <linux/bitmap.h>
#include <linux/list.h>
+#include <linux/workqueue.h>

#include <xen/xen.h>
#include <xen/xenbus.h>
@@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)

static DEFINE_MUTEX(blkfront_mutex);
static const struct block_device_operations xlvbd_block_fops;
+static struct delayed_work blkfront_work;
+static LIST_HEAD(info_list);
+static bool blkfront_work_active;

/*
* Maximum number of segments in indirect requests, the actual value used by
@@ -216,6 +220,7 @@ struct blkfront_info
/* Save uncomplete reqs and bios for migration. */
struct list_head requests;
struct bio_list bio_list;
+ struct list_head info_list;
};

static unsigned int nr_minors;
@@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
return err;
}

+static void free_info(struct blkfront_info *info)
+{
+ list_del(&info->info_list);
+ kfree(info);
+}
+
/* Common code used when first setting up, and when resuming. */
static int talk_to_blkback(struct xenbus_device *dev,
struct blkfront_info *info)
@@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
destroy_blkring:
blkif_free(info, 0);

- kfree(info);
+ mutex_lock(&blkfront_mutex);
+ free_info(info);
+ mutex_unlock(&blkfront_mutex);
+
dev_set_drvdata(&dev->dev, NULL);

return err;
@@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
dev_set_drvdata(&dev->dev, info);

+ mutex_lock(&blkfront_mutex);
+ list_add(&info->info_list, &info_list);
+ mutex_unlock(&blkfront_mutex);
+
return 0;
}

@@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
indirect_segments = 0;
info->max_indirect_segments = indirect_segments;
+
+ if (info->feature_persistent) {
+ mutex_lock(&blkfront_mutex);
+ if (!blkfront_work_active) {
+ blkfront_work_active = true;
+ schedule_delayed_work(&blkfront_work, HZ * 10);
+ }
+ mutex_unlock(&blkfront_mutex);
+ }
}

/*
@@ -2487,7 +2514,9 @@ static int blkfront_remove(struct xenbus_device *xbdev)
mutex_unlock(&info->mutex);

if (!bdev) {
- kfree(info);
+ mutex_lock(&blkfront_mutex);
+ free_info(info);
+ mutex_unlock(&blkfront_mutex);
return 0;
}

@@ -2507,7 +2536,9 @@ static int blkfront_remove(struct xenbus_device *xbdev)
if (info && !bdev->bd_openers) {
xlvbd_release_gendisk(info);
disk->private_data = NULL;
- kfree(info);
+ mutex_lock(&blkfront_mutex);
+ free_info(info);
+ mutex_unlock(&blkfront_mutex);
}

mutex_unlock(&bdev->bd_mutex);
@@ -2590,7 +2621,7 @@ static void blkif_release(struct gendisk *disk, fmode_t mode)
dev_info(disk_to_dev(bdev->bd_disk), "releasing disk\n");
xlvbd_release_gendisk(info);
disk->private_data = NULL;
- kfree(info);
+ free_info(info);
}

out:
@@ -2623,6 +2654,62 @@ static struct xenbus_driver blkfront_driver = {
.is_ready = blkfront_is_ready,
};

+static void purge_persistent_grants(struct blkfront_info *info)
+{
+ unsigned int i;
+ unsigned long flags;
+
+ for (i = 0; i < info->nr_rings; i++) {
+ struct blkfront_ring_info *rinfo = &info->rinfo[i];
+ struct grant *gnt_list_entry, *tmp;
+
+ spin_lock_irqsave(&rinfo->ring_lock, flags);
+
+ if (rinfo->persistent_gnts_c == 0) {
+ spin_unlock_irqrestore(&rinfo->ring_lock, flags);
+ continue;
+ }
+
+ list_for_each_entry_safe(gnt_list_entry, tmp, &rinfo->grants,
+ node) {
+ if (gnt_list_entry->gref == GRANT_INVALID_REF ||
+ gnttab_query_foreign_access(gnt_list_entry->gref))
+ continue;
+
+ list_del(&gnt_list_entry->node);
+ gnttab_end_foreign_access(gnt_list_entry->gref, 0, 0UL);
+ rinfo->persistent_gnts_c--;
+ __free_page(gnt_list_entry->page);
+ kfree(gnt_list_entry);
+ }
+
+ spin_unlock_irqrestore(&rinfo->ring_lock, flags);
+ }
+}
+
+static void blkfront_delay_work(struct work_struct *work)
+{
+ struct blkfront_info *info;
+
+ mutex_lock(&blkfront_mutex);
+
+ blkfront_work_active = false;
+
+ list_for_each_entry(info, &info_list, info_list) {
+ if (info->feature_persistent) {
+ blkfront_work_active = true;
+ mutex_lock(&info->mutex);
+ purge_persistent_grants(info);
+ mutex_unlock(&info->mutex);
+ }
+ }
+
+ if (blkfront_work_active)
+ schedule_delayed_work(&blkfront_work, HZ * 10);
+
+ mutex_unlock(&blkfront_mutex);
+}
+
static int __init xlblk_init(void)
{
int ret;
@@ -2655,6 +2742,8 @@ static int __init xlblk_init(void)
return -ENODEV;
}

+ INIT_DELAYED_WORK(&blkfront_work, blkfront_delay_work);
+
ret = xenbus_register_frontend(&blkfront_driver);
if (ret) {
unregister_blkdev(XENVBD_MAJOR, DEV_NAME);
@@ -2668,6 +2757,8 @@ module_init(xlblk_init);

static void __exit xlblk_exit(void)
{
+ cancel_delayed_work_sync(&blkfront_work);
+
xenbus_unregister_driver(&blkfront_driver);
unregister_blkdev(XENVBD_MAJOR, DEV_NAME);
kfree(minors);
--
2.13.7


2018-08-06 16:20:13

by Roger Pau Monne

[permalink] [raw]
Subject: Re: [PATCH 3/4] xen/blkfront: reorder tests in xlblk_init()

On Mon, Aug 06, 2018 at 01:34:02PM +0200, Juergen Gross wrote:
> In case we don't want pv block devices we should not test parameters
> for sanity and eventually print out error messages. So test precluding
> conditions before checking parameters.
>
> Signed-off-by: Juergen Gross <[email protected]>

Reviewed-by: Roger Pau Monné <[email protected]>

Thanks, Roger.

2018-08-06 16:21:49

by Roger Pau Monne

[permalink] [raw]
Subject: Re: [PATCH 4/4] xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring

On Mon, Aug 06, 2018 at 01:34:03PM +0200, Juergen Gross wrote:
> pers_gnts_lock isn't being used anywhere. Remove it.

That was likely a leftover from when the list of persistent grants was
shared between rings.

> Signed-off-by: Juergen Gross <[email protected]>

Reviewed-by: Roger Pau Monné <[email protected]>

Thanks, Roger.

2018-08-06 17:54:25

by Roger Pau Monne

[permalink] [raw]
Subject: Re: [PATCH 1/4] xen/blkback: don't keep persistent grants too long

On Mon, Aug 06, 2018 at 01:33:59PM +0200, Juergen Gross wrote:
> Persistent grants are allocated until a threshold per ring is being
> reached. Those grants won't be freed until the ring is being destroyed
> meaning there will be resources kept busy which might no longer be
> used.
>
> Instead of freeing only persistent grants until the threshold is
> reached add a timestamp and remove all persistent grants not having
> been in use for a minute.
>
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> drivers/block/xen-blkback/blkback.c | 77 +++++++++++++++++++++++--------------
> drivers/block/xen-blkback/common.h | 1 +
> 2 files changed, 50 insertions(+), 28 deletions(-)

You should document this new parameter in
Documentation/ABI/testing/sysfs-driver-xen-blkback.

>
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index b55b245e8052..485e3ecab144 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -84,6 +84,18 @@ MODULE_PARM_DESC(max_persistent_grants,
> "Maximum number of grants to map persistently");
>
> /*
> + * How long a persistent grant is allowed to remain allocated without being in
> + * use. The time is in seconds, 0 means indefinitely long.
> + */
> +
> +unsigned int xen_blkif_pgrant_timeout = 60;
> +module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
> + uint, 0644);
> +MODULE_PARM_DESC(persistent_grant_unused_seconds,
> + "Time in seconds an unused persistent grant is allowed to "
> + "remain allocated. Default is 60, 0 means unlimited.");
> +
> +/*
> * Maximum number of rings/queues blkback supports, allow as many queues as there
> * are CPUs if user has not specified a value.
> */
> @@ -123,6 +135,13 @@ module_param(log_stats, int, 0644);
> /* Number of free pages to remove on each call to gnttab_free_pages */
> #define NUM_BATCH_FREE_PAGES 10
>
> +static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
> +{
> + return xen_blkif_pgrant_timeout &&
> + (jiffies - persistent_gnt->last_used >=
> + HZ * xen_blkif_pgrant_timeout);
> +}
> +
> static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
> {
> unsigned long flags;
> @@ -278,6 +297,7 @@ static void put_persistent_gnt(struct xen_blkif_ring *ring,
> {
> if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
> pr_alert_ratelimited("freeing a grant already unused\n");
> + persistent_gnt->last_used = jiffies;
> set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
> clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
> atomic_dec(&ring->persistent_gnt_in_use);
> @@ -374,23 +394,23 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
> bool scan_used = false, clean_used = false;
> struct rb_root *root;
>
> - if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
> - (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
> - !ring->blkif->vbd.overflow_max_grants)) {
> - goto out;
> - }
> -
> if (work_busy(&ring->persistent_purge_work)) {
> pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
> goto out;
> }
>
> - num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
> - num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
> - num_clean = min(ring->persistent_gnt_c, num_clean);
> - if ((num_clean == 0) ||
> - (num_clean > (ring->persistent_gnt_c - atomic_read(&ring->persistent_gnt_in_use))))
> - goto out;
> + if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
> + (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
> + !ring->blkif->vbd.overflow_max_grants)) {
> + num_clean = 0;
> + } else {
> + num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
> + num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
> + num_clean;
> + num_clean = min(ring->persistent_gnt_c, num_clean);
> + pr_debug("Going to purge at least %u persistent grants\n",
> + num_clean);
> + }
>
> /*
> * At this point, we can assure that there will be no calls
> @@ -401,9 +421,7 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
> * number of grants.
> */
>
> - total = num_clean;
> -
> - pr_debug("Going to purge %u persistent grants\n", num_clean);
> + total = 0;
>
> BUG_ON(!list_empty(&ring->persistent_purge_list));
> root = &ring->persistent_gnts;
> @@ -419,39 +437,42 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
>
> if (test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
> continue;
> - if (!scan_used &&
> + if (!scan_used && !persistent_gnt_timeout(persistent_gnt) &&
> (test_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags)))

If you store the jiffies of the time when the grant was last used it
seems like we could get rid of the PERSISTENT_GNT_WAS_ACTIVE flag and
instead use the per-grant jiffies and the jiffies from the last scan
in order to decide which grants to remove?

Thanks, Roger.

2018-08-06 18:02:23

by Roger Pau Monne

[permalink] [raw]
Subject: Re: [PATCH 2/4] xen/blkfront: cleanup stale persistent grants

On Mon, Aug 06, 2018 at 01:34:01PM +0200, Juergen Gross wrote:
> Add a periodic cleanup function to remove old persistent grants which
> are no longer in use on the backend side. This avoids starvation in
> case there are lots of persistent grants for a device which no longer
> is involved in I/O business.
>
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 95 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index b5cedccb5d7d..19feb8835fc4 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -46,6 +46,7 @@
> #include <linux/scatterlist.h>
> #include <linux/bitmap.h>
> #include <linux/list.h>
> +#include <linux/workqueue.h>
>
> #include <xen/xen.h>
> #include <xen/xenbus.h>
> @@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)
>
> static DEFINE_MUTEX(blkfront_mutex);
> static const struct block_device_operations xlvbd_block_fops;
> +static struct delayed_work blkfront_work;
> +static LIST_HEAD(info_list);
> +static bool blkfront_work_active;
>
> /*
> * Maximum number of segments in indirect requests, the actual value used by
> @@ -216,6 +220,7 @@ struct blkfront_info
> /* Save uncomplete reqs and bios for migration. */
> struct list_head requests;
> struct bio_list bio_list;
> + struct list_head info_list;
> };
>
> static unsigned int nr_minors;
> @@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
> return err;
> }
>
> +static void free_info(struct blkfront_info *info)
> +{
> + list_del(&info->info_list);
> + kfree(info);
> +}
> +
> /* Common code used when first setting up, and when resuming. */
> static int talk_to_blkback(struct xenbus_device *dev,
> struct blkfront_info *info)
> @@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
> destroy_blkring:
> blkif_free(info, 0);
>
> - kfree(info);
> + mutex_lock(&blkfront_mutex);
> + free_info(info);
> + mutex_unlock(&blkfront_mutex);
> +
> dev_set_drvdata(&dev->dev, NULL);
>
> return err;
> @@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
> info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
> dev_set_drvdata(&dev->dev, info);
>
> + mutex_lock(&blkfront_mutex);
> + list_add(&info->info_list, &info_list);
> + mutex_unlock(&blkfront_mutex);
> +
> return 0;
> }
>
> @@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
> if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
> indirect_segments = 0;
> info->max_indirect_segments = indirect_segments;
> +
> + if (info->feature_persistent) {
> + mutex_lock(&blkfront_mutex);
> + if (!blkfront_work_active) {
> + blkfront_work_active = true;
> + schedule_delayed_work(&blkfront_work, HZ * 10);

Does it make sense to provide a module parameter to tune the schedule
of the cleanup routine?

> + }
> + mutex_unlock(&blkfront_mutex);

Is it really necessary to have the blkfront_work_active boolean? What
happens if you queue the same delayed work more than once?

Thanks, Roger.

2018-08-07 07:58:13

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH 2/4] xen/blkfront: cleanup stale persistent grants

On 06/08/18 18:16, Roger Pau Monné wrote:
> On Mon, Aug 06, 2018 at 01:34:01PM +0200, Juergen Gross wrote:
>> Add a periodic cleanup function to remove old persistent grants which
>> are no longer in use on the backend side. This avoids starvation in
>> case there are lots of persistent grants for a device which no longer
>> is involved in I/O business.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>> ---
>> drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 95 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index b5cedccb5d7d..19feb8835fc4 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -46,6 +46,7 @@
>> #include <linux/scatterlist.h>
>> #include <linux/bitmap.h>
>> #include <linux/list.h>
>> +#include <linux/workqueue.h>
>>
>> #include <xen/xen.h>
>> #include <xen/xenbus.h>
>> @@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)
>>
>> static DEFINE_MUTEX(blkfront_mutex);
>> static const struct block_device_operations xlvbd_block_fops;
>> +static struct delayed_work blkfront_work;
>> +static LIST_HEAD(info_list);
>> +static bool blkfront_work_active;
>>
>> /*
>> * Maximum number of segments in indirect requests, the actual value used by
>> @@ -216,6 +220,7 @@ struct blkfront_info
>> /* Save uncomplete reqs and bios for migration. */
>> struct list_head requests;
>> struct bio_list bio_list;
>> + struct list_head info_list;
>> };
>>
>> static unsigned int nr_minors;
>> @@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
>> return err;
>> }
>>
>> +static void free_info(struct blkfront_info *info)
>> +{
>> + list_del(&info->info_list);
>> + kfree(info);
>> +}
>> +
>> /* Common code used when first setting up, and when resuming. */
>> static int talk_to_blkback(struct xenbus_device *dev,
>> struct blkfront_info *info)
>> @@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
>> destroy_blkring:
>> blkif_free(info, 0);
>>
>> - kfree(info);
>> + mutex_lock(&blkfront_mutex);
>> + free_info(info);
>> + mutex_unlock(&blkfront_mutex);
>> +
>> dev_set_drvdata(&dev->dev, NULL);
>>
>> return err;
>> @@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
>> info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
>> dev_set_drvdata(&dev->dev, info);
>>
>> + mutex_lock(&blkfront_mutex);
>> + list_add(&info->info_list, &info_list);
>> + mutex_unlock(&blkfront_mutex);
>> +
>> return 0;
>> }
>>
>> @@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
>> if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
>> indirect_segments = 0;
>> info->max_indirect_segments = indirect_segments;
>> +
>> + if (info->feature_persistent) {
>> + mutex_lock(&blkfront_mutex);
>> + if (!blkfront_work_active) {
>> + blkfront_work_active = true;
>> + schedule_delayed_work(&blkfront_work, HZ * 10);
>
> Does it make sense to provide a module parameter to tune the schedule
> of the cleanup routine?

I don't think this is something anyone would like to tune.

In case you think it should be tunable I can add a parameter, of course.

>
>> + }
>> + mutex_unlock(&blkfront_mutex);
>
> Is it really necessary to have the blkfront_work_active boolean? What
> happens if you queue the same delayed work more than once?

In case there is already work queued later calls of
schedule_delayed_work() will be ignored.

So yes, I can drop the global boolean (I still need a local flag in
blkfront_delay_work() for controlling the need to call
schedule_delayed_work() again).


Juergen

2018-08-07 08:35:43

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH 1/4] xen/blkback: don't keep persistent grants too long

On 06/08/18 17:58, Roger Pau Monné wrote:
> On Mon, Aug 06, 2018 at 01:33:59PM +0200, Juergen Gross wrote:
>> Persistent grants are allocated until a threshold per ring is being
>> reached. Those grants won't be freed until the ring is being destroyed
>> meaning there will be resources kept busy which might no longer be
>> used.
>>
>> Instead of freeing only persistent grants until the threshold is
>> reached add a timestamp and remove all persistent grants not having
>> been in use for a minute.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>> ---
>> drivers/block/xen-blkback/blkback.c | 77 +++++++++++++++++++++++--------------
>> drivers/block/xen-blkback/common.h | 1 +
>> 2 files changed, 50 insertions(+), 28 deletions(-)
>
> You should document this new parameter in
> Documentation/ABI/testing/sysfs-driver-xen-blkback.

Yes.

>
>>
>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
>> index b55b245e8052..485e3ecab144 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -84,6 +84,18 @@ MODULE_PARM_DESC(max_persistent_grants,
>> "Maximum number of grants to map persistently");
>>
>> /*
>> + * How long a persistent grant is allowed to remain allocated without being in
>> + * use. The time is in seconds, 0 means indefinitely long.
>> + */
>> +
>> +unsigned int xen_blkif_pgrant_timeout = 60;
>> +module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
>> + uint, 0644);
>> +MODULE_PARM_DESC(persistent_grant_unused_seconds,
>> + "Time in seconds an unused persistent grant is allowed to "
>> + "remain allocated. Default is 60, 0 means unlimited.");
>> +
>> +/*
>> * Maximum number of rings/queues blkback supports, allow as many queues as there
>> * are CPUs if user has not specified a value.
>> */
>> @@ -123,6 +135,13 @@ module_param(log_stats, int, 0644);
>> /* Number of free pages to remove on each call to gnttab_free_pages */
>> #define NUM_BATCH_FREE_PAGES 10
>>
>> +static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
>> +{
>> + return xen_blkif_pgrant_timeout &&
>> + (jiffies - persistent_gnt->last_used >=
>> + HZ * xen_blkif_pgrant_timeout);
>> +}
>> +
>> static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
>> {
>> unsigned long flags;
>> @@ -278,6 +297,7 @@ static void put_persistent_gnt(struct xen_blkif_ring *ring,
>> {
>> if(!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
>> pr_alert_ratelimited("freeing a grant already unused\n");
>> + persistent_gnt->last_used = jiffies;
>> set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
>> clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
>> atomic_dec(&ring->persistent_gnt_in_use);
>> @@ -374,23 +394,23 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
>> bool scan_used = false, clean_used = false;
>> struct rb_root *root;
>>
>> - if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
>> - (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
>> - !ring->blkif->vbd.overflow_max_grants)) {
>> - goto out;
>> - }
>> -
>> if (work_busy(&ring->persistent_purge_work)) {
>> pr_alert_ratelimited("Scheduled work from previous purge is still busy, cannot purge list\n");
>> goto out;
>> }
>>
>> - num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
>> - num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + num_clean;
>> - num_clean = min(ring->persistent_gnt_c, num_clean);
>> - if ((num_clean == 0) ||
>> - (num_clean > (ring->persistent_gnt_c - atomic_read(&ring->persistent_gnt_in_use))))
>> - goto out;
>> + if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
>> + (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
>> + !ring->blkif->vbd.overflow_max_grants)) {
>> + num_clean = 0;
>> + } else {
>> + num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
>> + num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
>> + num_clean;
>> + num_clean = min(ring->persistent_gnt_c, num_clean);
>> + pr_debug("Going to purge at least %u persistent grants\n",
>> + num_clean);
>> + }
>>
>> /*
>> * At this point, we can assure that there will be no calls
>> @@ -401,9 +421,7 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
>> * number of grants.
>> */
>>
>> - total = num_clean;
>> -
>> - pr_debug("Going to purge %u persistent grants\n", num_clean);
>> + total = 0;
>>
>> BUG_ON(!list_empty(&ring->persistent_purge_list));
>> root = &ring->persistent_gnts;
>> @@ -419,39 +437,42 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
>>
>> if (test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
>> continue;
>> - if (!scan_used &&
>> + if (!scan_used && !persistent_gnt_timeout(persistent_gnt) &&
>> (test_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags)))
>
> If you store the jiffies of the time when the grant was last used it
> seems like we could get rid of the PERSISTENT_GNT_WAS_ACTIVE flag and
> instead use the per-grant jiffies and the jiffies from the last scan
> in order to decide which grants to remove?

True. This might make the control flow a little bit easier to
understand.


Juergen

2018-08-07 14:21:58

by Roger Pau Monne

[permalink] [raw]
Subject: Re: [PATCH 2/4] xen/blkfront: cleanup stale persistent grants

On Tue, Aug 07, 2018 at 08:31:31AM +0200, Juergen Gross wrote:
> On 06/08/18 18:16, Roger Pau Monné wrote:
> > On Mon, Aug 06, 2018 at 01:34:01PM +0200, Juergen Gross wrote:
> >> Add a periodic cleanup function to remove old persistent grants which
> >> are no longer in use on the backend side. This avoids starvation in
> >> case there are lots of persistent grants for a device which no longer
> >> is involved in I/O business.
> >>
> >> Signed-off-by: Juergen Gross <[email protected]>
> >> ---
> >> drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
> >> 1 file changed, 95 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> >> index b5cedccb5d7d..19feb8835fc4 100644
> >> --- a/drivers/block/xen-blkfront.c
> >> +++ b/drivers/block/xen-blkfront.c
> >> @@ -46,6 +46,7 @@
> >> #include <linux/scatterlist.h>
> >> #include <linux/bitmap.h>
> >> #include <linux/list.h>
> >> +#include <linux/workqueue.h>
> >>
> >> #include <xen/xen.h>
> >> #include <xen/xenbus.h>
> >> @@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)
> >>
> >> static DEFINE_MUTEX(blkfront_mutex);
> >> static const struct block_device_operations xlvbd_block_fops;
> >> +static struct delayed_work blkfront_work;
> >> +static LIST_HEAD(info_list);
> >> +static bool blkfront_work_active;
> >>
> >> /*
> >> * Maximum number of segments in indirect requests, the actual value used by
> >> @@ -216,6 +220,7 @@ struct blkfront_info
> >> /* Save uncomplete reqs and bios for migration. */
> >> struct list_head requests;
> >> struct bio_list bio_list;
> >> + struct list_head info_list;
> >> };
> >>
> >> static unsigned int nr_minors;
> >> @@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
> >> return err;
> >> }
> >>
> >> +static void free_info(struct blkfront_info *info)
> >> +{
> >> + list_del(&info->info_list);
> >> + kfree(info);
> >> +}
> >> +
> >> /* Common code used when first setting up, and when resuming. */
> >> static int talk_to_blkback(struct xenbus_device *dev,
> >> struct blkfront_info *info)
> >> @@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
> >> destroy_blkring:
> >> blkif_free(info, 0);
> >>
> >> - kfree(info);
> >> + mutex_lock(&blkfront_mutex);
> >> + free_info(info);
> >> + mutex_unlock(&blkfront_mutex);
> >> +
> >> dev_set_drvdata(&dev->dev, NULL);
> >>
> >> return err;
> >> @@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
> >> info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
> >> dev_set_drvdata(&dev->dev, info);
> >>
> >> + mutex_lock(&blkfront_mutex);
> >> + list_add(&info->info_list, &info_list);
> >> + mutex_unlock(&blkfront_mutex);
> >> +
> >> return 0;
> >> }
> >>
> >> @@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
> >> if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
> >> indirect_segments = 0;
> >> info->max_indirect_segments = indirect_segments;
> >> +
> >> + if (info->feature_persistent) {
> >> + mutex_lock(&blkfront_mutex);
> >> + if (!blkfront_work_active) {
> >> + blkfront_work_active = true;
> >> + schedule_delayed_work(&blkfront_work, HZ * 10);
> >
> > Does it make sense to provide a module parameter to rune the schedule
> > of the cleanup routine?
>
> I don't think this is something anyone would like to tune.
>
> In case you think it should be tunable I can add a parameter, of course.

We can always add it later if required. I'm fine as-is now.

> >
> >> + }
> >> + mutex_unlock(&blkfront_mutex);
> >
> > Is it really necessary to have the blkfront_work_active boolean? What
> > happens if you queue the same delayed work more than once?
>
> In case there is already work queued later calls of
> schedule_delayed_work() will be ignored.
>
> So yes, I can drop the global boolean (I still need a local flag in
> blkfront_delay_work() for controlling the need to call
> schedule_delayed_work() again).

Can't you just call schedule_delayed_work if info->feature_persistent
is set, even if that means calling it multiple times if multiple
blkfront instances are using persistent grants?

Thanks, Roger.

2018-08-07 16:45:15

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH 2/4] xen/blkfront: cleanup stale persistent grants

On 07/08/18 16:14, Roger Pau Monné wrote:
> On Tue, Aug 07, 2018 at 08:31:31AM +0200, Juergen Gross wrote:
>> On 06/08/18 18:16, Roger Pau Monné wrote:
>>> On Mon, Aug 06, 2018 at 01:34:01PM +0200, Juergen Gross wrote:
>>>> Add a periodic cleanup function to remove old persistent grants which
>>>> are no longer in use on the backend side. This avoids starvation in
>>>> case there are lots of persistent grants for a device which no longer
>>>> is involved in I/O business.
>>>>
>>>> Signed-off-by: Juergen Gross <[email protected]>
>>>> ---
>>>> drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
>>>> 1 file changed, 95 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index b5cedccb5d7d..19feb8835fc4 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -46,6 +46,7 @@
>>>> #include <linux/scatterlist.h>
>>>> #include <linux/bitmap.h>
>>>> #include <linux/list.h>
>>>> +#include <linux/workqueue.h>
>>>>
>>>> #include <xen/xen.h>
>>>> #include <xen/xenbus.h>
>>>> @@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)
>>>>
>>>> static DEFINE_MUTEX(blkfront_mutex);
>>>> static const struct block_device_operations xlvbd_block_fops;
>>>> +static struct delayed_work blkfront_work;
>>>> +static LIST_HEAD(info_list);
>>>> +static bool blkfront_work_active;
>>>>
>>>> /*
>>>> * Maximum number of segments in indirect requests, the actual value used by
>>>> @@ -216,6 +220,7 @@ struct blkfront_info
>>>> /* Save uncomplete reqs and bios for migration. */
>>>> struct list_head requests;
>>>> struct bio_list bio_list;
>>>> + struct list_head info_list;
>>>> };
>>>>
>>>> static unsigned int nr_minors;
>>>> @@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
>>>> return err;
>>>> }
>>>>
>>>> +static void free_info(struct blkfront_info *info)
>>>> +{
>>>> + list_del(&info->info_list);
>>>> + kfree(info);
>>>> +}
>>>> +
>>>> /* Common code used when first setting up, and when resuming. */
>>>> static int talk_to_blkback(struct xenbus_device *dev,
>>>> struct blkfront_info *info)
>>>> @@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
>>>> destroy_blkring:
>>>> blkif_free(info, 0);
>>>>
>>>> - kfree(info);
>>>> + mutex_lock(&blkfront_mutex);
>>>> + free_info(info);
>>>> + mutex_unlock(&blkfront_mutex);
>>>> +
>>>> dev_set_drvdata(&dev->dev, NULL);
>>>>
>>>> return err;
>>>> @@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
>>>> info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
>>>> dev_set_drvdata(&dev->dev, info);
>>>>
>>>> + mutex_lock(&blkfront_mutex);
>>>> + list_add(&info->info_list, &info_list);
>>>> + mutex_unlock(&blkfront_mutex);
>>>> +
>>>> return 0;
>>>> }
>>>>
>>>> @@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
>>>> if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
>>>> indirect_segments = 0;
>>>> info->max_indirect_segments = indirect_segments;
>>>> +
>>>> + if (info->feature_persistent) {
>>>> + mutex_lock(&blkfront_mutex);
>>>> + if (!blkfront_work_active) {
>>>> + blkfront_work_active = true;
>>>> + schedule_delayed_work(&blkfront_work, HZ * 10);
>>>
>>> Does it make sense to provide a module parameter to rune the schedule
>>> of the cleanup routine?
>>
>> I don't think this is something anyone would like to tune.
>>
>> In case you think it should be tunable I can add a parameter, of course.
>
> We can always add it later if required. I'm fine as-is now.
>
>>>
>>>> + }
>>>> + mutex_unlock(&blkfront_mutex);
>>>
>>> Is it really necessary to have the blkfront_work_active boolean? What
>>> happens if you queue the same delayed work more than once?
>>
>> In case there is already work queued later calls of
>> schedule_delayed_work() will be ignored.
>>
>> So yes, I can drop the global boolean (I still need a local flag in
>> blkfront_delay_work() for controlling the need to call
>> schedule_delayed_work() again).
>
> Can't you just call schedule_delayed_work if info->feature_persistent
> is set, even if that means calling it multiple times if multiple
> blkfront instances are using persistent grants?

I don't like that. With mq we have a high chance for multiple instances
to use persistent grants and a local bool is much cheaper than unneeded
calls of schedule_delayed_work().


Juergen

2018-08-08 08:29:51

by Roger Pau Monne

[permalink] [raw]
Subject: Re: [PATCH 2/4] xen/blkfront: cleanup stale persistent grants

On Tue, Aug 07, 2018 at 05:56:38PM +0200, Juergen Gross wrote:
> On 07/08/18 16:14, Roger Pau Monné wrote:
> > On Tue, Aug 07, 2018 at 08:31:31AM +0200, Juergen Gross wrote:
> >> On 06/08/18 18:16, Roger Pau Monné wrote:
> >>> On Mon, Aug 06, 2018 at 01:34:01PM +0200, Juergen Gross wrote:
> >>>> Add a periodic cleanup function to remove old persistent grants which
> >>>> are no longer in use on the backend side. This avoids starvation in
> >>>> case there are lots of persistent grants for a device which no longer
> >>>> is involved in I/O business.
> >>>>
> >>>> Signed-off-by: Juergen Gross <[email protected]>
> >>>> ---
> >>>> drivers/block/xen-blkfront.c | 99 ++++++++++++++++++++++++++++++++++++++++++--
> >>>> 1 file changed, 95 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> >>>> index b5cedccb5d7d..19feb8835fc4 100644
> >>>> --- a/drivers/block/xen-blkfront.c
> >>>> +++ b/drivers/block/xen-blkfront.c
> >>>> @@ -46,6 +46,7 @@
> >>>> #include <linux/scatterlist.h>
> >>>> #include <linux/bitmap.h>
> >>>> #include <linux/list.h>
> >>>> +#include <linux/workqueue.h>
> >>>>
> >>>> #include <xen/xen.h>
> >>>> #include <xen/xenbus.h>
> >>>> @@ -121,6 +122,9 @@ static inline struct blkif_req *blkif_req(struct request *rq)
> >>>>
> >>>> static DEFINE_MUTEX(blkfront_mutex);
> >>>> static const struct block_device_operations xlvbd_block_fops;
> >>>> +static struct delayed_work blkfront_work;
> >>>> +static LIST_HEAD(info_list);
> >>>> +static bool blkfront_work_active;
> >>>>
> >>>> /*
> >>>> * Maximum number of segments in indirect requests, the actual value used by
> >>>> @@ -216,6 +220,7 @@ struct blkfront_info
> >>>> /* Save uncomplete reqs and bios for migration. */
> >>>> struct list_head requests;
> >>>> struct bio_list bio_list;
> >>>> + struct list_head info_list;
> >>>> };
> >>>>
> >>>> static unsigned int nr_minors;
> >>>> @@ -1764,6 +1769,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
> >>>> return err;
> >>>> }
> >>>>
> >>>> +static void free_info(struct blkfront_info *info)
> >>>> +{
> >>>> + list_del(&info->info_list);
> >>>> + kfree(info);
> >>>> +}
> >>>> +
> >>>> /* Common code used when first setting up, and when resuming. */
> >>>> static int talk_to_blkback(struct xenbus_device *dev,
> >>>> struct blkfront_info *info)
> >>>> @@ -1885,7 +1896,10 @@ static int talk_to_blkback(struct xenbus_device *dev,
> >>>> destroy_blkring:
> >>>> blkif_free(info, 0);
> >>>>
> >>>> - kfree(info);
> >>>> + mutex_lock(&blkfront_mutex);
> >>>> + free_info(info);
> >>>> + mutex_unlock(&blkfront_mutex);
> >>>> +
> >>>> dev_set_drvdata(&dev->dev, NULL);
> >>>>
> >>>> return err;
> >>>> @@ -1996,6 +2010,10 @@ static int blkfront_probe(struct xenbus_device *dev,
> >>>> info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
> >>>> dev_set_drvdata(&dev->dev, info);
> >>>>
> >>>> + mutex_lock(&blkfront_mutex);
> >>>> + list_add(&info->info_list, &info_list);
> >>>> + mutex_unlock(&blkfront_mutex);
> >>>> +
> >>>> return 0;
> >>>> }
> >>>>
> >>>> @@ -2306,6 +2324,15 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
> >>>> if (indirect_segments <= BLKIF_MAX_SEGMENTS_PER_REQUEST)
> >>>> indirect_segments = 0;
> >>>> info->max_indirect_segments = indirect_segments;
> >>>> +
> >>>> + if (info->feature_persistent) {
> >>>> + mutex_lock(&blkfront_mutex);
> >>>> + if (!blkfront_work_active) {
> >>>> + blkfront_work_active = true;
> >>>> + schedule_delayed_work(&blkfront_work, HZ * 10);
> >>>
> >>> Does it make sense to provide a module parameter to rune the schedule
> >>> of the cleanup routine?
> >>
> >> I don't think this is something anyone would like to tune.
> >>
> >> In case you think it should be tunable I can add a parameter, of course.
> >
> > We can always add it later if required. I'm fine as-is now.
> >
> >>>
> >>>> + }
> >>>> + mutex_unlock(&blkfront_mutex);
> >>>
> >>> Is it really necessary to have the blkfront_work_active boolean? What
> >>> happens if you queue the same delayed work more than once?
> >>
> >> In case there is already work queued later calls of
> >> schedule_delayed_work() will be ignored.
> >>
> >> So yes, I can drop the global boolean (I still need a local flag in
> >> blkfront_delay_work() for controlling the need to call
> >> schedule_delayed_work() again).
> >
> > Can't you just call schedule_delayed_work if info->feature_persistent
> > is set, even if that means calling it multiple times if multiple
> > blkfront instances are using persistent grants?
>
> I don't like that. With mq we have a high chance for multiple instances
> to use persistent grants and a local bool is much cheaper than unneeded
> calls of schedule_delayed_work().

OK, I'm convinced with the local bool.

Thanks, Roger.